Rats in NYC

Charles O'Bert and Cameron

https://charles-obert.github.io/Rats_in_NYC/Rats_in_NYC.html

Project Goals

The goal of this project is to understand the rat infestation problem in New York City. We want to answer several questions:

Which areas of New York City are most affected by rats?
What factors cause certain areas of New York City to have a higher density of rats?
Are any of these factors human-related?

So far, we have found only one data set, which shows the geographic distribution of rat sightings in New York City, but we are still looking for more.

Collaboration Plan

Our plan is to meet biweekly over Zoom to break down the deliverables for each milestone. We will go through each part of the assignment, working on some parts together and delegating others. We plan to write the code in Google Colab and then download it as a Jupyter Notebook for submission.

Data Set Description

https://data.cityofnewyork.us/Social-Services/Rat-Sightings/3q43-55fe/about_data

New York City's open data portal documents each rat-sighting service request received from 2010 to the present, including when the complaint was opened and closed, where it came from, and other descriptive variables.

In [ ]:

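# ETL: load the raw data and turn it into tidy data
import pandas as pd

# Clone the project repository that hosts the CSV.
# On a rerun, git prints "fatal: destination path ... already exists", which is harmless.
!git clone https://github.com/Charles-OBert/Data-Science-Project.git
rats = pd.read_csv('Data-Science-Project/Rat_Sightings.csv')  # load the raw service-request records
rats.head()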

fatal: destination path 'Data-Science-Project' already exists and is not an empty directory.

<ipython-input-31-d65a12816c25>:4: DtypeWarning: Columns (20) have mixed types. Specify dtype option on import or set low_memory=False.
  rats = pd.read_csv('Data-Science-Project/Rat_Sightings.csv') # read the 'csv' file
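The DtypeWarning above means pandas inferred different types for column 20 in different chunks of the file. Passing an explicit `dtype` for that column, or `low_memory=False` so each column is inferred from the whole file at once, avoids it. The sketch below is a minimal illustration on an in-memory CSV, not the notebook's actual loading code. It uses `Incident Zip` only as an example of a column worth reading as text (the head() output that follows shows ZIP codes parsed as floats such as 11379.0); the rows are adapted from that output with one ZIP code blanked out, and `rats_demo` is a hypothetical name. Which column index 20 really is can be checked with `rats.columns[20]`.

```python
import io

import pandas as pd

# In-memory stand-in for Rat_Sightings.csv: rows adapted from the head() output,
# with the second ZIP code blanked out to mimic a missing value (illustrative only).
csv_text = """Unique Key,Incident Zip,Borough
43601221,11379,QUEENS
43514079,,BROOKLYN
43487445,11214,BROOKLYN
"""

# Declaring the dtype removes the guesswork: ZIP codes stay text ("11379", not 11379.0)
# and missing entries become <NA>. low_memory=False makes the C parser infer types
# from the whole column instead of chunk by chunk, the source of DtypeWarning.
rats_demo = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"Incident Zip": "string"},
    low_memory=False,
)

print(rats_demo.dtypes)
print(rats_demo["Incident Zip"].tolist())
```

On the real file, the same two keyword arguments would go into the existing `pd.read_csv('Data-Science-Project/Rat_Sightings.csv')` call.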

Out[ ]:

	Unique Key	Created Date	Closed Date	Agency	Agency Name	Complaint Type	Descriptor	Location Type	Incident Zip	Incident Address	...	Vehicle Type	Taxi Company Borough	Taxi Pick Up Location	Bridge Highway Name	Bridge Highway Direction	Road Ramp	Bridge Highway Segment	Latitude	Longitude	Location
0	43601221	08/21/2019 03:19:40 PM	09/05/2019 12:13:56 PM	DOHMH	Department of Health and Mental Hygiene	Rodent	Rat Sighting	1-2 Family Dwelling	11379.0	66-08 73 PLACE	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	40.714803	-73.880961	(40.7148032048583, -73.88096061491734)
1	43514079	08/11/2019 12:44:47 AM	10/03/2019 04:21:04 PM	DOHMH	Department of Health and Mental Hygiene	Rodent	Rat Sighting	Commercial Building	11232.0	4201 THIRD AVENUE	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	40.652075	-74.010008	(40.65207491193432, -74.01000792293357)
2	43487445	08/07/2019 12:48:56 PM	08/29/2019 12:04:12 PM	DOHMH	Department of Health and Mental Hygiene	Rodent	Rat Sighting	Commercial Building	11214.0	1846 BATH AVENUE	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	40.603169	-74.004826	(40.603168585850305, -74.00482563675014)
3	43527756	08/12/2019 03:12:52 AM	08/29/2019 12:08:43 PM	DOHMH	Department of Health and Mental Hygiene	Rodent	Rat Sighting	Hospital	10037.0	37 WEST 137 STREET	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	40.814592	-73.937963	(40.81459196358481, -73.9379633682187)
4	43560214	08/16/2019 04:48:29 PM	09/05/2019 12:13:57 PM	DOHMH	Department of Health and Mental Hygiene	Rodent	Rat Sighting	1-2 Family Dwelling	11379.0	61-33 82 STREET	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	40.725040	-73.876974	(40.72503971978735, -73.8769737384869)

5 rows × 38 columns
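The cleaning step below drops Created Date and Closed Date, but those two columns could also show how long complaints stay open. A minimal sketch, using two timestamps copied from the rows above: it parses the month/day/year, 12-hour-clock format explicitly and computes the time open in days. `Days Open` is a hypothetical column name; on the full data this would run on `rats` before the drop.

```python
import pandas as pd

# Two rows of timestamps copied from the head() output above.
demo = pd.DataFrame({
    "Created Date": ["08/21/2019 03:19:40 PM", "08/11/2019 12:44:47 AM"],
    "Closed Date": ["09/05/2019 12:13:56 PM", "10/03/2019 04:21:04 PM"],
})

# Explicit format: month/day/year, 12-hour clock (%I) with an AM/PM marker (%p).
fmt = "%m/%d/%Y %I:%M:%S %p"
created = pd.to_datetime(demo["Created Date"], format=fmt)
closed = pd.to_datetime(demo["Closed Date"], format=fmt)

# Time each complaint stayed open, in fractional days.
demo["Days Open"] = (closed - created).dt.total_seconds() / 86400
print(demo)
```

Giving `format=` explicitly is both faster than letting pandas guess and unambiguous about day/month order.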

Next, we drop columns that are empty, constant, or not needed for our analysis.
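The first line of the cleaning cell relies on `DataFrame.nunique()`, which ignores missing values by default: an all-empty column has 0 distinct values and a constant column has 1, so both carry no information. A toy illustration with hypothetical data:

```python
import numpy as np
import pandas as pd

# Toy frame: one informative column, one constant column, one entirely empty column.
df = pd.DataFrame({
    "Borough": ["QUEENS", "BROOKLYN", "BROOKLYN"],
    "Complaint Type": ["Rodent", "Rodent", "Rodent"],
    "Vehicle Type": [np.nan, np.nan, np.nan],
})

# nunique() skips NaN, so "Vehicle Type" scores 0 and "Complaint Type" scores 1.
counts = df.nunique()
df_clean = df.loc[:, counts > 1]
print(df_clean.columns.tolist())  # ['Borough']
```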

In [ ]:

# Drop columns that are entirely empty or hold only a single value
n_unique = rats.nunique()
rats = rats.drop(columns=n_unique[n_unique <= 1].index)

# Drop columns we do not need for the analysis
rats = rats.drop(columns=["Due Date",
                          "Address Type",
                          "Park Borough",
                          "Location",
                          "Community Board",
                          "Created Date",
                          "Closed Date",
                          "Resolution Action Updated Date",
                          "X Coordinate (State Plane)",
                          "Y Coordinate (State Plane)",
                          "Unique Key",
                          "Intersection Street 1",
                          "Intersection Street 2",
                          "Incident Zip",
                          "Status"])

# Drop rows with a missing value in any remaining column
rats = rats.dropna()
rats.head()

Out[ ]:

	Location Type	Incident Address	Street Name	Cross Street 1	Cross Street 2	City	Landmark	Borough	Latitude	Longitude
0	1-2 Family Dwelling	66-08 73 PLACE	73 PLACE	JUNIPER VALLEY ROAD	66 ROAD	MIDDLE VILLAGE	73 PLACE	QUEENS	40.714803	-73.880961
1	Commercial Building	4201 THIRD AVENUE	THIRD AVENUE	42 STREET	43 STREET	BROOKLYN	3 AVENUE	BROOKLYN	40.652075	-74.010008
2	Commercial Building	1846 BATH AVENUE	BATH AVENUE	BAY 19 STREET	BAY 20 STREET	BROOKLYN	BATH AVENUE	BROOKLYN	40.603169	-74.004826
3	Hospital	37 WEST 137 STREET	WEST 137 STREET	5 AVENUE	LENOX AVENUE	NEW YORK	WEST 137 STREET	MANHATTAN	40.814592	-73.937963
4	1-2 Family Dwelling	61-33 82 STREET	82 STREET	CALDWELL AVENUE	62 AVENUE	MIDDLE VILLAGE	82 STREET	QUEENS	40.725040	-73.876974
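`rats.dropna()` without arguments removes every row that is missing any remaining field, such as a Landmark or a cross street, so sightings with a perfectly good borough and coordinates can be lost before the borough counts. If only those location fields matter, one option (a judgment call, not what this notebook does) is to restrict the check with `subset`. The toy rows below are adapted from the output above, with one Landmark blanked out:

```python
import numpy as np
import pandas as pd

# Toy sightings: the second row lacks a Landmark but has a borough and coordinates.
sightings = pd.DataFrame({
    "Landmark": ["73 PLACE", np.nan, "BATH AVENUE"],
    "Borough": ["QUEENS", "BROOKLYN", "BROOKLYN"],
    "Latitude": [40.714803, 40.652075, 40.603169],
    "Longitude": [-73.880961, -74.010008, -74.004826],
})

# dropna() with no arguments discards a row if ANY column is missing.
all_cols = sightings.dropna()

# Checking only the columns the borough analysis needs keeps more rows.
needed = ["Borough", "Latitude", "Longitude"]
located = sightings.dropna(subset=needed)

print(len(sightings), len(all_cols), len(located))  # 3 2 3
```

Comparing `len(rats)` before and after each variant on the real data would show how many sightings each choice discards.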

Next, we check the data types of the remaining columns. They match what we expected: the descriptive fields are stored as text (object) and the coordinates as float64, so no conversions are needed.

In [ ]:

rats.dtypes

Out[ ]:

Location Type        object
Incident Address     object
Street Name          object
Cross Street 1       object
Cross Street 2       object
City                 object
Landmark             object
Borough              object
Latitude            float64
Longitude           float64
dtype: object
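Checking `dtypes` confirms that the coordinates are floats, but not that their values are sensible. A minimal sketch of an extra sanity check, assuming an approximate bounding box for New York City (the bounds are rough values chosen for illustration, not part of the dataset documentation); `check_coordinates` is a hypothetical helper and the third demo row is a made-up bad geocode:

```python
import pandas as pd
from pandas.api.types import is_float_dtype

# Approximate bounding box for New York City (an assumption for illustration).
LAT_RANGE = (40.49, 40.92)
LON_RANGE = (-74.27, -73.69)


def check_coordinates(df: pd.DataFrame) -> pd.DataFrame:
    """Verify Latitude/Longitude are floats and return rows outside the NYC box."""
    for col in ("Latitude", "Longitude"):
        if not is_float_dtype(df[col]):
            raise TypeError(f"{col} should be a float column, got {df[col].dtype}")
    inside = df["Latitude"].between(*LAT_RANGE) & df["Longitude"].between(*LON_RANGE)
    return df[~inside]


demo = pd.DataFrame({
    "Borough": ["QUEENS", "MANHATTAN", "BROOKLYN"],
    "Latitude": [40.714803, 40.814592, 0.0],   # last row: hypothetical bad geocode
    "Longitude": [-73.880961, -73.937963, 0.0],
})
outliers = check_coordinates(demo)
print(outliers)
```

On the notebook's data this would be `check_coordinates(rats)`; an empty result means every sighting falls inside the assumed box.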

Now that our data frame is tidy, let's find out which borough has the most rat sightings.

In [ ]:

borough_counts = rats.Borough.value_counts()
borough_counts

Out[ ]:

BROOKLYN         38113
MANHATTAN        25998
BRONX            16274
QUEENS           15359
STATEN ISLAND     3010
Name: Borough, dtype: int64
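Raw counts are easier to compare as shares of the total. The sketch below reuses the borough totals from the output above; on the DataFrame itself, `rats.Borough.value_counts(normalize=True)` returns the same proportions as fractions.

```python
import pandas as pd

# Borough totals copied from the value_counts() output above (after cleaning).
borough_counts = pd.Series(
    {"BROOKLYN": 38113, "MANHATTAN": 25998, "BRONX": 16274,
     "QUEENS": 15359, "STATEN ISLAND": 3010},
    name="Borough",
)

# Percentage of all cleaned sightings reported in each borough.
shares = (borough_counts / borough_counts.sum() * 100).round(1)
print(shares)
```

By this measure Brooklyn accounts for about 38.6% of the cleaned sightings. These shares describe reports, not rat populations, and they are not adjusted for borough area or population.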

In [ ]:

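# Visualization: bar chart of rat sightings per borough
borough_counts = rats.Borough.value_counts()
ax = borough_counts.plot.bar(title="Rat sightings by borough")
ax.set_xlabel("Borough")
ax.set_ylabel("Number of sightings")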

Out[ ]:

[Figure: bar chart of rat sightings per borough]
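Borough totals are coarse; the Latitude and Longitude columns kept during cleaning allow a finer view of which areas report the most sightings. A minimal sketch using a hexagonal-binning density plot (`plot_sighting_density` is a hypothetical helper, and the random points only stand in for the real coordinates):

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_sighting_density(df: pd.DataFrame, gridsize: int = 60):
    """Hexbin of sighting locations: darker cells mean more reports in that area."""
    fig, ax = plt.subplots(figsize=(7, 7))
    # mincnt=1 leaves hexagons with no sightings blank instead of coloring them.
    hb = ax.hexbin(df["Longitude"], df["Latitude"],
                   gridsize=gridsize, cmap="Reds", mincnt=1)
    fig.colorbar(hb, ax=ax, label="Number of sightings")
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.set_title("Density of rat sightings")
    return fig, ax


# Hypothetical random points around NYC, standing in for rats[["Latitude", "Longitude"]].
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Latitude": rng.uniform(40.55, 40.90, 500),
    "Longitude": rng.uniform(-74.20, -73.75, 500),
})
fig, ax = plot_sighting_density(demo)
```

In the notebook this would be called as `plot_sighting_density(rats)`; a smaller `gridsize` gives larger hexagons and a smoother picture.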