Introduction

The Department of Health and Mental Hygiene in New York City is responsible for taking reports of rat sightings and following up on them. Each sighting is recorded, and the record is updated when the matter is considered resolved. Each record includes when it was created and closed, the type of location, and the zip code, address, borough, latitude, and longitude. In this report, we examine when reports are created and closed, how sightings change over time and across boroughs, how they relate to home values, and where they are concentrated.

The Data

The main data set describes rat sightings reported to the Department of Health and Mental Hygiene in NYC from January 2010 to the present. The data is freely available to the public from the NYC OpenData website and is updated daily. This report is set up to refresh the data it uses approximately once a month after it is first compiled.

The data can be accessed for viewing and is available for download from the following link:

https://nycopendata.socrata.com/Social-Services/Rat-Sightings/3q43-55fe

The NYC OpenData website is a product of the "Open Data Law" enacted in 2012, which mandates that data from public entities be made available online. The website currently has over 1900 data sets available.

Variables of interest in the rat sighting data set include:

  • date reported
  • date completed
  • address
  • borough
  • zip-code
  • latitude and longitude

Second, a data set with monthly time series of home values by zip code is joined with the rat data set. We attempt to regress the average home value on counts of rat sightings and borough. The home value data is accessed directly from Zillow's website at the following link:

https://www.zillow.com/research/data/

The specific data set used was for Home Listings and Sales with Data Type = Median List Price and Geography = ZIP Code.

Variables of interest in the Zillow home value data set include:

  • region name (zip code)
  • state
  • metro
  • county
  • monthly median values by region name

Data cleaning

Because this is real-world data, it needed to be examined and manipulated for our purposes. For the time series plots, counts were grouped by year and month of creation, and the date assigned to each time point was the earliest date in that group, which should be the first day of the month.
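
The grouping described above can be sketched in a few lines. The report itself was produced in R; the Python below is only an illustration, and the dates and the names `created`, `groups`, and `monthly` are hypothetical rather than taken from the report.

```python
from collections import defaultdict
from datetime import date

# Hypothetical creation dates; the real values come from the NYC OpenData export.
created = [
    date(2017, 7, 1), date(2017, 7, 15), date(2017, 7, 1),
    date(2017, 8, 3), date(2017, 8, 20),
]

# Group the dates by (year, month) of creation.
groups = defaultdict(list)
for d in created:
    groups[(d.year, d.month)].append(d)

# One row per group: the earliest date in the group and the number of reports.
monthly = [
    {"date": min(ds), "count": len(ds)}
    for _, ds in sorted(groups.items())
]
```

In this toy input the August group is labeled 3 August rather than 1 August, which shows why the minimum date equals the first of the month only when a report was actually created on that day.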

Initially, a preliminary time series plot revealed an issue in the data: monthly counts of rat sightings did not appear to be entered consistently before July 2015. During the preparation of this report (as of May 2nd 2018), the issue appears to have been resolved by NYC OpenData authorities, so all records of rat sightings from NYC OpenData are used. As of May 2nd 2018, there were over 110,000 records of rat sightings.

In the Zillow data set, zip codes had been truncated, dropping their leading zeros. The leading zeros were pasted back using an sapply call with an ifelse statement. The data set was then converted to tidy format using the gather function.
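
As an illustration of the zero-padding step, here is a minimal Python sketch. The report did this in R with sapply and ifelse; the helper name `restore_zip` is hypothetical, and the sketch assumes the truncated codes arrive as integers or digit strings.

```python
def restore_zip(value):
    """Return a 5-character zip code string, restoring dropped leading zeros."""
    # int() normalizes both 7302 and "7302"; zfill left-pads with zeros to width 5.
    return str(int(value)).zfill(5)


padded = [restore_zip(z) for z in (7302, "10001", 501)]
# padded == ["07302", "10001", "00501"]
```

Keeping zip codes as strings after this step avoids losing the zeros again in later joins.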

Next, the data set was filtered to the zip codes that appear in the rat data set (NYC zip codes). Lastly, home values were averaged within groups defined by date and zip code.
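
The reshape, filter, and average steps can be sketched together. This is a Python illustration under stated assumptions, not the report's R code: the rows, the month column labels, and the names `wide`, `rat_zips`, `long_rows`, and `avg_value` are all made up for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical wide rows: one row per region, one column per month.
wide = [
    {"zipcode": "10001", "2017-06": 1500000, "2017-07": 1520000},
    {"zipcode": "10001", "2017-06": 1700000, "2017-07": 1680000},
    {"zipcode": "07302", "2017-06": 600000, "2017-07": 610000},
]
# Hypothetical set of zip codes present in the rat data set.
rat_zips = {"10001", "11216"}

# Wide to long ("tidy"): one row per (zip code, month) observation.
long_rows = [
    {"zipcode": row["zipcode"], "date": col, "value": val}
    for row in wide
    for col, val in row.items()
    if col != "zipcode"
]

# Keep only zip codes that also appear in the rat data set.
nyc_rows = [r for r in long_rows if r["zipcode"] in rat_zips]

# Average the home values within each (date, zip code) group.
by_key = defaultdict(list)
for r in nyc_rows:
    by_key[(r["date"], r["zipcode"])].append(r["value"])
avg_value = {key: mean(vals) for key, vals in by_key.items()}
```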

Report frequency distributions

When looking at rat sightings, it is interesting to see whether there is any pattern in the day of the week on which reports are filed and closed.

Looking at the day of the week on which reports are created, the most noticeable feature is that Saturday and Sunday each have only about half to two-thirds as many reports as most weekdays. Also quite noticeable is the drop from Wednesday to Friday, which suggests that reporting may "slow down" as the weekend approaches.

Looking at the day of the week on which reports are closed, the most noticeable feature is that nearly all reports are closed between Monday and Friday, although a small number of cases are closed during the weekend. There is also a sizable number of "NA" values, which represent cases that were not closed or are still in progress. No weekday stands out as particularly busy, but Monday does appear to be the least busy.
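
The day-of-week tallies behind these two charts can be sketched as follows. This is a Python illustration with made-up reports; the names `reports`, `weekday_name`, `created_counts`, and `closed_counts` are hypothetical, and an unclosed report is represented here as `None`.

```python
from collections import Counter
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical (created, closed) pairs; closed is None for reports still open.
reports = [
    (date(2018, 4, 30), date(2018, 5, 2)),
    (date(2018, 5, 2), date(2018, 5, 4)),
    (date(2018, 5, 5), None),
    (date(2018, 5, 2), date(2018, 5, 3)),
]


def weekday_name(d):
    """Map a date to its weekday name, or "NA" when the date is missing."""
    return "NA" if d is None else DAY_NAMES[d.weekday()]


created_counts = Counter(weekday_name(created) for created, _ in reports)
closed_counts = Counter(weekday_name(closed) for _, closed in reports)
```

Counting the missing closing dates under their own "NA" label keeps open cases visible in the closed-report chart instead of silently dropping them.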

The month-by-month bar charts of report counts show how reports vary over the course of the year.

Some months of the year are likely to be "busy" months. To find which months are busiest overall, we examine frequency counts of rat sighting reports by the month of the year in which they were created and closed. Reports that were not closed are not included.

In the bar chart of rat sighting reports created, July is clearly the busiest month. In general, the summer months of June through September are the busiest, while the winter months of December, January, and February have much lower counts. Additionally, there is a noticeable drop between October and November.

The bar chart of closed reports shows very similar patterns: the summer months have the most reports closed and the winter months the fewest. There is also a drop-off from October to November.

Among the five boroughs of NYC, Brooklyn has by far the most reports of rat sightings, followed by Manhattan, the Bronx, Queens, and lastly Staten Island. Brooklyn appears to have about seven times as many rat sightings as Staten Island.

Rat Sightings over Time

The rat sighting data is collected over time, so natural questions are whether rat sightings are increasing and whether there is a seasonal pattern. These same questions can also be asked of each borough in NYC. Time series plots of rat sightings are an appropriate way to examine them.

Rat sightings appear to be generally increasing over time. There also appears to be a cyclical component: in general, counts decrease from August to January and increase from January to July. The highest point occurs in July of 2017 and is a large increase over the previous July.

The cyclical pattern of increasing towards summer months and decreasing towards winter appears to hold for all five boroughs. Except for Staten Island, the boroughs seem to have rat sightings increasing over time. Generally consistent with the bar plot of counts by borough, Brooklyn usually has the most rat sightings, followed by Manhattan, the Bronx, Queens, and Staten Island. The large peak observed in July of 2017 seems to be mostly attributable to a large peak in Brooklyn, while the other four boroughs are pretty similar to their July 2016 counts.

Home value vs. Rat Sightings

New York City is known for its high real estate values. Is there a relationship between average home prices and rat sightings? To examine this question, rat sightings can be tied to the average home value data from Zillow by using date and zip code together as the join key. An inner join was performed to combine the average home values data with the rat sightings data.
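
An inner join on the (date, zip code) pair keeps only the combinations present in both data sets. The Python sketch below illustrates the idea with made-up numbers; the names `rat_counts`, `home_values`, and `joined` are hypothetical and the report's own join was done in R.

```python
# Hypothetical monthly rat sighting counts keyed by (date, zip code).
rat_counts = {
    ("2017-07", "11216"): 42,
    ("2017-07", "10001"): 17,
    ("2017-07", "10451"): 9,
}
# Hypothetical average home values keyed the same way.
home_values = {
    ("2017-07", "11216"): 1100000,
    ("2017-07", "10001"): 1600000,
    ("2017-08", "10001"): 1610000,
}

# Inner join: keep only the keys that appear in both tables.
joined = [
    {
        "date": key[0],
        "zipcode": key[1],
        "count": rat_counts[key],
        "avg_value": home_values[key],
    }
    for key in sorted(rat_counts.keys() & home_values.keys())
]
```

Rows without a match on either side are dropped, so a zip code with sightings but no home value for that month does not reach the regression.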

Scatter plots of home value versus rat sightings conditioned on borough show the highest counts of rat sightings occurring in Brooklyn with Manhattan a close second. Staten Island and the Bronx have minimal rat sightings.

Manhattan shows two distinct groups of average home values. It is interesting to note that even very high-end real estate still suffers from the presence of rats. Despite its relatively high average home values in comparison to Queens, the Bronx, and Staten Island, Brooklyn has the most serious rat problem. Notably, the problem is not exclusive to low-value homes, as the highest counts of rat sightings occur at average home values in excess of 1 million dollars.

To formally examine the relationship between home value and rat sightings, an OLS regression was performed, regressing the log of average home value on counts, boroughs, and the interaction of counts and boroughs.
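
Written out, a model of this form looks like the following. The symbols are assumptions introduced here for illustration: B is the set of boroughs other than the reference borough, and the indicator equals 1 when observation i falls in borough b.

```latex
\log(\text{value}_i) = \beta_0 + \beta_1\,\text{count}_i
  + \sum_{b \in B} \gamma_b\,\mathbb{1}[\text{borough}_i = b]
  + \sum_{b \in B} \delta_b\,\text{count}_i\,\mathbb{1}[\text{borough}_i = b]
  + \varepsilon_i
```

Under this notation the slope on count is \beta_1 in the reference borough and \beta_1 + \delta_b in borough b.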

##                            Estimate Std. Error   t value Pr(>|t|)
## (Intercept)                12.95584    0.04571 283.43427  0.00000
## count                      -0.03250    0.00447  -7.27080  0.00000
## boroughBROOKLYN             0.49691    0.05056   9.82804  0.00000
## boroughMANHATTAN            1.38311    0.05177  26.71621  0.00000
## boroughQUEENS               0.18321    0.05003   3.66157  0.00026
## boroughSTATEN ISLAND        0.18265    0.05826   3.13502  0.00174
## count:boroughBROOKLYN       0.03762    0.00459   8.18979  0.00000
## count:boroughMANHATTAN      0.01732    0.00485   3.57259  0.00036
## count:boroughQUEENS         0.03608    0.00543   6.64915  0.00000
## count:boroughSTATEN ISLAND  0.02933    0.00675   4.34465  0.00001

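Model assumptions were reasonably met after log transformation of home value, and all coefficients are significant. Most importantly, the count coefficient is negative and significant. Because the model includes count-by-borough interactions, this coefficient is the slope for the reference borough (the Bronx, the level omitted from the table); the slope for each other borough is the count coefficient plus that borough's interaction term, and the positive interaction terms indicate that the association is weaker, or offset, in the other boroughs. The negative coefficient suggests that more rat sightings are associated with lower home values, which is concordant with what was seen in the scatter plot matrix.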
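
To make the interaction terms concrete, the sketch below adds each interaction estimate to the count coefficient to get a per-borough slope, then converts it to an approximate percent change in home value per additional sighting. It is a Python illustration using the estimates from the table above; it assumes the Bronx is the omitted reference level, and the names `slopes` and `pct_change` are hypothetical. It says nothing about whether the summed slopes are themselves significant.

```python
import math

# Estimates copied from the regression table above.
count_coef = -0.03250
interaction = {
    "BROOKLYN": 0.03762,
    "MANHATTAN": 0.01732,
    "QUEENS": 0.03608,
    "STATEN ISLAND": 0.02933,
}

# Assumption: the Bronx is the reference level, so its slope is the count coefficient.
slopes = {"BRONX": count_coef}
slopes.update({b: count_coef + d for b, d in interaction.items()})

# With a log response, a slope s corresponds to a 100 * (exp(s) - 1) percent
# change in home value for each additional sighting.
pct_change = {b: 100 * (math.exp(s) - 1) for b, s in slopes.items()}
```

With these estimates the slope is negative for the Bronx, Manhattan, and Staten Island and slightly positive for Brooklyn and Queens, which is why the main count coefficient alone should not be read as a city-wide effect.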

Rat Sightings by location

Counts from all months in the data set were aggregated by borough. The data set has over 100K observations.

In a dot plot over a map of the greater NYC area, the points cover all areas completely, making it difficult to perform any type of analysis. Alpha blending makes it easier to discern the most rat-infested areas. However, a density plot with alpha blending gives a better indication of the problematic areas.
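
The idea behind a density view can be illustrated by binning coordinates into a grid and counting points per cell. This Python sketch is a simple grid histogram, not the smoothed density estimate used for the report's map; the coordinates are made up, and `CELL`, `cell_of`, and `density` are hypothetical names.

```python
import math
from collections import Counter

# Made-up (latitude, longitude) points, for illustration only.
points = [
    (40.6712, -73.9442),
    (40.6718, -73.9450),
    (40.6725, -73.9431),
    (40.8100, -73.9500),
    (40.5800, -74.1500),
]

CELL = 0.01  # grid cell size in degrees (an arbitrary choice for this sketch)


def cell_of(lat, lon, size=CELL):
    """Return the integer grid cell that contains the point."""
    return (math.floor(lat / size), math.floor(lon / size))


# Count the points falling in each cell; the fullest cell is the hot spot.
density = Counter(cell_of(lat, lon) for lat, lon in points)
hot_cell, hot_count = density.most_common(1)[0]
```

Overlapping dots hide how many reports share a location, while per-cell counts make the concentration explicit.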

The areas with the highest number of reported rat sightings (yellow) are centered on Brooklyn and, to a lesser extent, Upper Manhattan. This is in agreement with our scatter plots, time series plots, and bar plots, which indicate that Brooklyn has the highest concentration of rat sightings, with Manhattan coming in second. By mapping the density of rat sightings, it can be seen that within Brooklyn there is a central hot spot for rat sightings near the Crown Heights neighborhood.

Conclusion

Examining rat sightings in New York City provided many insights. First, it appeared that rat sightings tend to be higher in the summer months and lower in the winter months, which was evidenced by both a bar plot and a time series plot. Next, it was shown that Brooklyn has the most rat sightings and consistently has had the most each month since August 2015. The density of rat sightings was shown on a map to be highest in the center of Brooklyn. Lastly, it was shown that more rat sightings tend to be associated with areas with lower-valued homes.

NYC OpenData provides free and open access to over 1900 data sets. The rat sighting data utilized in this report is just the tip of the iceberg for insights that can be drawn from the many available data sets.