October 1, 2017 - 7 minutes
Two examples using the Open Data Portal's API for shape files and spreadsheets
API stands for “Application Programming Interface”, which is just fancy speak for the framework that allows data access via code. APIs are designed to make working with data easier for coders, and most major companies offer one. Twitter, Google, and Dropbox all have their own APIs, and the Charlottesville Open Data Portal (ODP) does too. APIs can vary widely in the interactions they perform, but since the ODP is designed as a data source, its API only has download functionality.
There are a couple of good reasons to use an API:
- No more local file paths
- Anyone with API access can pick up the same data
- Don’t have to download the latest versions by hand
- Consistent interface with web links (REST API)
To access the API, all we need to do is copy the link found in the API menu in the right sidebar.
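As a rough sketch of what happens once the link is copied: a JSON reader that accepts either a URL or a file path can be pointed straight at it. The example below assumes `library(jsonlite)` as the reader (a stand-in, not necessarily the package used in this post) and writes a tiny made-up GeoJSON file to a temporary path so that it runs offline. The copied API link would take the place of the hypothetical `path`.

```r
library(jsonlite)

# Tiny made-up GeoJSON standing in for what an API link returns
toy_geojson <- '{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"RecordID": 1, "Offense": "Vandalism"},
     "geometry": null},
    {"type": "Feature",
     "properties": {"RecordID": 2, "Offense": "Burglary"},
     "geometry": null}
  ]
}'

# Hypothetical local copy; in practice this would be the copied API link
path <- tempfile(fileext = ".geojson")
writeLines(toy_geojson, path)

# fromJSON() accepts a JSON string, a URL, or a file path
toy_json <- fromJSON(path)

names(toy_json)                     # top-level elements of the response
toy_df <- toy_json[["features"]]    # one row per record
nrow(toy_df)
```

Because a link always points at the current data, rerunning a script like this picks up new records without a manual download.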
There are some other nice perks in the preview, including an integrated console and the modern dark theme! You can download the preview here.
The data that we are after is in the second element, `.$features`, but we can see from our `View()` look-in that it is formatted slightly weird. Let’s look at the top rows:
    crime_df <- crime_json[["features"]]
    head(crime_df)

    ##      type properties.RecordID       properties.Offense
    ## 1 Feature                2001                 Burglary
    ## 2 Feature                2002 Found/Recovered Property
    ## 3 Feature                2003            Towed Vehicle
    ## 4 Feature                2004                Vandalism
    ## 5 Feature                2005       Lost/FoundProperty
    ## 6 Feature                2006                Vandalism
    ##   properties.IncidentID properties.BlockNumber properties.StreetName
    ## 1          201300006692                    600            LOCUST AVE
    ## 2          201300006693                   1600         BRANDYWINE DR
    ## 3          201300006694                   1900    JEFFERSON PARK AVE
    ## 4          201300006695                   1400         CHESAPEAKE ST
    ## 5          201300006696                    900           PRESTON AVE
    ## 6          201300006697                    200             5TH ST SW
    ##   properties.Agency properties.DateReported properties.HourReported
    ## 1               CPD  2013/11/11 09:16:00+00                    0916
    ## 2               CPD  2013/11/11 08:39:00+00                    0839
    ## 3               CPD  2013/11/11 09:35:00+00                    0935
    ## 4               CPD  2013/11/11 10:45:00+00                    1045
    ## 5               CPD  2013/11/11 12:11:00+00                    1211
    ## 6               CPD  2013/11/11 11:25:00+00                    1125
    ##   geometry
    ## 1       NA
    ## 2       NA
    ## 3       NA
    ## 4       NA
    ## 5       NA
    ## 6       NA
If you are really paying attention you might realize that the dimensions of `crime_df` are 32000x3, and the second column, `properties`, is a bunch of 8x1 dataframes. `head()` silently “unnested” `properties` and printed it as if it were a real 32000x11 dataframe. The last column, `geometry`, is all `NA`s for this data set because it is just a table of text information. In the next section we will see this change when accessing the API for shape files.
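To make the mismatch between the real dimensions and the printed columns concrete, here is a small base R sketch. It assumes the parser stores `properties` as a data frame nested inside a single column (which is how `library(jsonlite)` handles nested JSON objects). All values are made up and the object names are hypothetical.

```r
# Made-up stand-in for the nested "properties" part of each record
properties <- data.frame(
  RecordID = c(2001L, 2002L),
  Offense  = c("Burglary", "Vandalism"),
  stringsAsFactors = FALSE
)

# Toy version of the "features" table: three real columns
features <- data.frame(type = c("Feature", "Feature"),
                       stringsAsFactors = FALSE)
features$properties <- properties  # a whole data frame stored as one column
features$geometry   <- NA          # no spatial information in this toy table

dim(features)    # rows and columns of the outer table
names(features)  # only the three outer column names
print(features)  # printing expands the nested columns as properties.*

# Pulling the nested column out gives an ordinary flat data frame
flat <- features[["properties"]]
names(flat)
```

The printed table looks wider than `dim()` reports because printing expands the nested column, while the underlying object keeps it as one piece.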
All that’s left to do is to extract just the `properties` column and we will have a nice dataframe ready to start analyzing!
    # keep only the nested properties table
    crime_df <- crime_df[["properties"]]
With only a couple of lines of code, we have accessed all of the Crime Data released by the Charlottesville Police Department to date on the ODP. This is exactly the same data we would have if we downloaded the ‘csv’ file by hand and read it in using `readr::read_csv()`, but now we did it all with reproducible code!
If you want to keep going with this data and start cleaning it up for your analysis, check out the beginning of my towing forecast.
Most of the datasets on the ODP are actually shape files and not information tables. The only difference between working with shape files and working with tables is a few small tweaks to the arguments we pass to `geojson_read()`. Let’s access the files that represent the city’s police beats, or sub-regions. The data we grab here is available on the Police Neighborhood Area page, using the same API drop-down menu we used before.
    library(geojsonio)

    police_areas_sp <- geojson_read(
      "https://opendata.arcgis.com/datasets/ceaf5bd3321d4ae8a0c1a2b21991e6f8_9.geojson",
      what = "sp" # skips the default "list" object and goes straight to spatial
    )
We can see that the object we are returned is a little more complex than earlier. It is of the class ‘SpatialPolygonsDataFrame’, which comes from `library(sp)`, a package that defines classes and methods for spatial data in R. Let’s see what that class looks like.

The `data` and `polygons` elements are the key components of a `SpatialPolygonsDataFrame`: `data` is a list of information tables and `polygons` is a list of `sp::Polygons` that contain the latitude and longitude coordinates necessary for mapping the region outlines. Indexing a `SpatialPolygonsDataFrame` uses a special `@` syntax to get to specific elements by name, so we would use `police_areas_sp@data` to look at just that section.
Notice that both elements are the same length: Charlottesville is broken down into 33 police beats, so there is table information to accompany each map region. This means that a single `SpatialPolygonsDataFrame` can be used to group the mapping information with accessory information, like population, crime rate, or real estate assessments.
    ##   OBJECTID AGENCY BEAT_NO                 NAME POPULATION DISTRICT
    ## 0        1     CP      10              Belmont       4327        2
    ## 1        2     CP      20         Woolen Mills        903        3
    ## 2        3     CP      09          Little High        654        3
    ## 3        4     CP      08     Martha Jefferson       1167        3
    ## 4        5     CP      07       North Downtown       1634        3
    ## 5        6     CP      28 Charlottesville High        998        5
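The pairing of map regions with table rows can be tried out on a toy object. The sketch below builds a two-region `SpatialPolygonsDataFrame` by hand with `library(sp)`. The squares, names, and populations are made up for illustration, and `square()` is a hypothetical helper rather than anything from the portal.

```r
library(sp)

# Hypothetical helper: a closed unit square, listed clockwise so that sp
# treats it as a region outline rather than a hole
square <- function(x0, y0) {
  cbind(c(x0, x0,     x0 + 1, x0 + 1, x0),
        c(y0, y0 + 1, y0 + 1, y0,     y0))
}

# Two made-up "beats", each wrapped in an sp::Polygons object with an ID
beats <- SpatialPolygons(list(
  Polygons(list(Polygon(square(0, 0))), ID = "a"),
  Polygons(list(Polygon(square(1, 0))), ID = "b")
))

# One table row per region; row names must match the polygon IDs
info <- data.frame(NAME = c("Toy North", "Toy South"),
                   POPULATION = c(100, 250),
                   row.names = c("a", "b"))

toy_sp <- SpatialPolygonsDataFrame(beats, data = info)

toy_sp@data              # the table slot
length(toy_sp@polygons)  # number of map regions
nrow(toy_sp@data)        # number of table rows, one per region
```

Keeping the table and the outlines in one object means any column added to `toy_sp@data` stays lined up with its region.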
We could use a variety of map tools to plot this type of `SpatialPolygonsDataFrame`, including `library(mapview)`. Here we use `library(leaflet)` to plot the police beats. Leaflet is available in R and is pretty great; even ESRI (the ODP’s commercial partner) built a package with it.
    library(leaflet)     # v1.1.0
    library(widgetframe) # provides frameWidget()

    # color scale for shading each beat by its population
    pal <- colorNumeric("viridis", NULL)

    leaflet(police_areas_sp, options = leafletOptions(minZoom = 13, maxZoom = 15)) %>%
      addProviderTiles("OpenStreetMap.Mapnik") %>% # should look familiar
      addPolygons(color = "#a5a597", smoothFactor = 0.3, fillOpacity = .5,
                  fillColor = ~pal(POPULATION),
                  label = ~paste0(NAME, ": ", formatC(POPULATION, big.mark = ","))) %>%
      frameWidget(height = '400')
I’m not going to dive into the syntax of the leaflet code here, but the Leaflet for R website has great write-ups for a variety of mapping options.
I hope you feel good about working with the ODP’s API now. It is fairly straightforward to do with `library(geojsonio)`. A big thank you to rOpenSci for supporting this package and a bunch of others! If you have any questions or issues working with other datasets in the portal, feel free to send me an email. I’m always happy to help with `R` questions and I’m really excited about using the ODP to make life in Charlottesville even better!