October 1, 2017 - 7 minutes
Two examples using OpenData portal's API for shape files and spreadsheetsCville Open Data github pages https custom domains
API stands for “Application Program Interface”, which is just fancy speak for the framwork allowing data access via code. APIs are designed to make working with data easier for coders, and most major companies offer one. Twitter, Google, and Dropbox all have their own API, and the Charlottesville Open Data Portal (ODP) does too. APIs can vary widely in the interactions they perfrom, but since the ODP is designed as a data source, its API only has download functionality.
There are a couple of good reasons to use an API:
- No more local file paths
- Anyone with API access can pick up the same data
- Don’t have to download lastet versions by hand
- Consistant interface with web links (RestAPI)
To acccess the API, all we need to do is copy the link found in the API menu in the right side bar.
And then let our our new best friend
library(geojsonio) do the heavy lifting. The
library(geojsonio) is designed to handle geographic data in the GeoJSON and TopoJSON formats. It contains the function
geojson_read() for accessing data via url or local file path. We will use it on the link stored in our clipboard.
library(geojsonio) #v_0.4.2 crime_json <- geojson_read("https://opendata.arcgis.com/datasets/d1877e350fad45d192d233d2b2600156_6.geojson", parse = TRUE) # ~1-5 minutes # parse = TRUE attempts to parse to data.frame like structures; default is FALSE
Our read-in returned a large list (4.3 Mb) with two elements,
$features. If you are using RStudio the newest development version (v1.1.379) offers enhanced views for lists and other objects:
View(crime_json) # we could also click on the object in the 'Environment' tab
There are some other nice perks in the preview, including an intergrated console and the modern dark theme! You can download the preview here.
The data that we are after is in the second element
.$features, but we can see from our
View() look-in that it is formatted slightly weird. Let’s look at the top rows:
crime_df <- crime_json[["features"]] head(crime_df)
## type properties.RecordID properties.Offense ## 1 Feature 1 Assist Citizen - Mental/TDO/ECO ## 2 Feature 2 Assault Intimidation ## 3 Feature 3 Suspicious Activity ## 4 Feature 4 Warrant Service ## 5 Feature 5 Assault Simple ## 6 Feature 6 Assault Simple ## properties.IncidentID properties.BlockNumber properties.StreetName ## 1 202000015586 1400 BURGESS LN ## 2 202000015564 800 FOREST ST ## 3 202000015538 700 BELMONT COTTAGE LN ## 4 202000015490 1900 EMMET ST N ## 5 202000015487 800 E MARKET ST ## 6 202000015486 2000 HOLIDAY DR ## properties.Agency properties.DateReported properties.HourReported ## 1 CPD 2020/06/07 20:44:54+00 2044 ## 2 CPD 2020/06/07 14:04:50+00 1404 ## 3 CPD 2020/06/07 08:06:23+00 0806 ## 4 CPD 2020/06/06 22:41:44+00 2241 ## 5 CPD 2020/06/06 21:15:36+00 2115 ## 6 CPD 2020/06/06 20:59:43+00 2059 ## geometry.type geometry.geometries ## 1 GeometryCollection NULL ## 2 GeometryCollection NULL ## 3 GeometryCollection NULL ## 4 GeometryCollection NULL ## 5 GeometryCollection NULL ## 6 GeometryCollection NULL
If you are really paying attention you might realize that the dimensions of
crime_df are 32000x3, and the second column
properties is a bunch of 8X1 dataframes.
head() silently “unnested”
properties and printed it as if it were a real 32000x11 dataframe. The last column
geometry is all
NAs for this data set because it is just a table of text information. In the next section we will see this change when accessing the API for shape files.
All that’s left to do is to extract just the
properties column and we will have a nice dataframe ready to start analyzing!
crime_df <- crime_df[]
With only a couple of lines of code, we have accessed all of the Crime Data released by the Charlottesville Police Department to date on the ODP. This is exactly the same data we would have if we downloaded the ‘csv’ file by hand and read it in using
readr::read_csv(), but now we did it all with reproducible code!
If you want to keep going with this data and start cleaning it up for your analysis check out the begining of my towing forcast.
Most of the datasets on the ODP are actually shape files and not information tables. The only difference between working with shape files compared to tables are some small tweaks to the arguments we pass in to
geojson_read(). Let’s access the files that represent the city’s police beats, or sub-regions. The data we grab here is available on the Police Neighborhood Area page, using the same API drop down menu as we used before.
police_areas_sp <- geojson_read("https://opendata.arcgis.com/datasets/ceaf5bd3321d4ae8a0c1a2b21991e6f8_9.geojson", what = "sp") # skips the default "list" object and goes straight to spacial
We can see that the object we are returned is a little more complex than earlier. It is of the class ‘SpatialPolygonsDataFrame’ which is from the
library(sp) that defines classes and methods for spatial data in R. Let’s see what that class looks like with
polygons elements are the key components of a
data is a list of information tables and
polygons is list of
sp::Polygons that contain the latitude and longitude cordinates necesary for mapping the region outlines. Indexing a
SpatialPolygonDataFrame uses a special
@ syntax to get to specific elements by name, so we would use
police_areas_sp@data to look at just that section.
Notice both elements are the same length, Charlottesville is broken down into 33 police beats, so there is table information to accompany each map region. This means that a single
SpatialPolygonDataFrame can be used to group the mapping information with accesory information, like population, crime rate, or real estate assesments.
## OBJECTID BEAT_NO NAME POPULATION DISTRICT ## 1 1 10 Belmont 4327 2 ## 2 2 20 Woolen Mills 903 3 ## 3 3 09 Little High 654 3 ## 4 4 08 Martha Jefferson 1167 3 ## 5 5 07 North Downtown 1634 3 ## 6 6 28 Charlottesville High 998 5
We could use a variety of map tools to plot a this type of SpatialPolygonDataFrame, including
library(mapview). Here we use
library(leaflet) to plot the police beats and use the
R and is pretty great, even ESRI (the ODP’s commercial partner) build a package with it.
library(leaflet) #v1.1.0 pal <- colorNumeric("viridis", NULL) leaflet(police_areas_sp, options = leafletOptions(minZoom = 13, maxZoom = 15)) %>% addProviderTiles("OpenStreetMap.Mapnik") %>% # should look familiar addPolygons(color = "#a5a597", smoothFactor = 0.3, fillOpacity = .5, fillColor = ~pal(POPULATION), label = ~paste0(NAME, ": ", formatC(POPULATION, big.mark = ","))) %>% frameWidget(height = '400')
I’m not going to dive into the syntax of the leaflet code here, but the Leaflet for R website has great write ups for a variety of mapping options.
I hope you feel good about working with the ODP’s API now. It is fairly straight forward to do with
library(geojsonio), a big thank you to the rOpenSci for supporting this package and a bunch of others! If you have any questions or issues working with other dataset in the portal, feel free to send me an email. I’m always happy to help with
R questions and I’m really excited about using the ODP to make life in Charlottesville even better!