Extending R's GTFS abilities with simple features

June 2, 2018 - 9 minutes
Two functions to easily transition from library(gtfsr) into library(sf) for tidyier transit analysis.
Civic Data sf gtfsr tidyverse

TLDR

Two similar functions for building sf objects of the routes and stops within a gtfs objects.

General Transit Feed Specification

Originally named Google Transit Feed Specification, GTFS was developed in 2005 by Chris Harrelson (a Googler at the time), to make trip planning on civic transit systems easier.

GTFS format is a standardized set of comma separated txt files, with specific names and fields. To be flexible (i.e. cover a lot of public transit options), not all files/fields are required. Here is a comprehensive list breakdown.

library(gtfsr)

The rOpenSci project’s library(gtfsr) is designed to make reading and visualizing GTFS data fast and easy. You can install the package from GitHub, via devtools::install_github("ropensci/gtfsr").

gtfsr::import_gtfs()

This function does a great job handling the flexible GTFS structure. Here I’m reading in the local Charlottesville Area Transit’s (CAT) GTFS feed.

The zipped data is available in this Civic Innovation Day repo.

library(gtfsr)
## Error in library(gtfsr): there is no package called 'gtfsr'
cat <- gtfsr::import_gtfs("https://github.com/Smart-Cville/CID-2018-Regional-Transit-Challenge/blob/b6c714ec190f8843d6aa154fc74ed7be3bd5307f/data/2017_08_CharlottesvilleAreaTransit.zip?raw=true")
## Error in loadNamespace(name): there is no package called 'gtfsr'
# lots of verbose warnings that I turned off for kniting porpoises

The function returns to us a special S3 class list object, gtfs, that has a data frame, for each of the available files.

class(cat)
## [1] "function"
names(cat)
## NULL

gtfsr::map_gtfs()

A high-level mapping function built with library(leaflet).

In this specific case, attempting to plot all routes results in an error about missing the agency_id field. So my work around here is to explicitly specify all routes, which is easy via %>% dot.

library(magrittr)

cat %>% map_gtfs() # errors out
## Error in map_gtfs(.): could not find function "map_gtfs"
# work around
cat %>% map_gtfs(., route_ids = .$routes_df$route_id,
                 route_colors = paste0("#", .$routes_df$route_color)) %>%
    # needed for web sharing only
    widgetframe::frameWidget() # isolates leaflet:css, to prevent messing with webpage:css
## Error in map_gtfs(., route_ids = .$routes_df$route_id, route_colors = paste0("#", : could not find function "map_gtfs"

You can compare the library(gtfsr) results against the current CAT app and static map.

Extending to sf

The gtfsr package does a great job reading and showing data, but if you want to join in additional data sources or customize plots beyond colors or routes, you are out of luck.

Simple Features is making big waves in the r-spatial community since library(sf)(https://github.com/r-spatial/sf) introduced a new class of geometry enriched data frames.

Extracting routes into routes_sf

Because GTFS data is distributed across multiple files, we must pick apart the list of data frames to pull out the critical pieces.

In order to reconstruct a plot similar to map_gtfs(), we need:

  1. key-value pairs from cat$trips_df
  2. route meta information from cat$routes_df
  3. shapes as Lon/lat coordinates from cat$shapes_df
  • Caution: the following function is long *
library(sf)
library(tidyverse)

gtfs_routes_sf <- function(gtfs) {
    
    ## gather key-values first ----
    
    # trips_df has route_id:shape_id
    shape_key <- gtfs$trips_df %>%
        select(route_id, shape_id) %>%
        unique()

    # routes_df has route_id:route_name
    route_key <- gtfs$routes_df %>%
        select(route_id, route_short_name) %>%
        mutate(route_short_name = paste("route", route_short_name)) %>%
        inner_join(shape_key)
    
    # check for colors :)
    if ( !is.null(gtfs$routes_df$route_color) ) { # extract if they exist
        route_key %<>% inner_join(select(gtfs$routes_df, route_color, route_id) )
    }
    else { # planB: build a pal from my favorite pallette 'd3'
        route_key %<>% mutate(route_color = rep(ggsci::pal_d3()(10),
                                                length.out = nrow(route_key)))
    }
    
    ## build the sf object ----
    
    # exctract lon/lat values as matrix to build linestrings for each "shape_id"
    sfc <- gtfs$shapes_df %>% # long data frame
        split(.$shape_id) %>% # list of shorted data framee, one per shape
        map(~ select(., shape_pt_lon, shape_pt_lat) %>% # order maters, lon-1st lat-2nd
            as.matrix %>% # coherce for st_linestrings happiness
            st_linestring) %>%
        st_sfc(crs = 4326) # bundle all shapes into a collection

    # add collection on and convert to sf
    unique(gtfs$shapes_df$shape_id) %>%
        sort() %>% # sort to match with names(sfc); split()'s factor-cohercion alpha sorts
        st_sf(shape_id = ., geometry = sfc) %>%
        inner_join(route_key)
    # plots fine ^^
    
    # st_sf(route_key, geometry = sfc)
    # doesn't plot ^^ and I can't explain it
}

OK that a lot of code, but does it actually work?

cat_routes_sf <- gtfs_routes_sf(cat)
## Error in gtfs$trips_df: object of type 'closure' is not subsettable
head(cat_routes_sf)
## Error in head(cat_routes_sf): object 'cat_routes_sf' not found

Yea it does.

Now all we need to do is viz it and since library(leaflet) already works out of the box with sf objects, it’s cake.

library(leaflet)

leaflet(cat_routes_sf) %>%
    addTiles() %>%
    addPolylines(color = ~paste0("#", route_color),
                 label = ~as.character(route_short_name)) %>%
    addLegend(colors = ~unique(paste0("#", route_color)),
              labels = ~unique(route_short_name)) %>%
    widgetframe::frameWidget()
## Error in structure(list(options = options), leafletData = data): object 'cat_routes_sf' not found

Rinse and repeat for stops_sf()

The steps are slightly different but the pattern is the same (and it’s still long).

gtfs_stops_sf <- function(gtfs) {
    shape_key <- gtfs$trips_df %>%
        select(trip_id, route_id, shape_id) %>%
        unique()
    
    # stop_times_df also has stop sequence and arrive/depart time for specific stops
    stop_key <- gtfs$stop_times_df %>%
        select(trip_id, stop_id) %>%
        unique() %>%
        inner_join(shape_key) %>% # one stop is on multiple routes
        # need to pair down
        arrange(route_id) %>% # use route_id as tiebreaker (for now)
        group_by(stop_id) %>% # group_by() to stop_id 
        slice(1) # to slice() out the first row
    
    if ( !is.null(gtfs$routes_df$route_color) ) {
        stop_key %<>% inner_join(select(gtfs$routes_df, route_color, route_id)) }
    else {stop_key %<>% mutate(route_color = rep(ggsci::pal_d3()(10), length.out = nrow(route_key))) }
    
    stops_sfc <- gtfs$stops_df %>%
        split(.$stop_id) %>%
        map(~select(., stop_lon, stop_lat) %>%
                unlist() %>%
                st_point() ) %>% # point instead of linestring
        st_sfc()
    
    st_sf(stop_key, geometry = stops_sfc) %>%
        inner_join(gtfs$stops_df)
}

Now, make use and make stops_sf object.

cat_stops_sf <- gtfs_stops_sf(cat)
## Error in gtfs$trips_df: object of type 'closure' is not subsettable
head(cat_stops_sf)
## Error in head(cat_stops_sf): object 'cat_stops_sf' not found

Now we just have to combine our two layers, routes and stops.

To make a map that looks a lot like to the result from map_gtfs(), I’m gonna tweak a few of the defaults to addCircleMarkers().

leaflet(cat_routes_sf) %>%
    addTiles() %>%
    addPolylines(color = ~paste0("#", route_color),
                 label = ~htmltools::htmlEscape(route_short_name)) %>%
    addCircleMarkers(data = cat_stops_sf, fillColor = ~paste0("#", route_color),
                     label = ~htmltools::htmlEscape(stop_name),
                     color = "black", radius = 5, weight = 3) %>%
    addLegend(colors = ~unique(paste0("#", route_color)),
              labels = ~unique(route_short_name)) %>%
    widgetframe::frameWidget()
## Error in structure(list(options = options), leafletData = data): object 'cat_routes_sf' not found

👌

Wrap up

Now we can build sf objects from gtfs objects at-will!

That gives us new plotting flexibility and makes incorporating new data more straight forward.

Looks like exploring transit patterns in R just got a little easier! Now if only all transit data was already cleaned into GTFS …

devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 3.6.2 (2019-12-12)
##  os       macOS Catalina 10.15.5      
##  system   x86_64, darwin15.6.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       America/New_York            
##  date     2020-06-10                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  ! package     * version date       lib source        
##  P assertthat    0.2.1   2019-03-21 [?] CRAN (R 3.6.0)
##  P backports     1.1.5   2019-10-02 [?] CRAN (R 3.6.0)
##  P blogdown      0.17    2019-11-13 [?] CRAN (R 3.6.0)
##  P bookdown      0.15    2019-11-12 [?] CRAN (R 3.6.0)
##  P broom         0.5.2   2019-04-07 [?] CRAN (R 3.6.0)
##  P callr         3.3.2   2019-09-22 [?] CRAN (R 3.6.0)
##  P cellranger    1.1.0   2016-07-27 [?] CRAN (R 3.6.0)
##  P class         7.3-15  2019-01-01 [?] CRAN (R 3.6.1)
##  P classInt      0.4-2   2019-10-17 [?] CRAN (R 3.6.0)
##    cli           2.0.2   2020-02-28 [1] CRAN (R 3.6.0)
##  P colorspace    1.4-1   2019-03-18 [?] CRAN (R 3.6.0)
##  P crayon        1.3.4   2017-09-16 [?] CRAN (R 3.6.0)
##  P crosstalk     1.0.0   2016-12-21 [?] CRAN (R 3.6.0)
##  P DBI           1.0.0   2018-05-02 [?] CRAN (R 3.6.0)
##  P desc          1.2.0   2018-05-01 [?] CRAN (R 3.6.0)
##  P devtools      2.2.1   2019-09-24 [?] CRAN (R 3.6.0)
##    digest        0.6.25  2020-02-23 [1] CRAN (R 3.6.0)
##    dplyr       * 1.0.0   2020-05-29 [1] CRAN (R 3.6.2)
##  P e1071         1.7-2   2019-06-05 [?] CRAN (R 3.6.0)
##    ellipsis      0.3.1   2020-05-15 [1] CRAN (R 3.6.2)
##  P evaluate      0.14    2019-05-28 [?] CRAN (R 3.6.0)
##    fansi         0.4.1   2020-01-08 [1] CRAN (R 3.6.0)
##  P fastmap       1.0.1   2019-10-08 [?] CRAN (R 3.6.0)
##  P forcats     * 0.4.0   2019-02-17 [?] CRAN (R 3.6.0)
##  P fs            1.3.1   2019-05-06 [?] CRAN (R 3.6.0)
##  P generics      0.0.2   2018-11-29 [?] CRAN (R 3.6.0)
##  P ggplot2     * 3.3.1   2020-05-28 [?] CRAN (R 3.6.2)
##    glue          1.4.1   2020-05-13 [1] CRAN (R 3.6.2)
##  P gtable        0.3.0   2019-03-25 [?] CRAN (R 3.6.0)
##  P haven         2.2.0   2019-11-08 [?] CRAN (R 3.6.0)
##  P hms           0.5.2   2019-10-30 [?] CRAN (R 3.6.0)
##  P htmltools     0.4.0   2019-10-04 [?] CRAN (R 3.6.0)
##  P htmlwidgets   1.5.1   2019-10-08 [?] CRAN (R 3.6.0)
##    httpuv        1.5.4   2020-06-06 [1] CRAN (R 3.6.2)
##  P httr          1.4.1   2019-08-05 [?] CRAN (R 3.6.0)
##    jsonlite      1.6.1   2020-02-02 [1] CRAN (R 3.6.0)
##  P KernSmooth    2.23-16 2019-10-15 [?] CRAN (R 3.6.0)
##  P knitr         1.26    2019-11-12 [?] CRAN (R 3.6.0)
##    later         1.1.0.1 2020-06-05 [1] CRAN (R 3.6.2)
##  P lattice       0.20-38 2018-11-04 [?] CRAN (R 3.6.1)
##  P leaflet     * 2.0.3   2019-11-16 [?] CRAN (R 3.6.0)
##    lifecycle     0.2.0   2020-03-06 [1] CRAN (R 3.6.0)
##  P lubridate     1.7.4   2018-04-11 [?] CRAN (R 3.6.0)
##  P magrittr    * 1.5     2014-11-22 [?] CRAN (R 3.6.0)
##  P memoise       1.1.0   2017-04-21 [?] CRAN (R 3.6.0)
##    mime          0.9     2020-02-04 [1] CRAN (R 3.6.0)
##  P modelr        0.1.5   2019-08-08 [?] CRAN (R 3.6.0)
##  P munsell       0.5.0   2018-06-12 [?] CRAN (R 3.6.0)
##  P nlme          3.1-142 2019-11-07 [?] CRAN (R 3.6.0)
##    pillar        1.4.4   2020-05-05 [1] CRAN (R 3.6.2)
##  P pkgbuild      1.0.6   2019-10-09 [?] CRAN (R 3.6.0)
##  P pkgconfig     2.0.3   2019-09-22 [?] CRAN (R 3.6.0)
##  P pkgload       1.0.2   2018-10-29 [?] CRAN (R 3.6.0)
##  P prettyunits   1.0.2   2015-07-13 [?] CRAN (R 3.6.0)
##  P processx      3.4.1   2019-07-18 [?] CRAN (R 3.6.0)
##  P promises      1.1.0   2019-10-04 [?] CRAN (R 3.6.0)
##  P ps            1.3.0   2018-12-21 [?] CRAN (R 3.6.0)
##    purrr       * 0.3.4   2020-04-17 [1] CRAN (R 3.6.2)
##  P R6            2.4.1   2019-11-12 [?] CRAN (R 3.6.0)
##  P Rcpp          1.0.3   2019-11-08 [?] CRAN (R 3.6.0)
##  P readr       * 1.3.1   2018-12-21 [?] CRAN (R 3.6.0)
##  P readxl        1.3.1   2019-03-13 [?] CRAN (R 3.6.0)
##  P remotes       2.1.0   2019-06-24 [?] CRAN (R 3.6.0)
##    renv          0.8.3   2019-11-11 [1] CRAN (R 3.6.1)
##  P rlang         0.4.6   2020-05-02 [?] CRAN (R 3.6.2)
##  P rmarkdown     1.17    2019-11-13 [?] CRAN (R 3.6.0)
##  P rprojroot     1.3-2   2018-01-03 [?] CRAN (R 3.6.0)
##  P rstudioapi    0.10    2019-03-19 [?] CRAN (R 3.6.0)
##  P rvest         0.3.5   2019-11-08 [?] CRAN (R 3.6.0)
##    scales        1.1.1   2020-05-11 [1] CRAN (R 3.6.2)
##  P sessioninfo   1.1.1   2018-11-05 [?] CRAN (R 3.6.0)
##  P sf          * 0.9-3   2020-05-04 [?] CRAN (R 3.6.2)
##    shiny         1.4.0.2 2020-03-13 [1] CRAN (R 3.6.0)
##  P stringi       1.4.3   2019-03-12 [?] CRAN (R 3.6.0)
##  P stringr     * 1.4.0   2019-02-10 [?] CRAN (R 3.6.0)
##  P testthat      2.3.0   2019-11-05 [?] CRAN (R 3.6.0)
##    tibble      * 3.0.1   2020-04-20 [1] CRAN (R 3.6.2)
##  P tidyr       * 1.0.0   2019-09-11 [?] CRAN (R 3.6.0)
##    tidyselect    1.1.0   2020-05-11 [1] CRAN (R 3.6.2)
##  P tidyverse   * 1.2.1   2017-11-14 [?] CRAN (R 3.6.0)
##  P units         0.6-5   2019-10-08 [?] CRAN (R 3.6.0)
##  P usethis       1.5.1   2019-07-04 [?] CRAN (R 3.6.0)
##  P vctrs         0.3.1   2020-06-05 [?] CRAN (R 3.6.2)
##  P withr         2.1.2   2018-03-15 [?] CRAN (R 3.6.0)
##  P xfun          0.11    2019-11-12 [?] CRAN (R 3.6.0)
##  P xml2          1.2.2   2019-08-09 [?] CRAN (R 3.6.0)
##  P xtable        1.8-4   2019-04-21 [?] CRAN (R 3.6.0)
##    yaml          2.2.1   2020-02-01 [1] CRAN (R 3.6.0)
## 
## [1] /Users/nathanday/ROS/new_site/renv/library/R-3.6/x86_64-apple-darwin15.6.0
## [2] /private/var/folders/7_/cvjz84n54vx7zv_pw3gmdqr00000gn/T/Rtmp7Jgqh8/renv-system-library
## 
##  P ── Loaded and on-disk path mismatch.

An inputMap for your Shiny app

May 5, 2019 - 3 minutes
data viz leaflet shiny sf

Is the weather getting wetter?

February 20, 2019 - 3 minutes
Exploring historical data from 1905 to 2015 from the World Bank
weather lm sf tidyverse

Geocoded crime reports for Charlottesville Virginia

November 27, 2018 - 5 minutes
Civic Data packages sf tidyverse