Rocker-ing on macOS

March 18, 2018 - 5 minutes
Cville Open Data docker rstudio_server bash

Rocker rocks!

TLDR

If you don’t know about Rocker yet, you are in for a treat. It is a slick collection of Docker images built specifically for the R community. They run RStudio Server out of the box and allow novice Docker users, like myself, to quickly spin up and share reproducible analyses.

The main motivation for this post is the Astrea Open Data Challenge, which requires a container as part of the result submission. Initially the challenge only had one option, a Python-focused Anaconda image, so I reached out to the organizers and convinced them to add a comparable option for R users: the rocker/rstudio image.

Since I asked for the alternative R environment, I wanted to write down step-by-step instructions for getting Rocker set up on macOS. This walkthrough assumes you have done the first two parts of the tutorial Getting Started with Docker and have a working knowledge of git (nothing fancy).

Recommended: A GitHub repo for files

This part is separate from, but parallel to, the Docker container setup. Docker is meant to contain everything about the computing environment, but not really scripts and data, so it’s good practice to keep those in a separate folder. That could be purely local, but in the name of version control and open source, we are using GitHub to track and publish our code.

If you want to do this part, go to GitHub.com and make a new repository. I called mine “Hauncher”. Then grab the Clone with HTTPS link and run the following line in a new Terminal, substituting your own link:

git clone https://github.com/nathancday/hauncher.git ~/Desktop/hauncher

This will create a cloned copy of the new repository as a new directory “hauncher” sitting on your Desktop. This is where we will be saving all of the files we use inside of the Docker container.

Get the image/container

Images are the recipes that containers are built from. Containers are the actual running processes, created from an image, that do the work.

I highly recommend using the rocker/ropensci image over the base rocker/rstudio image because it has a bunch of the required system-level libraries and common R packages pre-installed, which will make your life easier.

docker pull rocker/ropensci

This takes a few minutes the first time you pull, so go get a cup of coffee or something.

If you really want to start from scratch and use the basic rocker/rstudio image, you can install external dependencies in your container with the instructions here, but fair warning: it’s a PIA.

Run that image as a container

Copy this line into your terminal, replacing the local path with the path to your own repo:

docker run -d -p 8787:8787 -v /Users/nathanday/Desktop/hauncher/:/home/rstudio/hauncher rocker/ropensci
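The flags in that command are easy to mistype, so here is a minimal sketch that wraps them in two shell functions. The names mount_arg and start_rocker are hypothetical helpers, not part of Docker or Rocker, and the sketch assumes you want the local folder to appear under /home/rstudio with the same name.

```shell
# Hypothetical helper: build the -v argument so the local project folder
# shows up under /home/rstudio with the same name inside the container.
mount_arg() {
  dir="${1%/}"    # drop a trailing slash, if any
  printf '%s:/home/rstudio/%s\n' "$dir" "$(basename "$dir")"
}

# Hypothetical wrapper around the run command from this post.
#   -d  run detached, in the background
#   -p  publish container port 8787 on host port 8787
#   -v  mount local/dir:container/dir
# The second argument is the image; it defaults to rocker/ropensci.
start_rocker() {
  docker run -d -p 8787:8787 -v "$(mount_arg "$1")" "${2:-rocker/ropensci}"
}

# Usage (needs a running Docker daemon):
#   start_rocker "$HOME/Desktop/hauncher"
```

Using "$HOME" instead of a hard-coded /Users/... path means the same line works for anyone who cloned the repo to their Desktop.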

Now head to localhost:8787 in your favorite browser to see your new container in action.

You should be greeted with a login screen. Enter “rstudio” for both the username and password, and voilà, you are in RStudio. You should also see two folders, kitematic and hauncher, in your file explorer pane.

Install packages / mod your container

Let’s make a new R script to install some extra packages we want and save it into our directory hauncher. My script is called install.R and is just this:

install.packages("pacman")

library(pacman)

p_load(caret, forecast, forcats)

I really like using pacman to do installs, because it’s a smooth wrapper around install.packages() and devtools::install_github(). So we download and load pacman the conventional way, then use one of its helper functions, pacman::p_load(), to install and load three more essential packages in one line.
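Because the repo folder is mounted into the container, the script can also be created from the host terminal instead of the RStudio editor. Here is a minimal sketch of that. PROJECT_DIR is an assumed variable you would point at your cloned repo, and the requireNamespace() guard is an addition to the script above so that re-running it does not reinstall pacman.

```shell
# PROJECT_DIR is an assumption: point it at your cloned repo,
# e.g. PROJECT_DIR="$HOME/Desktop/hauncher". Defaults to the current directory.
PROJECT_DIR="${PROJECT_DIR:-.}"

# Write install.R; the quoted 'EOF' keeps the shell from expanding anything.
cat > "$PROJECT_DIR/install.R" <<'EOF'
# Install pacman only if it is missing, then install/load the rest.
if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")
pacman::p_load(caret, forecast, forcats)
EOF
```

The file then shows up in the hauncher folder inside RStudio, where you can open it and run it as usual.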

Save your container

Before we stop our container, we want to save it so that all of our newly installed packages are there the next time we run it. To do this, go back to the terminal and check which containers are running:

docker ps

Find the container ID you want to save and copy the whole thing, or just remember the first few characters (two is usually enough for me, as long as they are unique among your containers). Then run the line below, substituting your full or partial ID for “54”.

Note: The container ID will change each time you run the same image, because it identifies that particular running container, much like a process ID, and is not a permanent attribute of the image.

docker commit -m "first commit; install.R" 54 nathancday/hauncher:latest

Using latest as the tag name is nice because it’s the default, meaning you won’t need to type it out for pull or push commands.
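If you would rather not copy the ID by hand, here is a minimal sketch that looks it up and commits in one go. The function names container_id_for and save_container are hypothetical. The sketch relies on docker ps -q (print IDs only) and the ancestor filter, and it assumes only one container from that image is running, since it simply takes the first match.

```shell
# Hypothetical helper: print the ID of the first running container
# whose image is (or descends from) the one given.
container_id_for() {
  docker ps -q --filter "ancestor=$1" | head -n 1
}

# Hypothetical helper: commit that container as a new image.
#   $1 source image, $2 target repo[:tag], $3 commit message
save_container() {
  id="$(container_id_for "$1")"
  if [ -z "$id" ]; then
    echo "no running container found for $1" >&2
    return 1
  fi
  docker commit -m "$3" "$id" "$2"
}

# Usage (needs a running Docker daemon):
#   save_container rocker/ropensci nathancday/hauncher "first commit; install.R"
```

Leaving the tag off the target, as in the usage line, gives you the latest tag by default.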

Now run:

docker images

This lists your available local images, and your new image should be among them.

Publish to Docker Hub

The last Docker thing to do is push your updated image to Docker Hub, so other people can access it. You must be logged in to Docker Hub for this to work:

docker push nathancday/hauncher

This is considerably faster than the original pull was for us, because we are only uploading our modifications on top of the base image, not the whole thing.

Now anyone can start with the exact same environment as us, using the same pull/run commands. Isn’t that great?

Update GitHub

The final step for a totally synced, containerized, reproducible analysis is to update our remote GitHub repo with the file changes we made locally. Here it’s just the new install.R file.

Back to the terminal we go:

git status

Should show our new file as “untracked”.
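If you have not seen an untracked file before, this small sketch reproduces the situation in a throwaway repo. The temp directory and the empty install.R are stand-ins for illustration, not our real project.

```shell
# Throwaway repo in a temp directory, so nothing in your real project changes.
demo="$(mktemp -d)"
git init -q "$demo"
touch "$demo/install.R"

# --short prints one line per file; "??" marks an untracked file.
git -C "$demo" status --short
# prints: ?? install.R
```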

All we need to do is add it, commit it, and push it.

git add install.R
git commit -m "install.R file"
git push

And that’s it!

Wrap up

I hope this helps get you going with Docker for reproducible projects. If you run into any problems, let me know and I will do my best to help debug.

You are welcome to follow along with our project’s progress on our GitHub repo and Docker image. We will be keeping our code open from start to finish, so updates are coming as we build our models in the next few weeks. Have fun rocker-ing!!!

TLDR

The code for you to rock right away:

docker run -d -p 8787:8787 -v /Users/you/Desktop/Dir:/home/rstudio/Dir rocker/ropensci
# run as daemon
# served at 'localhost:8787', use your browser
# mount local/dir:container/dir

If you’ve changed a running container and want to save it (and push to your Hub):

# lists running containers, get CONTAINER_ID of interest
docker ps 

# commit kinda like git
docker commit CONTAINER_ID user_name/repo_name # default tag is 'latest'

# must be logged in to docker hub to push it
docker push user_name/repo_name

If you want to reproduce our code so far:

# clone our GitHub repo
git clone https://github.com/nathancday/hauncher.git /Local/Path/Desktop/hauncher

# run our Docker image
docker run -d -p 8787:8787 -v /Local/Path/Desktop/hauncher:/home/rstudio/hauncher nathancday/hauncher