Rocker-ing on macOS

March 18, 2018 - 5 minutes
Open Data docker rstudio_server bash

<img src="https://www.docker.com/sites/default/files/Whale%20Logo332_5.png" alt="Docker whales are friends!" style="display:block; margin-left:auto; margin-right:auto;"/>

Intro

TLDR: skip to the end of this post for the quick version.

If you don’t know about Rocker yet, you should. It is a slick collection of Docker images built specifically for the R community. They run RStudio Server out of the box and allow anyone to quickly spin up and share reproducible analysis environments.

The main motivation for this post is the Astrea Open Data Challenge, which requires a container as part of the result submission. Initially the challenge only had one option, a Python-focused Anaconda image, so I reached out to the organizers and convinced them to add a comparable option for R users, the rocker/rstudio image.

Since I asked for the alternative R environment, I wanted to write down the step-by-step instructions for getting Rocker set up on macOS. This walkthrough assumes you have done the first two parts of the Getting Started with Docker tutorial and have a working knowledge of git (nothing fancy).

Recommended: A GitHub repo for files

This part is separate from, but parallel to, the Docker container setup. Docker is meant to contain everything about the computing environment, but not really scripts and data. So it’s good practice to keep these in a separate folder system. That could be local only, but in the name of version control and open source, we are using GitHub to track and publish our code.

So if you want to do this part, go to GitHub.com and make a new repository; I called mine “Hauncher”. Then get the Clone with HTTPS link and run the following line on your computer in a new Terminal:

git clone https://github.com/nathancday/hauncher.git ~/Desktop/hauncher

This will create a cloned copy of the new repository as a new directory “hauncher” sitting on your Desktop. This is where we will be saving all of the files we use inside of the Docker container.

Get the image/container

Images are the recipes that build out containers. Containers are the actual running processes that execute your code.

I highly recommend using the rocker/ropensci image over the base RStudio image because it has a bunch of the required system-level libraries and common R packages pre-installed, which makes life a lot easier.

docker pull rocker/ropensci

This takes a minute the first time you pull, so go get a cup of coffee or something.

If you really want to start from scratch and use the base image rocker/rstudio, you can install external dependencies in your container with the instructions here, but full warning, it’s a PIA.

Run that image as a container

Replace the -v /local/path:/container/path section with your own paths and run the following in your terminal:

docker run -d -p 8787:8787 -v /Users/nathanday/Desktop/hauncher/:/home/rstudio/hauncher rocker/ropensci

Docker requires an absolute host path here, so don’t abbreviate your home directory with ~/; it’s gotta be /Users/your_name/.
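If you’d rather not type the path by hand, here is a minimal sketch that builds the mount string for you. The function name mount_arg is made up for this post (it is not a Docker feature), and it assumes you want the folder to show up under /home/rstudio with the same name it has on your Mac.

```shell
# mount_arg is a hypothetical helper, not part of Docker.
# It turns an absolute host directory into a "host:container" string for -v.
mount_arg() {
  host_dir="$1"   # e.g. /Users/your_name/Desktop/hauncher
  case "$host_dir" in
    /*) ;;        # starts with a slash, so it is absolute
    *)  echo "need an absolute path, got: $host_dir" >&2
        return 1 ;;
  esac
  # Drop any trailing slash and reuse the folder name inside the container.
  echo "${host_dir%/}:/home/rstudio/$(basename "$host_dir")"
}
```

On a Mac where your home directory is /Users/your_name, calling mount_arg "$HOME/Desktop/hauncher" prints /Users/your_name/Desktop/hauncher:/home/rstudio/hauncher, which goes right after -v in the run command.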

Now head to localhost:8787 in your favorite browser to see your new container in action.

You should be greeted with a login screen; enter “rstudio” for both the username and password. And voilà, you’re in RStudio. You should see two folders, kitematic and hauncher, in your file explorer pane.

Install packages / mod your container

Let’s make a new R script to install any extra packages we want and save it into our directory hauncher. My script is called install.R and is just this:

install.packages("pacman")
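You can run the script from the RStudio console, or, as a rough sketch, from your Mac’s terminal with docker exec. The helper name run_install is hypothetical, and the script path assumes the /home/rstudio/hauncher mount from the run command above.

```shell
# run_install is a hypothetical helper; pass it the container ID from `docker ps`.
run_install() {
  container_id="$1"
  # Rscript ships with R, so it should be available inside the Rocker images.
  # The path assumes the repo is mounted at /home/rstudio/hauncher.
  docker exec "$container_id" Rscript /home/rstudio/hauncher/install.R
}
```

With a running container whose ID starts with 54, that would be run_install 54.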

Save your container

Before you close the running container, go back to the terminal and list which containers are running with:

docker ps

Find the container ID you want to save and copy the whole thing, or remember the first few characters (two is usually good enough for me), and run this line, substituting your full or partial ID for “54”:

docker commit -m "first commit; install.R" 54 nathancday/hauncher:latest

Note: The container ID will change each time you run an image, because it’s really more like a process ID than a permanent attribute of the image.
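Since the ID changes on every run, it can be handy to look it up by image name instead of reading the docker ps table. A small sketch, where running_ids is a hypothetical helper name and the image argument is whatever you passed to docker run:

```shell
# running_ids is a hypothetical helper name.
running_ids() {
  image="$1"   # the image you started the container from
  # -q prints only the IDs; the ancestor filter keeps containers
  # that were started from the given image.
  docker ps -q --filter "ancestor=$image"
}
```

For example, running_ids rocker/ropensci prints one ID per running container started from that image.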

Using latest as the tag name is nice because you won’t need to type out the :$TAG suffix for pull or push commands. But use other tag names for more descriptive saves.
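For example, you could keep latest moving and also stamp a named copy with docker tag. This is just one possible scheme; the helper name tag_snapshot and the date-based default are illustrative choices, not anything Docker requires.

```shell
# tag_snapshot is a hypothetical helper; the dated default tag is only one option.
tag_snapshot() {
  repo="$1"                       # e.g. nathancday/hauncher
  tag="${2:-$(date +%Y-%m-%d)}"   # second argument, or today's date if omitted
  # Point a second, more descriptive tag at the same image as :latest.
  docker tag "$repo:latest" "$repo:$tag"
}
```

Keep in mind that any tag other than latest has to be spelled out when you push or pull, as in docker push nathancday/hauncher:with-pacman.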

Now run:

docker images

This lists the available local images, and you should see the new image you just saved.

Publish to Docker Hub

The last Docker thing to do is push your updated image to Docker Hub, so other people can access it:

docker push nathancday/hauncher

This is considerably faster than the original pull for us, because we are only uploading the modifications.
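The push only works if you are logged in to Docker Hub. A minimal sketch that does both steps in order, with publish_image as a hypothetical helper name:

```shell
# publish_image is a hypothetical helper name.
publish_image() {
  repo="$1"   # e.g. nathancday/hauncher
  # docker login prompts for your Docker Hub credentials;
  # the push only runs if the login succeeds.
  docker login && docker push "$repo"
}
```

You would call it as publish_image user_name/repo_name, using your own Docker Hub user name.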

Now anyone can start with the exact same environment as us, using the same pull/run commands. Isn’t that great?

Update GitHub

The final step for a totally synced, containerized, reproducible analysis is to update our remote GitHub repo with the file changes we made locally; here it’s just the new install.R file.

Back to the terminal we go:

git status

This should show our new file as “untracked”.

All we need to do is add it, commit it, and push it.

git add install.R
git commit -m "install.R file"
git push

And that’s it!

Wrap up

I hope this helps get you going with Docker for reproducible projects. If you run into any problems, let me know and I will do my best to help debug.

You are welcome to follow along with our project’s progress on our GitHub repo and Docker image. We will be keeping our code open from start to finish, so updates are coming as we build our models in the next few weeks. Have fun rocker-ing!!!

TLDR

The code for you to rock right away:

docker run -d -p 8787:8787 -v /Users/you/Desktop/Dir:/home/rstudio/Dir rocker/ropensci
# run as daemon
# served at 'localhost:8787', use your browser
# mount local/dir:container/dir

If you’ve changed a running container and want to save it (and push to your Hub):

# lists running containers, get CONTAINER_ID of interest
docker ps 

# commit kinda like git
docker commit CONTAINER_ID user_name/repo_name # default tag is 'latest'

# must be logged in to docker hub to push it
docker push user_name/repo_name

If you want to reproduce our code so far:

# clone our GitHub repo
git clone https://github.com/nathancday/hauncher.git /Local/Path/Desktop/hauncher

# run our Docker image
docker run -d -p 8787:8787 -v /Local/Path/Desktop/hauncher:/home/rstudio/hauncher nathancday/hauncher