If you don’t know about Rocker yet, you are in for a treat. It is a slick collection of Docker images built specifically for the R community. They run RStudio Server out of the box and allow novice Docker users, like myself, to quickly spin up and share reproducible analyses.
The main motivation for this post is the Astrea Open Data Challenge, which requires a container as part of the result submission. Initially the challenge only had one option, a Python-focused Anaconda image, so I reached out to the organizers and convinced them to add a comparable option for R users: the rocker/rstudio image.
Since I asked for the alternative R environment, I wanted to write down step-by-step instructions for getting Rocker set up on macOS. This walkthrough assumes you have done the first two parts of the tutorial Getting Started with Docker and have a working knowledge of git (nothing fancy).
Recommended: A GitHub repo for files
This part is separate from, but parallel to, the Docker container setup. Docker is meant to contain everything about the computing environment, but not really scripts and data, so it’s good practice to keep those in a separate folder system. That could be purely local, but in the name of version control and open source, we are using GitHub to track and publish our code.
If you want to do this part, go to GitHub.com and make a new repository; I called mine “Hauncher”. Then copy the Clone with HTTPS link and run the following line on your computer in a new Terminal window:
git clone https://github.com/nathancday/hauncher.git ~/Desktop/hauncher
This will create a cloned copy of the new repository as a new directory “hauncher” sitting on your Desktop. This is where we will be saving all of the files we use inside of the Docker container.
Get the image/container
Images are the recipes that build out containers. Containers are the actual system processes that run stuff.
I highly recommend using the rocker/ropensci image over the base rocker/rstudio image because it has a bunch of the required system-level libraries and common R packages pre-installed, which will make your life easier.
docker pull rocker/ropensci
This takes a minute the first time you pull, so go get a cup of coffee or something.
If you really want to start from scratch and use the basic rocker/rstudio image, you can install external dependencies in your container with the instructions here, but full warning: it’s a pain.
Run that image as a container
Copy this line into your terminal, replacing the local path with the path to your own repo:
docker run -d -p 8787:8787 -v /Users/nathanday/Desktop/hauncher/:/home/rstudio/hauncher rocker/ropensci
Now head to localhost:8787 in your favorite browser to see your new container in action.
You should be greeted with a login screen; enter “rstudio” for both the username and password. And voilà, you are in RStudio. You should see the mounted hauncher folder in your file explorer pane.
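To make the run command above easier to adapt, here is a minimal sketch that splits it into named pieces. The variable names (LOCAL_DIR, CONTAINER_DIR, IMAGE) and the helper build_run_cmd are made up for illustration; the flags are the same ones used above.

```shell
# Hypothetical names; swap in your own path and image.
LOCAL_DIR="$HOME/Desktop/hauncher"       # folder cloned from GitHub
CONTAINER_DIR="/home/rstudio/hauncher"   # where that folder shows up inside RStudio
IMAGE="rocker/ropensci"                  # image pulled earlier

# Print the full command so it can be checked before running it.
build_run_cmd() {
  # -d  run detached (in the background)
  # -p  host_port:container_port (RStudio Server is served on 8787)
  # -v  local_dir:container_dir (mount, so files live outside the container)
  echo "docker run -d -p 8787:8787 -v $LOCAL_DIR:$CONTAINER_DIR $IMAGE"
}

build_run_cmd
```

Once the printed line looks right, paste it into the terminal to start the container. Note that this simple version does not handle paths containing spaces.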
Install packages / mod your container
Let’s make a new R script to install some extra packages we want and save it into our directory
hauncher. My script is called
install.R and is just this:
install.packages("pacman")
library(pacman)
p_load(caret, forecast, forcats)
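I really like using pacman to do installs, because its p_load() is a smooth wrapper around install.packages() and library(): it installs anything that is missing and then loads it. (Its sibling p_load_gh() does the same for GitHub packages, in the spirit of devtools::install_github().) So we download and load pacman the conventional way, then use one of its helper functions, pacman::p_load(), to install and load three more essential packages in one line.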
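If you would rather run the script from the terminal than from the RStudio console, one option is docker exec, which runs a command inside an already running container. This is only a sketch under a few assumptions: run_install is a made-up helper name, the container was started with the hauncher folder mounted at /home/rstudio/hauncher, and Rscript is on the container’s PATH (which I believe is the case for the Rocker images).

```shell
# run_install is a hypothetical helper, not part of Docker or Rocker.
# $1: full or partial container ID, as listed by `docker ps`
run_install() {
  # Assumes install.R sits in the mounted hauncher folder.
  docker exec "$1" Rscript /home/rstudio/hauncher/install.R
}

# Example (replace 54 with your own container ID):
# run_install 54
```

Either route ends in the same place: the packages are installed inside the running container, not in the image it came from.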
Save your container
Before we close our container we want to make sure we save it, so all of our newly installed packages are there next time we run it. To do this, go back to the terminal and check which containers are running:
docker ps
Find the container ID you want to save and copy the whole thing, or remember the first few characters (two is usually good enough for me), then run this line, substituting your full or partial ID for “54”.
Note: The container ID will change each time you run the same image, because it’s really a process ID, not a permanent attribute.
docker commit -m "first commit; install.R" 54 nathancday/hauncher:latest
Using latest as the tag name is nice because it’s the default, meaning you won’t need to type it out in later commands.
Run docker images to check your available local images, and you will see your new image saved.
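If you would rather not copy the container ID by hand, the lookup and the commit can be combined. A minimal sketch, assuming only one container started from the given image is running; save_container is a hypothetical helper name and user_name/repo_name is a placeholder for your own Docker Hub repo.

```shell
# save_container IMAGE TARGET_REPO
# Hypothetical helper: commits the first running container started from IMAGE.
save_container() {
  image="$1"
  target="$2"
  # -q prints only container IDs; the ancestor filter keeps containers
  # that were created from the given image.
  cid=$(docker ps -q --filter "ancestor=$image" | head -n 1)
  if [ -z "$cid" ]; then
    echo "No running container from $image found" >&2
    return 1
  fi
  # No tag is given, so Docker tags the new image as 'latest'.
  docker commit -m "first commit; install.R" "$cid" "$target"
}

# Example:
# save_container rocker/ropensci user_name/repo_name
```

Typing the first couple of characters of the ID is just as good for a one-off; the helper mostly pays off if you save containers often.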
Publish to Docker Hub
The last Docker thing to do is push your updated image to Docker Hub, so other people can access it:
docker push nathancday/hauncher
This is considerably faster than the original
pull for us, because we are only uploading the modifications.
Now anyone can start with the exact same environment as us, using the same run commands. Isn’t that great?
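One detail the push step relies on: you must be logged in to Docker Hub, and the image has to be named under an account you can push to. A sketch of the round trip, where publish_image is a hypothetical helper and user_name/repo_name is a placeholder:

```shell
# publish_image REPO
# Hypothetical helper: log in, then push REPO (the tag defaults to 'latest').
publish_image() {
  docker login || return 1   # prompts for Docker Hub credentials if needed
  docker push "$1"
}

# Example:
# publish_image user_name/repo_name
#
# Anyone else can then start from the same environment:
# docker run -d -p 8787:8787 user_name/repo_name
```

If the image is not available locally, docker run pulls it from Docker Hub first, so the person reproducing your work does not need a separate pull step.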
The final step for a totally synced, containerized, reproducible analysis is to update our remote GitHub repo with the file changes we made locally; here it’s just the new install.R.
Back to the terminal we go:
git status
This should show our new file as “untracked”.
All we need to do is add it, commit it, and push it.
git add install.R
git commit -m "install.R file"
git push
And that’s it!
I hope this helps you get going with Docker for reproducible projects. If you run into any problems, let me know and I will do my best to help debug.
You are welcome to follow along with our project’s progress on our GitHub repo and Docker image. We will be keeping our code open from start to finish, so updates are coming as we build our models in the next few weeks. Have fun rocker-ing!!!
The code for you to rock right away:
docker run -d -p 8787:8787 -v /Users/you/Desktop/Dir:/home/rstudio/Dir rocker/ropensci
# run as daemon
# served at 'localhost:8787', use your browser
# mount local/dir:container/dir
If you’ve changed a running container and want to save it (and push to your Hub):
# lists running containers, get CONTAINER_ID of interest
docker ps
# commit, kinda like git
docker commit CONTAINER_ID user_name/repo_name # default tag is 'latest'
# must be logged in to Docker Hub to push it
docker push user_name/repo_name
If you want to reproduce our code so far:
# clone our GitHub repo
git clone https://github.com/nathancday/hauncher.git /Local/Path/Desktop/hauncher
# run our Docker image
docker run -d -p 8787:8787 -v /Local/Path/Desktop/hauncher:/home/rstudio/hauncher nathancday/hauncher