<img src="https://www.docker.com/sites/default/files/Whale%20Logo332_5.png" alt="Docker whales are friends!" style="display:block; margin-left:auto; margin-right:auto;"/>
If you don’t know about Rocker yet, you should. It is a slick collection of Docker images built specifically for the R community. They run RStudio Server out of the box and let anyone quickly spin up and share reproducible analysis environments.
The main motivation for this post is the Astrea Open Data Challenge, which requires a container as part of the result submission. Initially the challenge had only one option, a Python-focused Anaconda image, so I reached out to the organizers and convinced them to add a comparable option for R users: the rocker/rstudio image.
Since I asked for the alternative R environment, I wanted to write up step-by-step instructions for getting Rocker set up on macOS. This walkthrough assumes you have done the first two parts of the Getting Started with Docker tutorial and have a working knowledge of git (nothing fancy).
Recommended: A GitHub repo for files
This part is separate from, but parallel to, the Docker container setup. Docker is meant to contain everything about the computing environment, but not really scripts and data, so it’s good practice to keep those in a separate folder system. That could be purely local, but in the name of version control and open source, we are using GitHub to track and publish our code.
So if you want to do this part, go to GitHub.com and make a new repository (I called mine “Hauncher”). Then copy the Clone with HTTPS link and run the following line in a new Terminal window:
git clone https://github.com/nathancday/hauncher.git ~/Desktop/hauncher
This will create a cloned copy of the new repository as a new directory “hauncher” sitting on your Desktop. This is where we will be saving all of the files we use inside of the Docker container.
Get the image/container
Images are the recipes that build out containers. Containers are the actual system processes that run the code stuff.
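One way to see the recipe/process split for yourself is to ask Docker which containers were built from a given image. Here's a minimal sketch wrapped in a hypothetical helper, containers_from (the name is made up for this post); it relies on docker ps -a, which lists stopped as well as running containers, plus its ancestor filter.

```shell
# Hypothetical helper: containers_from IMAGE
# Lists every container (running or stopped) created from IMAGE,
# showing that one image "recipe" can back many container processes.
containers_from() {
  if [ $# -ne 1 ]; then
    echo "usage: containers_from IMAGE" >&2
    return 2
  fi
  docker ps -a --filter "ancestor=$1"
}
```

Run containers_from rocker/ropensci after a couple of docker run calls and you should see one row per container, all pointing back to the same image that docker images lists once.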
I highly recommend using the rocker/ropensci image over the base rocker/rstudio image, because it has a bunch of the required system-level libraries and common R packages pre-installed, which makes life a lot easier.
docker pull rocker/ropensci
This takes a minute the first time you pull, so go get a cup of coffee or something.
If you really want to start from scratch and use the base rocker/rstudio image, you can install external dependencies in your container with the instructions here, but fair warning: it’s a PIA.
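If you do go the base-image route, one common pattern is to install system libraries with apt-get inside the running container via docker exec. This is a sketch, not the linked instructions: it assumes a Debian/Ubuntu-based image and that docker exec runs as root (Docker's default when an image doesn't set a USER). The helper name add_sys_deps and the libxml2-dev example are illustrative. Anything installed this way only sticks around if you commit the container, as in the Save your container section.

```shell
# Hypothetical helper: add_sys_deps CONTAINER PACKAGE...
# Installs Debian/Ubuntu system packages inside a running container.
# Assumes apt-get exists in the image and docker exec runs as root.
add_sys_deps() {
  if [ $# -lt 2 ]; then
    echo "usage: add_sys_deps CONTAINER PACKAGE..." >&2
    return 2
  fi
  local container="$1"
  shift
  # Refresh the package index first, then install without prompting
  docker exec "$container" apt-get update &&
    docker exec "$container" apt-get install -y --no-install-recommends "$@"
}
```

For example, add_sys_deps 54 libxml2-dev, where 54 is a (partial) container ID from docker ps.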
Run that image as a container
Copy the line below, replace the -v /local/path:/container/path section with your own paths, and run it in your terminal:
docker run -d -p 8787:8787 -v /Users/nathanday/Desktop/hauncher/:/home/rstudio/hauncher rocker/ropensci
Docker wants an absolute path for the local side of the mount, so spell out your home directory in full rather than abbreviating it with ~/; it’s gotta be something like /Users/nathanday/ above.
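If you’d rather not type the full path by hand, the shell can build it for you. Here’s a minimal sketch of a hypothetical helper, start_rocker (not part of Docker or Rocker), that resolves a directory to an absolute path with cd and pwd -P and mounts it under /home/rstudio. It assumes the rocker/ropensci image and port 8787 used above.

```shell
# Hypothetical helper: start_rocker DIR [IMAGE]
# Resolves DIR to an absolute path, mounts it at /home/rstudio/<dir name>,
# and serves RStudio Server on localhost:8787 from a detached container.
start_rocker() {
  if [ $# -lt 1 ]; then
    echo "usage: start_rocker DIR [IMAGE]" >&2
    return 2
  fi
  local image="${2:-rocker/ropensci}"
  local abs_dir
  # cd + pwd -P turns ~/..., ./... or relative paths into an absolute one
  abs_dir="$(cd "$1" > /dev/null && pwd -P)" || return 1
  docker run -d -p 8787:8787 \
    -v "$abs_dir:/home/rstudio/$(basename "$abs_dir")" \
    "$image"
}
```

With this loaded, start_rocker ~/Desktop/hauncher does the same thing as the docker run line above, and a typo in the directory fails at the cd instead of silently mounting an empty folder.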
Now head to localhost:8787 in your favorite browser to see your new container in action.
You should be greeted with a login screen; enter “rstudio” for both the username and password. And voilà, you’re in RStudio! You should see two folders, including hauncher, in your file explorer pane.
Install packages / mod your container
Let’s make a new R script to install any extra packages we want and save it into our hauncher directory. My script is called install.R and is just this:
Save your container
Before you stop the running container, go back to the terminal and list the running containers with:
docker ps
Find the ID of the container you want to save, then either copy the whole thing or remember the first few characters (two is usually enough for me, as long as no other container ID starts the same way), and run this line, substituting your full or partial ID for “54”.
docker commit -m "first commit; install.R" 54 nathancday/hauncher:latest
Note: The container ID will change each time you run an image, because every docker run creates a brand-new container with its own ID; it isn’t a permanent attribute of the image.
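Since the ID changes every run, you can also let Docker look it up: docker ps -l shows the most recently created container (running or not) and -q prints just its ID. A hedged sketch, assuming the container you want to save is the last one you started; the helper name commit_latest is made up for this post.

```shell
# Hypothetical helper: commit_latest MESSAGE REPO[:TAG]
# Commits the most recently created container (docker ps -lq) to an image.
# Assumes that container is the one you just worked in.
commit_latest() {
  if [ $# -ne 2 ]; then
    echo "usage: commit_latest MESSAGE REPO[:TAG]" >&2
    return 2
  fi
  local id
  id="$(docker ps -lq)"
  if [ -z "$id" ]; then
    echo "no containers found" >&2
    return 1
  fi
  docker commit -m "$1" "$id" "$2"
}
```

Usage: commit_latest "first commit; install.R" nathancday/hauncher:latest. If you have several containers going, stick with copying the ID from docker ps.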
Using latest as the tag name is nice because you won’t need to type out the :$TAG suffix for push commands. But use other tag names for more descriptive saves.
To check your available local images, run:
docker images
You should see the new image you just saved.
Publish to Docker Hub
The last Docker thing to do is push your updated image to Docker Hub, so other people can access it:
docker push nathancday/hauncher
This is considerably faster than the original pull, because we are only uploading the modifications.
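To go with the descriptive-tag advice, here’s a sketch that adds an extra tag to the image you already committed as latest and pushes both tags. It assumes you have already run docker login (you must be logged in to Docker Hub to push). The helper name tag_and_push and the date-based default tag are illustrative choices, not something Docker requires.

```shell
# Hypothetical helper: tag_and_push REPO [TAG]
# Gives REPO:latest an extra, more descriptive tag (default: today's date)
# and pushes both tags to Docker Hub. Run `docker login` first.
tag_and_push() {
  if [ $# -lt 1 ]; then
    echo "usage: tag_and_push REPO [TAG]" >&2
    return 2
  fi
  local repo="$1"
  local tag="${2:-$(date +%Y-%m-%d)}"
  docker tag "$repo:latest" "$repo:$tag" &&
    docker push "$repo:latest" &&
    docker push "$repo:$tag"
}
```

For example, tag_and_push nathancday/hauncher first-models (a made-up tag) keeps latest pointing at the newest save while leaving a named snapshot you can pull later.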
Now anyone can start with the exact same environment as us, using the same run commands. Isn’t that great?
The final step for a totally synced, containerized, reproducible analysis is to update our remote GitHub repo with the file changes we made locally; here it’s just the new install.R file.
Back to the terminal we go:
cd ~/Desktop/hauncher
git status
This should show our new file as “untracked”.
All we need to do now is add it, commit it, and push it.
git add install.R
git commit -m "install.R file"
git push
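If you’d like those three steps to stop as soon as one fails (say, the commit finds nothing to commit), you can chain them with &&. A minimal sketch, wrapped in a hypothetical helper publish_file that you run from inside the cloned repo; it assumes your branch already has an upstream on GitHub, just like the plain git push above does.

```shell
# Hypothetical helper: publish_file FILE MESSAGE
# Stages FILE, commits it with MESSAGE and pushes, stopping at the first failure.
# Run it from inside the cloned repo (e.g. ~/Desktop/hauncher).
publish_file() {
  if [ $# -ne 2 ]; then
    echo "usage: publish_file FILE MESSAGE" >&2
    return 2
  fi
  git add "$1" &&
    git commit -m "$2" &&
    git push
}
```

Usage: publish_file install.R "install.R file".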
And that’s it!
I hope this helps you get going with Docker for reproducible projects. If you run into any problems, let me know and I will do my best to help debug.
You are welcome to follow along with our project’s progress on our GitHub repo and Docker image. We will be keeping our code open from start to finish, so updates are coming as we build our models over the next few weeks. Have fun rocker-ing!!!
The code for you to rock right away:
# -d: run as a daemon
# -p 8787:8787: served at 'localhost:8787', use your browser
# -v: mount local/dir:container/dir
docker run -d -p 8787:8787 -v /Users/you/Desktop/Dir:/home/rstudio/Dir rocker/ropensci
If you’ve changed a running container and want to save it (and push to your Hub):
# list running containers, get the CONTAINER_ID of interest
docker ps
# commit, kinda like git (default tag is 'latest')
docker commit CONTAINER_ID user_name/repo_name
# must be logged in to Docker Hub to push it
docker push user_name/repo_name
If you want to reproduce our code so far:
# clone our GitHub repo
git clone https://github.com/nathancday/hauncher.git /Local/Path/Desktop/hauncher
# run our Docker image
docker run -d -p 8787:8787 -v /Local/Path/Desktop/hauncher:/home/rstudio/hauncher nathancday/hauncher