Jupyter and conda for R

tl;dr: We discuss the many benefits Jupyter, the IRKernel and conda can provide for data scientists working in R.

Jupyter, previously known as the IPython Notebook, is already widely adopted by data scientists, researchers, and analysts. Jupyter’s notebook user interface lets you mix executable code with narrative text, equations, interactive visualizations, and images. This helps teams collaborate and advances reproducible research and training. Jupyter began with Python and now has kernels for 50 different languages; the IRKernel is the native R kernel for Jupyter.

Data scientists, researchers, and analysts use the conda package manager to install and organize project dependencies. With conda they can easily build and share metapackages, which are downloadable bundles of packages. Conda works with Linux, OS X, and Windows, and is language agnostic, so we can use it with any programming language and with projects that depend on multiple languages.

Let’s use conda and Jupyter to start a data science project in R.

“R Essentials” setup

The Anaconda team has created an “R Essentials” bundle with the IRKernel and over 80 of the most used R packages for data science, including dplyr, shiny, ggplot2, tidyr, caret, and nnet.

Downloading “R Essentials” requires conda. Miniconda includes conda, Python, and a few other necessary packages. Anaconda includes all of this plus over 200 of the most popular Python packages for science, math, engineering, and data analysis. Users may install all of Anaconda at once, or they may start with Miniconda and then use conda to install any other packages they need, including any of the packages in Anaconda.

Once you have conda, you may install “R Essentials” into the current environment:

conda install -c r r-essentials

or create a new environment just for “R Essentials”:

conda create -n my-r-env -c r r-essentials

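A quick way to confirm the installation worked is to activate the environment and ask Jupyter which kernels it can see. Use conda activate my-r-env, or source activate my-r-env on older conda releases. The sketch below wraps that check in a small shell function. Note that has_r_kernel is a hypothetical helper, not part of Jupyter or conda. It assumes the IRKernel is registered under the kernel name ir, the default name the IRkernel package uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Jupyter or conda): succeeds if Jupyter
# lists a kernel named "ir", assumed to be the IRKernel's default name.
has_r_kernel() {
  # "jupyter kernelspec list" prints a header line, then one
  # "<name>  <path>" line per installed kernel; keep only the names
  # and look for an exact match on "ir".
  jupyter kernelspec list | awk '{print $1}' | grep -qx 'ir'
}

if has_r_kernel; then
  echo "R kernel found"
else
  echo "R kernel not found; is the R environment activated?" >&2
fi
```

Matching on the first column avoids false positives from kernel paths that happen to contain the letters "ir".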

Jupyter

Jupyter provides a great notebook interface to write your analysis and share it with your peers. Open a shell (with your R environment activated, if you created a separate one) and run this command to start the Jupyter notebook interface in your browser:

jupyter notebook

Start a new R notebook:

create an R notebook with jupyter

You can immediately write and run R code in the notebook cells.

An R notebook example

Now you can:

  • import the data wrangling R package, dplyr:
In [1]: library(dplyr)

  • explore one of the built-in datasets, such as iris:
In [2]: iris
 
Out[2]:
    Sepal.Length    Sepal.Width     Petal.Length    Petal.Width     Species
1            5.1            3.5              1.4            0.2      setosa
2            4.9              3              1.4            0.2      setosa
...

  • calculate the average sepal width by species:
In [3]: iris %>%
 group_by(Species) %>%
 summarise(Sepal.Width.Avg = mean(Sepal.Width)) %>%
 arrange(Sepal.Width.Avg)
 
Out[3]:
        Species     Sepal.Width.Avg
1    versicolor                2.77
2     virginica               2.974
3        setosa               3.428

  • import the visualization R library ggplot2:
In [4]: library(ggplot2)

  • plot Sepal.Width against Sepal.Length, colored by species:
In [5]: ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) + geom_point(size=3)

Sepal.Width vs. Sepal.Length

You can find this blog post in notebook format here.

Creating your custom R bundle

For our users’ convenience, we have packaged some of the most widely used R packages for data science in “R Essentials”. It is also easy to create your own custom set of R packages to share with peers using the conda metapackage command. That command is provided by the conda-build package; install it with conda install conda-build if it is missing. For example, to provide a download called custom-r-bundle with only the libraries used in our example notebook, create the metapackage:

conda metapackage custom-r-bundle 0.1.0 --dependencies r-irkernel jupyter r-ggplot2 r-dplyr --summary "My custom R bundle"

Share it with colleagues by uploading it to Anaconda.org:

conda install anaconda-client
anaconda login
anaconda upload path/to/custom-r-bundle-0.1.0-0.tar.bz2

Now, anyone can get all those packages and dependencies by running:

conda install -c <your anaconda.org username> custom-r-bundle

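Before sharing the bundle widely, it can help to install it into a clean environment and confirm that each R package loads. For example, run conda create -n bundle-check -c <your anaconda.org username> custom-r-bundle, then activate bundle-check. The sketch below is a hypothetical helper, check_r_packages, that runs Rscript once per package and reports which ones fail to load. It assumes the Rscript on your PATH comes from that environment. Note that R package names differ from conda package names: r-irkernel provides IRkernel, and r-dplyr provides dplyr.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of conda): try to load each R package
# with Rscript and report which ones are missing. Assumes the Rscript on
# PATH belongs to the environment created from custom-r-bundle.
check_r_packages() {
  local pkg status=0
  for pkg in "$@"; do
    # library() raises an error, so Rscript exits non-zero, if the package is absent
    if Rscript -e "suppressPackageStartupMessages(library(${pkg}))" >/dev/null 2>&1; then
      echo "ok      ${pkg}"
    else
      echo "MISSING ${pkg}"
      status=1
    fi
  done
  # Non-zero if at least one package failed to load
  return "${status}"
}

# R package names, not conda names: r-irkernel installs "IRkernel"
check_r_packages IRkernel dplyr ggplot2 || echo "Some packages failed to load" >&2
```

The function keeps going after a failure so that a single run lists every missing package, not just the first one.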

From notebook to slides

Jupyter can convert a notebook into an online slide deck for talks and tutorials.

To convert a notebook into a reveal.js presentation, set “Cell Toolbar” to “Slideshow” in the View menu:

Convert R notebooks to slides step 1

Organize the cells into slides and subslides:

Convert R notebooks to slides step 2

And convert:

jupyter nbconvert my_r_notebook.ipynb --to slides --post serve

This opens a browser showing the slide deck:

Convert R notebooks to slides

Summary

This blog post explores how Jupyter can give R users a convenient notebook interface to develop, narrate, and share data science projects. With conda and “R Essentials”, it is easy to get started and to package and track the dependencies needed to replicate analyses and results.


About the Author

Christine Doig

Sr. Data Scientist, Product Manager

Christine is a Senior Data Scientist at Anaconda. She has over five years’ experience in analytics, operations research and machine learning in a variety of industries, including energy, manufacturing and banking. At Anaconda, she worked …
