Anaconda integrates with many different providers and platforms to give you access to the data science libraries you love on the services you use, including Amazon Web Services, Microsoft Azure, and Cloudera CDH. Today we’re excited to announce our new partnership with Docker.

As part of the announcements at DockerCon this week, Anaconda images will be featured in the new Docker Store when it launches, including Anaconda and Miniconda images based on Python 2 and Python 3. These freely available Anaconda images for Docker are now verified, are regularly scanned for security vulnerabilities and are available from the ContinuumIO organization on Docker Hub.

The Anaconda images for Docker make it easy to get started with Anaconda on any platform, and provide a flexible starting point for developing or deploying data science workflows with more than 100 of the most popular Open Data Science packages for Python and R, including data analysis, visualization, optimization, machine learning, text processing and more.

Whether you’re a developer, data scientist, or DevOps engineer, Anaconda and Docker can provide your entire data science team with a scalable, deployable and reproducible Open Data Science platform.

Use Cases with Anaconda and Docker

Anaconda and Docker are a great combination to empower your development, testing and deployment workflows with Open Data Science tools, including Python and R. Our users often ask whether they should be using Anaconda or Docker for data science development and deployment workflows. We suggest using both – they’re better together!

Anaconda’s sandboxed environments and Docker’s containerization complement each other to give you portable Open Data Science functionality when you need it – whether you’re working on a single machine, across a data science team or on a cluster.

Here are a few different ways that Anaconda and Docker make a great combination for data science development and deployment scenarios:

1) Quick and easy deployments with Anaconda

Anaconda and Docker can be used to reproduce data science environments across different platforms. With a single command, you can spin up a Docker container with Anaconda (and optionally with a Jupyter Notebook) and have access to 720+ of the most popular packages for Open Data Science, including Python and R.

2) Reproducible build and test environments with Anaconda

At Continuum, we’re using Docker to build packages and libraries for Anaconda. The build images are available from the ContinuumIO organization on Docker Hub (e.g., conda-builder-linux and centos5_gcc5_base). We also use Docker with continuous integration services, such as Travis CI, for automated testing of projects across different platforms and configurations (e.g., Dask.distributed and hdfs3).

Within the open-source Anaconda and conda community, Docker is also used for reproducible test and build environments. Conda-forge is a community-driven infrastructure for conda recipes that uses Docker with Travis CI and CircleCI to build, test and upload conda packages that include Python, R, C++ and Fortran libraries. The Docker images used in conda-forge are available from the conda-forge organization on Docker Hub.

3) Collaborative data science workflows with Anaconda

You can use Anaconda with Docker to build, containerize and share your data science applications with your team. Collaborative data science workflows with Anaconda and Docker make the transition from development to deployment as easy as sharing a Dockerfile and conda environment.
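
As a rough illustration of what sharing a Dockerfile and conda environment might look like, the sketch below writes both files into a project directory. The directory name (demo-project), environment name (demo-env), image tag (demo-env-image) and package list are hypothetical placeholders, not part of the official images, and the build and run commands are left as comments because they require a local Docker installation.

```shell
# Hypothetical project directory holding the two files to share.
mkdir -p demo-project

# Conda environment specification (the package list is only illustrative).
cat > demo-project/environment.yml <<'EOF'
name: demo-env
dependencies:
  - numpy
  - pandas
EOF

# Dockerfile that recreates the environment on top of the Miniconda image.
cat > demo-project/Dockerfile <<'EOF'
FROM continuumio/miniconda3
COPY environment.yml /tmp/environment.yml
RUN /opt/conda/bin/conda env create -f /tmp/environment.yml
CMD ["/bin/bash"]
EOF

# A teammate with Docker installed could then build and run the image:
#   docker build -t demo-env-image demo-project
#   docker run -i -t demo-env-image /bin/bash
```

Keeping the package list in environment.yml rather than in the Dockerfile means the same file can also be used to create the environment outside of Docker.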

Once you’ve containerized your data science applications, you can use container clustering systems, such as Kubernetes or Docker Swarm, when you’re ready to productionize, deploy and scale out your data science applications for many users.

4) Endless combinations with Anaconda and Docker

The combined portability of Anaconda and flexibility of Docker enable a wide range of data science and analytics use cases.

A search for “Anaconda” on Docker Hub shows many different ways that users are leveraging libraries from Anaconda with Docker, including turnkey deployments of Anaconda with Jupyter Notebooks; reproducible scientific research environments; and machine learning and deep learning applications with Anaconda, TensorFlow, Caffe and GPUs.

Using Anaconda Images with Docker

There are many ways to get started using the Anaconda images with Docker. First, choose one of the Anaconda images for Docker based on your project requirements. The Anaconda images include the default packages listed here, and the Miniconda images include a minimal installation of Python and conda.

continuumio/anaconda (based on Python 2.7)
continuumio/anaconda3 (based on Python 3.5)
continuumio/miniconda (based on Python 2.7)
continuumio/miniconda3 (based on Python 3.5)

For example, we can use the continuumio/anaconda3 image, which can be pulled from Docker Hub:

$ docker pull continuumio/anaconda3

Next, we can run the Anaconda image with Docker and start an interactive shell:

$ docker run -i -t continuumio/anaconda3 /bin/bash

Once the Docker container is running, we can start an interactive Python shell, install additional conda packages or run Python applications.
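
For running a Python application without first opening an interactive shell, one possible approach is to mount a local directory into the container and call the image's Python interpreter directly. The sketch below assumes a hypothetical app directory and hello.py script. The run helper and DRY_RUN variable are also illustrative: by default the Docker command is only printed, and setting DRY_RUN=0 would execute it.

```shell
# Hypothetical example application; the directory and file names are placeholders.
mkdir -p app
cat > app/hello.py <<'EOF'
import sys
print("Hello from Python %d.%d" % sys.version_info[:2])
EOF

# Print commands by default; set DRY_RUN=0 to actually invoke Docker.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mount ./app into the container at /app and run the script with the
# image's Python interpreter, removing the container when it exits.
run docker run --rm -v "$PWD/app:/app" continuumio/anaconda3 \
    /opt/conda/bin/python /app/hello.py
```

Because the script lives on the host and is only mounted into the container, it can be edited locally and re-run without rebuilding or modifying the image.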

Alternatively, we can start a Jupyter Notebook server with Anaconda from a Docker image:

$ docker run -i -t -p 8888:8888 continuumio/anaconda3 /bin/bash -c "/opt/conda/bin/conda install jupyter -y --quiet && mkdir /opt/notebooks && /opt/conda/bin/jupyter notebook --notebook-dir=/opt/notebooks --ip='*' --port=8888 --no-browser"

You can then view the Jupyter Notebook by opening http://localhost:8888 in your browser, or http://<DOCKER-MACHINE-IP>:8888 if you are using a Docker Machine VM.

Once you are inside of the running notebook, you can import libraries from Anaconda, perform interactive computations and visualize your data.

Additional Resources for Anaconda and Docker

Anaconda and Docker complement each other and make working with Open Data Science development and deployments easy and scalable. For collaborative workflows, Anaconda and Docker provide everyone on your data science team with access to scalable, deployable and reproducible Open Data Science.

Get started with Anaconda and Docker by visiting the ContinuumIO organization on Docker Hub. The Anaconda images will also be featured in the Docker Store when it launches.

Interested in using Anaconda and Docker in your organization for Open Data Science development, reproducibility and deployments? Get in touch with us if you’d like to learn more about how Anaconda can empower your enterprise with Open Data Science, including an on-premise package repository, collaborative notebooks, cluster deployments and custom consulting/training solutions.


About the Authors

Kristopher Overholt

Product Manager

Kristopher works with DevOps, product engineering and platform infrastructure for the Anaconda Enterprise data science platform. His areas of expertise include distributed systems, data engineering and computational science workflows.


Christine Doig

Sr. Data Scientist, Product Manager

Christine is a Senior Data Scientist at Anaconda. She has over five years’ experience in analytics, operations research and machine learning in a variety of industries, including energy, manufacturing and banking. At Anaconda, she worked …
