One year ago, we presented Anaconda and Docker: Better Together for Reproducible Data Science. In that blog post, we described our vision and a foundational approach to portable and reproducible data science using Anaconda and Docker.

This approach embraced the philosophy of Open Data Science in which data scientists can connect the powerful data science experience of Anaconda with the tools that they know and love, which today includes Jupyter notebooks, machine learning frameworks, data analysis libraries, big data computations and connectivity, visualization toolkits, high-performance numerical libraries and more.

We also discussed how data scientists could use Anaconda to develop data science analyses on their local machine, then use Docker to deploy those same analyses into production. That was the state of data science encapsulation and deployment that we presented last year.

In this blog post, we’ll be diving deeper into how we’ve created a standard data science project encapsulation approach that helps data scientists deploy secure, scalable and reproducible projects across an entire team with Anaconda.

This blog post also provides more details about how we’re using Anaconda and Docker for encapsulation and containerization of data science projects to power the data science deployment functionality in the next generation of Anaconda Enterprise, which augments our truly end-to-end data science platform.

Supercharge Your Data Science with More Than Just Dockerfiles!

The reality is that, as much as Docker is loved and used by the DevOps community, it is not the preferred tool or entry point for data scientists looking to deploy their applications. Using Docker alone as a data science encapsulation strategy still requires data scientists to coordinate with their IT and DevOps teams to write Dockerfiles, install the required system libraries in their containers, and orchestrate and deploy those containers into production.

Having data scientists worry about infrastructure details and DevOps tooling takes away time from their most valuable skills: finding insights in data, modeling and running experiments, and delivering consumable data-driven applications to their team and end-users.

Data scientists enjoy using the packages they know and love with Anaconda and conda environments, and wish it were as easy to deploy data science projects as it is to get Anaconda running on their laptop.

By working directly with our amazing customers and users and listening to the needs of their data science teams over the last five years, we have clearly identified how Anaconda and Docker can be used together for data science project encapsulation and as a more useful abstraction layer for data scientists: Anaconda Projects.

The Next Generation of Portable and Reproducible Data Science with Anaconda

As part of the next generation of data science encapsulation, reproducibility and deployment, we are happy to announce the release of Anaconda Project with the latest release of Anaconda! Download the latest version of Anaconda 4.3.1 to get started with Anaconda Project today.

Or, if you already have Anaconda, you can install Anaconda Project using the following command:

conda install anaconda-project

Anaconda Project makes it easy to encapsulate data science projects and makes them fully portable and deployment-ready. It automates the configuration and setup of data science projects, such as installing the necessary packages and dependencies, downloading data sets and required files, setting environment variables for credentials or runtime configuration, and running commands.
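As a rough illustration of the items listed above (packages, downloads, environment variables and commands), the sketch below writes a minimal project file from Python so that the example is self-contained. The section names follow our understanding of the anaconda-project.yml format and should be checked against the Anaconda Project documentation. The project name, package list, URL, variable name and script name are all hypothetical placeholders.

```python
import os
import tempfile

# Illustrative project file. Every value here (project name, packages, URL,
# variable name, script name) is a placeholder assumption, not taken from a
# real project. Verify the section names against the official documentation.
PROJECT_YML = """\
name: stock-demo
packages:
  - python=3.6
  - pandas
  - bokeh
downloads:
  STOCK_DATA: https://example.com/stocks.csv
variables:
  - API_TOKEN
commands:
  default:
    unix: python analyze.py
"""


def write_project(directory):
    """Write the sketch project file into `directory` and return its path."""
    path = os.path.join(directory, "anaconda-project.yml")
    with open(path, "w") as handle:
        handle.write(PROJECT_YML)
    return path


# Create a scratch directory and place the project file in it.
project_dir = tempfile.mkdtemp()
project_file = write_project(project_dir)
```

With a file along these lines in place, running `anaconda-project run` from the project directory is the step that would set up the environment and start the default command, assuming the referenced script and data source actually exist.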

Anaconda Project is an open source tool created by Continuum Analytics that delivers lightweight, efficient encapsulation and portability of data science projects. Learn more by checking out the Anaconda Project documentation.

Anaconda Project makes it easy to reproduce your data science analyses, share data science projects with others, run projects across different platforms, or deploy data science applications with a single click in Anaconda Enterprise.

Whether you’re running a project locally or deploying a project with Anaconda Enterprise, you are using the same project encapsulation standard: an Anaconda Project. We’re bringing you the next generation of true Open Data Science deployment in 2017 with Anaconda.

New Release of Anaconda Navigator with Support for Anaconda Projects

As part of this release of Anaconda Project, we’ve integrated easy data science project creation and encapsulation into the familiar Anaconda Navigator experience, a graphical interface for your Anaconda environments and data science tools. You can easily create, edit, and upload Anaconda Projects to Anaconda Cloud through that interface.

Download the latest version of Anaconda 4.3.1 to get started with Anaconda Navigator and Anaconda Project today.

Or, if you already have Anaconda, you can install the latest version of Anaconda Navigator using the following command:

conda install anaconda-navigator

When you’re using Anaconda Project with Navigator, you can create a new project and specify its dependencies, or you can import an existing conda environment file (environment.yaml) or pip requirements file (requirements.txt).

Anaconda Project examples:

  • Image classifier web application using TensorFlow and Flask
  • Live Python and R notebooks that retrieve the latest stock market data
  • Interactive Bokeh and Shiny applications for data clustering, cross filtering, and data exploration
  • Interactive visualizations of data sets with Bokeh, including streaming data
  • Machine learning models with REST APIs

To get started even quicker with portable data science projects, refer to the example Anaconda Projects on Anaconda Cloud.
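To give a feel for the last kind of example in the list above, here is a minimal sketch of a script that a project command could run to expose a model over a REST API. It is not one of the published example projects. It uses only the Python standard library, and the "model" is a hypothetical fixed linear formula standing in for a trained one. The weights, port and JSON field names are assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a trained model: fixed linear coefficients.
WEIGHTS = [0.5, -1.0]
BIAS = 2.0


def predict(features):
    """Return the linear score for one list of numeric features."""
    if len(features) != len(WEIGHTS):
        raise ValueError("expected %d features" % len(WEIGHTS))
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


class PredictHandler(BaseHTTPRequestHandler):
    """Answer POST requests whose JSON body looks like {"features": [..]}."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            body = {"prediction": predict(payload["features"])}
            status = 200
        except (ValueError, KeyError, TypeError) as exc:
            # Malformed JSON, a missing field, or a wrong feature count.
            body = {"error": str(exc)}
            status = 400
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host="127.0.0.1", port=8086):
    """Start the server; this call blocks until the process is stopped."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

Calling `serve()` starts the endpoint. A real project would load a trained model instead of the fixed weights, and would likely use a web framework such as Flask, which appears in the first example above.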

Deploying Secure and Scalable Data Science Projects with Anaconda Enterprise

The new data science deployment and collaboration functionality in Anaconda Enterprise leverages Anaconda Project plus industry-standard containerization with Docker and enterprise-ready container orchestration technology with Kubernetes.

This productionization and deployment strategy makes it easy to create and deploy data science projects with a single click, whether they use Python 2, Python 3, or R (including their dependencies in C++, Fortran, Java, etc.), or anything else you can build with the 730+ packages in Anaconda.

From Data Science Development to Deployment with Anaconda Projects and Anaconda Enterprise

All of this is possible without having to edit Dockerfiles directly, install system packages in your Docker containers, or manually deploy Docker containers into production. Anaconda Enterprise handles all of that for you, so you can get back to doing data science analysis.

The result is that any project a data scientist can create on their machine with Anaconda can be deployed to an Anaconda Enterprise cluster in a secure, scalable, and highly available manner with just a single click, including live notebooks, interactive applications, machine learning models with REST APIs, or any other projects that leverage the 730+ packages in Anaconda.

Anaconda is such a foundational and ubiquitous data science platform that other lightweight data science workspaces and workbenches are using Anaconda as a necessary core component for their portable and reproducible data science. Anaconda is the leading Open Data Science platform powered by Python and empowers data scientists with a truly integrated experience and support for end-to-end workflows. Why would you want your data science team using Anaconda in production with anything other than Anaconda Enterprise?

Anaconda Enterprise is a true end-to-end data science platform that integrates with all of the most popular tools and platforms and provides your data science team with an on-premises package repository, secure enterprise notebook collaboration, data science and analytics on Hadoop/Spark, and secure and scalable data science deployment.

Anaconda Enterprise also includes support for all of the 730+ Open Data Science packages in Anaconda. Finally, Anaconda Scale is the only recommended and certified method for deploying Anaconda to a Hadoop cluster for PySpark or SparkR jobs.

Getting Started with Anaconda Enterprise and Anaconda Projects

Anaconda Enterprise uses Anaconda Project and Docker as its standard project encapsulation and deployment format to enable simple one-click deployments of secure and scalable data science applications for your entire data science team.

Are you interested in using Anaconda Enterprise in your organization to deploy data science projects, including live notebooks, machine learning models, dashboards, and interactive applications?

Access to the next generation of Anaconda Enterprise v5, which features one-click secure and scalable data science deployments, is now available as a technical preview as part of the Anaconda Enterprise Innovator Program.

Join the Anaconda Enterprise v5 Innovator Program today to discover the powerful data science deployment capabilities for yourself. Anaconda Enterprise handles your secure and scalable data science project encapsulation and deployment requirements so that your data science team can focus on data exploration and analysis workflows and spend less time worrying about infrastructure and DevOps tooling.


About the Authors

Christine Doig

Sr. Data Scientist, Product Manager

Christine is a Senior Data Scientist at Anaconda. She has over five years’ experience in analytics, operations research and machine learning in a variety of industries, including energy, manufacturing and banking. At Anaconda, she worked …



Kristopher Overholt

Product Manager

Kristopher works with DevOps, product engineering and platform infrastructure for the Anaconda Enterprise data science platform. His areas of expertise include distributed systems, data engineering and computational science workflows.
