TL;DR: New beta conda feature allows data scientists and others to describe project runtime requirements in a single file called kapsel.yml. Using kapsel.yml, conda will automatically reproduce prerequisites on any machine and then run the project. 

Data scientists working with Python often create a project directory containing related analysis, notebook files, data-cleaning scripts, Bokeh visualizations, and so on. For a colleague who wants to replicate the project, or even for the original creator a few months later, it can be tricky to run all this code exactly as it was run the first time.

Most code relies on some specific setup before it’s run — such as installing certain versions of packages, downloading data files, starting up database servers, configuring passwords, or configuring parameters to a model. 

You can write a long README file to manually record all these steps and hope that you got it right. Or, you could use conda kapsel. This new beta conda feature allows data scientists to list their setup and runtime requirements in a single file called kapsel.yml. Conda reads this file and performs all these steps automatically. With conda kapsel, your project just works for anyone you share it with.

Sharing your project with others

When you’ve shared your project directory (including a kapsel.yml) and a colleague types conda kapsel run in that directory, conda automatically creates a dedicated environment, puts the correct packages in it, downloads any needed data files, starts needed services, prompts the user for missing configuration values, and runs the right command from your project.

As with all things conda, there’s an emphasis on ease-of-use. It would be clunky to first manually set up a project, and then separately configure project requirements for automated setup. 

With the conda kapsel command, you set up and configure the project at the same time. For example, if you type conda kapsel add-packages bokeh=0.12, you’ll get Bokeh 0.12 in your project’s environment, and automatically record a requirement for Bokeh 0.12 in your kapsel.yml. This means there’s no extra work to make your project reproducible. Conda keeps track of your project setup for you, automatically making any project directory into a runnable, reproducible “conda kapsel.”
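
To make the "set up and record in one step" idea concrete, here is a minimal Python sketch. It is a toy model, not conda kapsel's actual implementation: the add_packages and render functions, the "packages" key, and the YAML-style layout are all assumptions made for illustration, and the real kapsel.yml format may differ.

```python
# Toy model only: NOT conda kapsel's implementation or its file schema.
# Function names and the "packages" key are hypothetical.

def add_packages(requirements, specs):
    """Return a copy of the requirements with the given package specs recorded.

    A later spec for the same package name replaces the earlier one,
    so the record always reflects the most recent request.
    """
    recorded = dict(requirements)
    # Index existing specs by package name (the text before the first "=").
    packages = {p.split("=")[0]: p for p in recorded.get("packages", [])}
    for spec in specs:
        packages[spec.split("=")[0]] = spec
    recorded["packages"] = sorted(packages.values())
    return recorded


def render(requirements):
    """Render the recorded requirements as YAML-style text (assumed layout)."""
    lines = ["packages:"]
    lines += [f"  - {spec}" for spec in requirements.get("packages", [])]
    return "\n".join(lines)


# Each "add" updates the record as a side effect of setting up the project.
project = add_packages({}, ["bokeh=0.12"])
project = add_packages(project, ["pandas"])
print(render(project))
```

The point of the sketch is the design choice, not the details: because the record is updated by the same call that changes the project, there is no separate "write down what I installed" step to forget.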

There’s nothing data-science-specific about conda kapsel; it’s a general-purpose feature, just like conda’s core package management features. But we believe conda kapsel’s simple approach to reproducibility will appeal to data scientists.

Try out conda kapsel

To understand conda kapsel, we recommend going through the tutorial. It’s a quick way to see what it is and learn how to use it. The tutorial includes installation instructions.

Where to send feedback

If you want to talk interactively about conda kapsel or give us some quick feedback, or if you run into any questions, join our chat room on Gitter. We would love to hear from you!

If you find a bug or have a suggestion, filing a GitHub issue is another great way to let us know.

If you want to have a look at the code, conda kapsel is on GitHub.

Next steps for conda kapsel

This is a first beta, so we expect conda kapsel to continue to evolve. Future directions will depend on the feedback you give us, but here are some of the ideas we have in mind:

  • Support for automating additional setup steps: What’s in your README that could be automated? Let us know!
  • Extensibility: We’d like to support both third-party plugins and custom setup scripts embedded in projects.
  • UX refinement: We believe the tool can be even more intuitive and we’re currently exploring some changes to eliminate points of confusion early users have encountered. (We’d love to hear your experiences with the tutorial, especially if you found anything clunky or confusing.)

For the time being, the conda kapsel API and command line syntax are subject to change in future releases. A project created with the current “beta” version of conda kapsel may need to be run with that same version, and may not work with conda kapsel 1.0. When we think things are solid, we’ll switch from “beta” to “1.0” and you’ll be able to rely on long-term interface stability.

We hope you find conda kapsel useful!