If you had to pick one Python weakness…

For all the many strengths of Python, people often express frustration in finding, building, and installing third party packages. This pain can be especially acute with scientific and analytical libraries, which are often a mix of Python with compiled, platform-dependent C, C++, or Fortran code. One day, tools like PyPy and Numba may rescue us from this state of affairs, but data scientists working today need solutions—today. When we began building Wakari, a cloud-based platform for shareable, reproducible analytics, we also experienced this pain. Our users need to work with different versions of Python, NumPy, SciPy, and a variety of other packages. Moreover, they must be able to easily share live, runnable versions of their work, including all supporting packages, to their colleagues or the general public.

We created the conda package and environment management system to solve these problems. It allows users to install multiple versions of binary packages (and any required libraries) appropriate for their platform and easily switch between them, as well as easily download updates from an upstream repository. Continuum hosts a number of repository channels that provide many free open source packages (as well commercial channels for distributing commercial packages). It’s also possible for conda users to host their own channels, so that they may pull their own packages easily into conda environments. Think of it as git branching for site-packages, combined with yum for Python packages. Because we found conda to be so useful for managing packages in Anaconda and Wakari, we have open-sourced it, so that others might benefit from it as well.

Having been involved in the Python world for so long, we are all well aware of pip, easy_install, and virtualenv, but these tools did not meet all of our specific requirements. The main problem is that they are focused around Python, neglecting non-Python library dependencies, such as HDF5, MKL, LLVM, etc., which do not have a setup.py in their source code and do not install files into Python’s site-packages directory.

Under the hood, we have created a concept of environments that are conceptually similar to virtualenvs, but which use filesystem-level hard links to create entirely self-contained Python runtime layouts. Using the conda command line tool, users can easily create environments, switch between them, and install different versions of libraries and modules into them.
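
As a quick preview, the basic workflow looks something like this (the environment name "myenv" is just an illustration; each command is walked through in detail below):

$ conda info -e                      # list the environments conda knows about
$ conda create -n myenv numpy=1.7    # create a new, named environment
$ conda install matplotlib           # install a package into the default environment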

The conda documentation is available online, and contains a complete command reference as well as a number of examples. With the release of conda 1.3, we’d like to show off some of conda’s more important features and provide a helpful “Getting Started” guide as a blog post as well. Let’s take a look at some common scenarios!

Downloading and Installing Packages

The primary use case of conda is for managing packages and their dependencies in a platform independent fashion. Let’s see how conda can help us find and install packages we are interested in. First, let’s look at some information about our conda setup:

$ conda info
  Current Anaconda install:

               platform : osx-64
  conda command version : 1.3.2
         root directory : /Users/maggie/anaconda
         default prefix : /Users/maggie/anaconda
           channel URLS : ['http://repo.continuum.io/pkgs/free/osx-64/']
  environment locations : ['/Users/maggie/anaconda/envs']

If the package is in one of the repository channels we have configured, then installing it is as simple as using the conda install command. We can install packages into different environments, but if you don’t otherwise specify, conda will install packages into the default Anaconda environment. We’ll visit creating new environments a little later, but let’s start with some examples in the default environment. Let’s say there is a different version of matplotlib we wish to try out:

$ conda install matplotlib=1.2

Package plan for installation in environment /Users/bryan/anaconda13:

The following packages will be DE-activated:

    package                    |  build
    -------------------------  |  ---------------
    matplotlib-1.1.1           |       np17py27_2

The following packages will be activated:

    package                    |  build
    -------------------------  |  ---------------
    matplotlib-1.2.0           |       np17py27_0

Proceed (y/n)?

Of course, if we just want to update to the latest version of a package that is compatible with the other currently installed packages, we can often just use the conda update command:

$ conda update matplotlib
Updating Anaconda environment at /Users/bryan/anaconda13

The following packages will be DE-activated:

    package                    |  build
    -------------------------  |  ---------------
    matplotlib-1.1.1           |       np17py27_2

The following packages will be activated:

    package                    |  build
    -------------------------  |  ---------------
    matplotlib-1.2.0           |       np17py27_0

Proceed (y/n)?

If there are packages we want that live in other channels, we can add those channels to our condarc file. Let’s look at the default condarc file:

# channel locations. These override conda defaults, i.e., conda will
# search only the channels listed here, in the order given.
channels:
#  - http://repo.continuum.io/pkgs/dev
#  - http://repo.continuum.io/pkgs/gpl
#  - http://repo.continuum.io/pkgs/pro
  - http://repo.continuum.io/pkgs/free

If we would like to allow GPL licensed packages to be installed into our Anaconda environments, we can simply uncomment the line with the “gpl” channel.
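
As a rough sketch, based on the default file shown above, the edited condarc would then read:

channels:
#  - http://repo.continuum.io/pkgs/dev
  - http://repo.continuum.io/pkgs/gpl
#  - http://repo.continuum.io/pkgs/pro
  - http://repo.continuum.io/pkgs/free

Afterwards, this new channel will show up in our conda info output: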

$ conda info
  Current Anaconda install:

               platform : osx-64
  conda command version : 1.3.3-5-g0c5b033-dirty
         root directory : /Users/maggie/anaconda
         default prefix : /Users/maggie/anaconda
           channel URLS : ['http://repo.continuum.io/pkgs/gpl/osx-64/',
                           'http://repo.continuum.io/pkgs/free/osx-64/']
  environment locations : ['/Users/bryan/anaconda/envs']

Now we can install, for instance, the GPL licensed rope library:

$ conda install rope

Package plan for installation in environment /Users/bryan/anaconda:

The following packages will be downloaded:

    rope-0.9.4-py27_g0.tar.bz2 [http://repo.continuum.io/pkgs/gpl/osx-64/]

The following packages will be activated:

    package                    |  build
    -------------------------  |  ---------------
    rope-0.9.4                 |          py27_g0

Proceed (y/n)?

It’s also possible to explicitly supply a package file to install.

$ conda install ~/redis-py-2.7.2-py27_0.tar.bz2
redis-py-2.7.2-py27_0:
    already available - removing
    making available
    activating

This is a bit lower level, but can be useful if you have your own package files to install (we will talk about creating your own packages a bit later).

Creating and Using Environments

Let’s look a bit into creating new Anaconda environments. At the core, Anaconda environments are just like directories that contain particular versions of packages. These can be located anywhere, but if they are within the Anaconda installation directory, conda will know about them. Let’s take a look:

$ conda info -e
    Known Anaconda environments:

        /Users/maggie/anaconda

On a fresh install, there is just the default environment. Now we’d like to create some new environments. Maybe we have an existing library that performs some interesting analysis, and we’d like to test and compare it with NumPy 1.6 and also the upcoming NumPy 1.7 release. Let’s see what versions of NumPy are available on our known package channels:

$ conda search --all numpy

8 matches found:

   package: numpy-1.7.0rc1   filename: numpy-1.7.0rc1-py27_0.tar.bz2
       md5: 6342d2aac738f158c4f3bce630b4e829

   package: numpy-1.5.1   filename: numpy-1.5.1-py26_0.tar.bz2
       md5: e37bf3bd755e40ef1a21c0bc6a493637

   package: numpy-1.6.2   filename: numpy-1.6.2-py26_0.tar.bz2
       md5: b8324e8695988ef59ef1da1ab1cf255f

   package: numpy-1.5.1   filename: numpy-1.5.1-py27_0.tar.bz2
       md5: aec83b6d825690a086e2e20d0cec8c38

   package: numpy-1.7.0b2   filename: numpy-1.7.0b2-py26_0.tar.bz2
       md5: 932892ca2929e04be6ebf548bf5e1e51

   package: numpy-1.7.0b2   filename: numpy-1.7.0b2-py27_0.tar.bz2
       md5: 467e6f9999dc8680270b625b46700e70

   package: numpy-1.6.2   filename: numpy-1.6.2-py27_0.tar.bz2
       md5: 1160f777c6b2fb9364bc701323bc6637

   package: numpy-1.7.0rc1   filename: numpy-1.7.0rc1-py26_0.tar.bz2
       md5: 676a7058df6543116f58d1d15268fe6d

We see there are packages for both versions of NumPy. Let’s keep things simple and create environments with the anaconda meta-package (which will install lots of packages in one go), but specify the version of NumPy we want in each. Let’s create an environment with NumPy 1.6:

$ conda create -n np1.6 anaconda numpy=1.6

And next, let’s create an environment for NumPy 1.7:

$ conda create -n np1.7 anaconda numpy=1.7

We can list these new environments using the conda ‘info’ command:

$ conda info -e
  Known Anaconda environments:

      /Users/maggie/anaconda
      /Users/maggie/anaconda/envs/np1.6
      /Users/maggie/anaconda/envs/np1.7

To use the Python version together with all the packages installed in a given environment, simply run the python executable from that environment. From a bash shell:

$ ~/anaconda/envs/myenv/bin/python
Python 2.7.3 |AnacondaCE 1.3.0 (x86_64)| (default, Jan 10 2013, 12:10:41)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

From a Windows command window:

> C:\Anaconda\envs\myenv\python.exe
Python 2.7.3 |Continuum Analytics, Inc.| (default, Jan  7 2013, 09:47:12) [MSC
 v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

If we’d like to make one of these environments the “default”, we simply need to set our PATH appropriately. From a bash shell:

$ export PATH=~/anaconda/envs/myenv/bin:$PATH

From a Windows command window:

> set PATH=C:\Anaconda\envs\myenv\Scripts;%PATH%
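
To double-check which interpreter will now be picked up first, a quick sanity check (not part of the original walkthrough) is:

$ which python     # bash: should print ~/anaconda/envs/myenv/bin/python
> where python     # Windows: the environment's python.exe should be listed first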

Sometimes we don’t want to create environments with all the packages that the anaconda meta-package brings in. Maybe we want to do some testing in a minimal environment, and conda lets us create those, too. Let’s say we want to create an environment with scikit-learn and its dependencies, but nothing else. First, let’s see what versions of scikit-learn are available:

$ conda search --all scikit-learn

There are quite a few! By default, conda will install the latest compatible version, so we will just do that. But before we create an environment, let’s take a look at what the dependencies of scikit-learn are:

$ conda depends scikit-learn

scikit-learn depends on the following packages:
    nose-1.2.1
    numpy-1.7.0rc1
    python-2.7.3
    readline-6.2
    scipy-0.11.0
    sqlite-3.7.13
    tk-8.5.13
    zlib-1.2.7

We don’t need to specify all of these dependencies ourselves; we’ll let conda do that work:

$ conda create -n test scikit-learn

Package plan for creating environment at /Users/maggie/anaconda/envs/test:

The following packages will be activated:

    package                    |  build
    -------------------------  |  ---------------
    nose-1.2.1                 |           py27_0
    numpy-1.7.0rc1             |           py27_0
    python-2.7.3               |                6
    readline-6.2               |                0
    scikit-learn-0.12.1        |       np17py27_0
    scipy-0.11.0               |       np17py27_1
    sqlite-3.7.13              |                0
    tk-8.5.13                  |                0
    zlib-1.2.7                 |                0


Proceed (y/n)? y

Activating packages...

[      COMPLETE      ] |######################################| 100%

It got all of the packages itself. Great!

Rolling your own packages

The content of this section is outdated. If you are interested in creating your own packages, you should read this newer conda blog post.

Conda allows you to create your own packages, i.e. packages which can be installed using the conda command and added to a conda package repository. As an example, we demonstrate how to build the pyephem package, create a repository for it, and install it into an existing Anaconda installation on a different system.

Whenever a conda package is installed, the information about which files belong to the package is also stored (as part of the conda install metadata in <sys.prefix>/conda-meta/). Therefore, it is possible to determine which files have been installed into a prefix manually (not using the conda command). A fresh installation of Anaconda reveals that no such files exist:

$ conda package --untracked
prefix: /home/ilan/a13

After downloading and extracting the pyephem source code, we do:

$ python setup.py install
running install
running build
...
$ conda package --untracked
prefix: /home/ilan/a13
lib/python2.7/site-packages/ephem/__init__.py
lib/python2.7/site-packages/ephem/__init__.pyc
lib/python2.7/site-packages/ephem/_libastro.so
...

We can now use the package command to bundle up the untracked files into a conda package:

$ conda package --pkg-name=pyephem --pkg-version=3.7.5.1
prefix: /home/ilan/a13
Number of files: 76
pyephem-3.7.5.1-py27_0.tar.bz2 created successfully

Note that conda is not limited to creating Python packages; you can install basically any type of package into the prefix and bundle it into a conda package, e.g. using ./configure --prefix=/home/ilan/a13; make; make install.
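
As a sketch of that workflow for a hypothetical non-Python library (the package name and version below are purely illustrative):

$ ./configure --prefix=/home/ilan/a13
$ make
$ make install
$ conda package --untracked                          # see which new files landed in the prefix
$ conda package --pkg-name=mylib --pkg-version=1.0   # bundle them into mylib-1.0-<build>.tar.bz2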

All of the above (including retrieving the pyephem source code) can be done using conda pip pyephem, which basically calls out to pip to do the source installation and then creates the conda package.
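
In other words, for a package that pip can already build from source, the whole fetch-build-bundle sequence can be as short as:

$ conda pip pyephem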

Creating your own package repository

Having successfully created a conda package, we now want to create a repository, such that others can easily install pyephem into their Anaconda installation. A conda repository is simply a directory of conda packages plus a conda index file. So, we create a new directory with a platform sub-directory (here linux-64) containing the newly created conda package, and run:

$ conda index
updating index in: /home/ilan/conda-repo/linux-64
updating: pyephem-3.7.5.1-py27_0.tar.bz2
$ ls -l
total 896
-rw-r--r-- 1 ilan users 908801 Jan 24 17:24 pyephem-3.7.5.1-py27_0.tar.bz2
-rw-r--r-- 1 ilan users    313 Jan 24 17:37 repodata.json
-rw-r--r-- 1 ilan users    230 Jan 24 17:37 repodata.json.bz2

The file repodata.json.bz2 is used by the conda install command to detect which packages are available in a given conda repository. We now make this repository available over HTTP, and tell people who wish to access the repository to add it to their ~/.condarc file. When we serve the above directory on http://localhost/conda-repo/linux-64/, the following URL needs to be added to the ~/.condarc file:

channels:
  - http://localhost/conda-repo
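
One minimal way to serve the directory for local testing (our own suggestion, not part of the original setup) is Python's built-in web server; if you use a non-default port, include it in the channel URL:

$ cd /home/ilan
$ python -m SimpleHTTPServer 8000    # serves conda-repo/linux-64/ at http://localhost:8000/conda-repo/linux-64/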

Note that the channel URL does not include the platform specific sub-directory (this way the same configuration file may be shared across platforms). Now we can install the pyephem package into another Anaconda system:

$ conda install pyephem
Package plan for installation in environment /home/ilan/a121:
The following packages will be downloaded:

    pyephem-3.7.5.1-py27_0.tar.bz2 [http://localhost/conda-repo/linux-64/]

The following packages will be activated:

    package                    |  build
    -------------------------  |  ---------------
    pyephem-3.7.5.1            |           py27_0

Proceed (y/n)? y
...
$ python
Python 2.7.3 |Anaconda 1.3.0 (64-bit)| (default, Jan 22 2013, 14:14:25)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-52)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ephem
>>> mars = ephem.Mars()
>>> mars.compute('2008/1/1')
>>> print mars.ra, mars.dec
5:59:27.35 26:56:27.4

It installed the package, and it appears to be working. We should mention that this package was built for 64-bit Linux and will not work on other systems, such as 32-bit Linux, Mac OS X, or Windows, as it contains platform-specific C extensions that are linked (at import time) into the Python process.

Future directions

A standard refrain regarding package management in Python is that it is “an active topic.” In fact, there are several recent enhancement proposals covering related areas, including: Package Metadata (PEP 345), Package Databases (PEP 376), Standardized Package Version Numbers (PEP 386), and the Wheel PEP (PEP 427). As it happens, some of the ideas in these PEPs are already reflected within conda. We intend to watch the evolution and development of these proposals to make conda compatible and interoperable with whatever standard comes out of the enhancement process.

We should also note that conda was originally created to solve problems we had on Linux backend platforms. However, we quickly realized that it could be valuable on Windows platforms as well. Conda already works well on Windows, but there are still a few areas where it could behave more like native Windows applications. Improving Windows integration is another priority for us.

You can check out conda in action at Wakari, or by installing Anaconda. You can also follow and contribute to conda development at the conda GitHub page.

