Using Pip in a Conda Environment


Unfortunately, issues can arise when conda and pip are used together to create an environment, especially when the tools are used back-to-back multiple times, establishing a state that can be hard to reproduce. Most of these issues stem from the fact that conda, like other package managers, has limited ability to control packages it did not install. Running conda after pip can overwrite and potentially break packages installed via pip. Similarly, pip may upgrade or remove a package which a conda-installed package requires. In some cases these breakages are cosmetic, where a few files are present that should have been removed, but in other cases the environment may evolve into an unusable state.

There are a few steps which can be used to avoid broken environments when using conda and pip together. One surefire method is to only use conda packages. If software is needed which is not available as a conda package, conda build can be used to create packages for said software. For projects available on PyPI, the conda skeleton command (which is part of conda-build) frequently produces a recipe which can be used to create a conda package with little or no modification.
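
As a rough sketch of that workflow, the Python snippet below assembles the two commands involved: conda skeleton pypi to generate a recipe and conda build to build it. The helper name skeleton_commands and the project name "example-project" are placeholders chosen for illustration. The commands are only printed, not executed, and the sketch assumes that conda-build is installed and that the generated recipe directory carries the project's name.

```python
import shlex


def skeleton_commands(pypi_name):
    """Return the commands that turn a PyPI project into a conda package.

    Hypothetical helper: it only assembles the command lines and never
    runs them.
    """
    return [
        # Generate a recipe directory from the project's PyPI metadata.
        ["conda", "skeleton", "pypi", pypi_name],
        # Build a conda package from the generated recipe directory
        # (assumed here to be named after the project).
        ["conda", "build", pypi_name],
    ]


if __name__ == "__main__":
    # "example-project" is a placeholder, not a real recommendation.
    for cmd in skeleton_commands("example-project"):
        print(shlex.join(cmd))
```

The generated recipe may still need manual edits before conda build succeeds, which is why the commands are printed for review rather than run blindly.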

Creating conda packages for all additional software needed is a reliably safe method for putting together a data science environment but can be a burden if the environment involves a large number of packages which are only available on PyPI. In these cases, using pip only after all other requirements have been installed via conda is the safest practice. Additionally, pip should be run with the "--upgrade-strategy only-if-needed" argument to prevent packages installed via conda from being upgraded unnecessarily. This is pip's default behavior, and it should not be changed.
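
A minimal sketch of the pip step, assuming it is run with the Python interpreter of the conda environment that should receive the packages. The helper name pip_install_command is hypothetical. It spells out the upgrade strategy explicitly, even though it is the default, so that the intent stays visible in scripts.

```python
import sys


def pip_install_command(packages):
    """Build a pip command that upgrades dependencies only when required.

    Hypothetical helper: it returns the argument list and does not run pip.
    """
    if not packages:
        raise ValueError("no packages given")
    return [
        # Using the current interpreter ensures pip installs into the
        # environment this script is running in.
        sys.executable, "-m", "pip", "install",
        # Leave already-satisfied dependencies (e.g. conda-installed ones)
        # untouched.
        "--upgrade-strategy", "only-if-needed",
        *packages,
    ]
```

Once every requirement that conda can provide has been installed, the returned list can be passed to subprocess.run(..., check=True) to perform the installation.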

If there is an expectation to install software using pip alongside conda packages, it is a good practice to do this installation into a purpose-built conda environment to protect other environments from any modifications that pip might make. Conda environments are isolated from each other and allow different versions of packages to be installed. In conda environments, hard links are used when possible rather than copying files to save space. If a similar set of packages is installed, each new conda environment will require only a small amount of additional disk space. Many users rely solely on the “root” conda environment that is created by installing either Anaconda or Miniconda. If this environment becomes cluttered with a mix of pip and conda installs, it is much harder to recover. On the other hand, creating separate conda environments allows you to delete and recreate environments readily, without risking your core conda functionality.
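
To illustrate why hard links save space, the sketch below creates a file and a hard link to it in a temporary directory, then checks that both names point at the same data on disk. It does not touch conda itself; the file names are made up, and it assumes a filesystem that supports hard links.

```python
import os
import tempfile


def hard_link_demo():
    """Return (same_inode, link_count) for a file and a hard link to it."""
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-ins for a file in a package cache and the same file
        # inside an environment (hypothetical names).
        original = os.path.join(tmp, "cached_package_file")
        linked = os.path.join(tmp, "environment_file")
        with open(original, "wb") as fh:
            fh.write(b"x" * 1024)
        # A hard link adds a second name for the same data, not a copy.
        os.link(original, linked)
        first, second = os.stat(original), os.stat(linked)
        return first.st_ino == second.st_ino, first.st_nlink


if __name__ == "__main__":
    same_inode, link_count = hard_link_demo()
    print("same data on disk:", same_inode, "| names for it:", link_count)
```

Because both names refer to one copy of the data, a second environment containing the same package files adds very little to disk usage.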

Once pip is used to install software into a conda environment, conda will be unaware of these changes and may make modifications that would break the environment. Rather than running conda, pip and then conda again, a more reliable method is to create a new environment with the combined conda requirements and then run pip. This new environment can be tested before removing the old one. Again, it is primarily the “statefulness” of pip that causes problems – the more state that exists because of the order of installation of packages, the harder it will be to keep things working.
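
The recreate-rather-than-modify approach can be sketched as a small planning function. Everything here is illustrative: the function name recreate_plan, the environment names and the package lists are assumptions, and the pip step relies on a conda version that provides conda run. The function only returns the commands so they can be reviewed first.

```python
def recreate_plan(new_env, original_conda, additional_conda, pip_packages,
                  old_env=None):
    """Plan a fresh environment: all conda packages first, pip last.

    Hypothetical helper: it returns argument lists and runs nothing.
    """
    # Combine the old and new conda requirements, dropping duplicates
    # while keeping their order.
    combined = list(dict.fromkeys(list(original_conda) + list(additional_conda)))
    plan = [["conda", "create", "--yes", "--name", new_env, *combined]]
    if pip_packages:
        # Run pip inside the new environment, after conda is done.
        plan.append([
            "conda", "run", "--name", new_env,
            "python", "-m", "pip", "install",
            "--upgrade-strategy", "only-if-needed",
            *pip_packages,
        ])
    if old_env:
        # Only to be run once the new environment has been tested.
        plan.append(["conda", "env", "remove", "--name", old_env])
    return plan
```

For the pip step to work, the combined conda requirements should include python and pip themselves. The removal command is deliberately last so the old environment stays available until the new one has been tested.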

For environments that will be recreated often, it is a good practice to store the conda and pip package requirements in text files. Package requirements can be provided to conda via the --file argument and to pip via the -r or --requirement argument. Alternatively, a single file containing both conda and pip requirements can be exported by, or provided to, the conda env command to control an environment. Both of these methods have the benefit that the files describing the environment can be checked into a version control system and shared with others.
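
As an illustration of the single-file approach, the sketch below renders a minimal environment file with the conda requirements listed first and the pip requirements nested under a pip entry. The function name render_environment_file and the example package names are hypothetical, and real files often also list channels and version pins.

```python
def render_environment_file(name, conda_packages, pip_packages):
    """Render a minimal environment file as text.

    Hypothetical helper: conda requirements come first, pip requirements
    are nested under a "pip:" entry.
    """
    lines = [f"name: {name}", "dependencies:"]
    lines += [f"  - {pkg}" for pkg in conda_packages]
    if pip_packages:
        lines.append("  - pip:")
        lines += [f"    - {pkg}" for pkg in pip_packages]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "some-pypi-only-package" is a placeholder name.
    print(render_environment_file(
        "demo", ["python", "pip", "numpy"], ["some-pypi-only-package"]
    ))
```

Saved as environment.yml, such a file can be passed to conda env create with the --file argument, and conda env export writes the same kind of file from an existing environment.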

In summary, when combining conda and pip, it is best to use an isolated conda environment. Only after conda has been used to install as many packages as possible should pip be used to install any remaining software. If modifications are needed to the environment, it is best to create a new environment rather than running conda after pip. When appropriate, conda and pip requirements should be stored in text files.

We at Anaconda are keenly aware of the difficulties in combining pip and conda. We want the process of setting up data science environments to be as easy as possible. That is why we have been adding new features to the next version of conda to simplify this process. While still in beta, conda 4.6.0 allows conda to consider pip-installed packages and either replace them as needed or fulfill dependencies with the existing package. We are still testing these new features but expect the interactions between conda and pip to be greatly improved in the near future.

Best Practices Checklist

Use pip only after conda

  • install as many requirements as possible with conda, then use pip
  • pip should be run with --upgrade-strategy only-if-needed (the default)
  • Do not use pip with the --user argument, avoid all “users” installs

Use conda environments for isolation

  • create a conda environment to isolate any changes pip makes
  • environments take up little space thanks to hard links
  • care should be taken to avoid running pip in the “root” environment
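
One way to guard against the last point in a setup script is to check which environment is active before calling pip. This is a sketch only: it relies on the CONDA_DEFAULT_ENV variable that conda sets on activation, treats both “base” and “root” as the installer-created environment, and the function name is hypothetical.

```python
import os


def in_dedicated_conda_env(environ=None):
    """Return True only if a non-root conda environment is active.

    Hypothetical helper: it reads CONDA_DEFAULT_ENV, which conda sets
    when an environment is activated.
    """
    environ = os.environ if environ is None else environ
    name = environ.get("CONDA_DEFAULT_ENV")
    # No active environment, or the installer-created one: not safe.
    return bool(name) and name not in {"base", "root"}


if __name__ == "__main__":
    if not in_dedicated_conda_env():
        raise SystemExit("Activate a dedicated conda environment before using pip.")
```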

Recreate the environment if changes are needed

  • once pip has been used conda will be unaware of the changes
  • to install additional conda packages it is best to recreate the environment

Store conda and pip requirements in text files

  • package requirements can be passed to conda via the --file argument
  • pip accepts a list of Python packages with -r or --requirement
  • conda env will export or create environments based on a file with conda and pip requirements
