What’s in a Name? Clarifying the Anaconda Metapackage

 

The name “Anaconda” is overloaded in many ways. There’s our company, Anaconda, Inc., the Anaconda Distribution, the anaconda metapackage, Anaconda Enterprise, and several other, sometimes completely unrelated projects (like Red Hat’s Anaconda). Here we hope to clarify two of those – the difference between the Anaconda Distribution and the anaconda metapackage.

The Anaconda Distribution is the installer that many people download to get a good start on a Python data science coding environment. It includes Python, pandas, scikit-learn, multiple data visualization options, and many other helpful libraries. The installer comes in three forms: a GUI .pkg installer (for macOS), a command-line .sh installer (for macOS and Linux), and a GUI .exe installer (for Windows). When you see “Anaconda Distribution,” we’re referring to these installers.

Those installers have a set of packages associated with them. Each installer release has a version number, which corresponds to a particular collection of packages at specific versions.  That collection of packages at specific versions is encapsulated in the anaconda metapackage. We call it a metapackage because it doesn’t actually contain any files – just dependencies on other packages. When we build the Anaconda Distribution installers, we first make the anaconda metapackage, and then we use that metapackage to pull all of the other packages into the installer. When you install Anaconda Distribution, the anaconda metapackage is installed as part of it. Moreover, we have a dedicated QA team that runs integration tests with all of the packages in the installer, so the installers and the metapackages represent known-good collections of packages.
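To make the idea of a metapackage concrete, here is a minimal Python sketch. It is not how conda stores package metadata; the `Package` class, the helper functions, and the short list of pins are all illustrative assumptions, and a real release pins far more packages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str
    version: str
    files: tuple = ()    # paths the package would place on disk
    depends: tuple = ()  # dependency specs written as "name version"


# Illustrative pins only; not the actual contents of any release.
anaconda = Package(
    name="anaconda",
    version="2019.03",
    depends=("python 3.7.3", "numpy 1.16.2", "pandas 0.24.2"),
)


def is_metapackage(pkg):
    """A metapackage ships no files of its own, only dependencies."""
    return not pkg.files and bool(pkg.depends)


def pinned_versions(pkg):
    """Map each dependency name to the exact version the metapackage pins."""
    return dict(spec.split(" ", 1) for spec in pkg.depends)


print(is_metapackage(anaconda))
print(pinned_versions(anaconda))
```

Installing the metapackage pulls in everything listed in `depends`, which is how one small package can stand in for the whole tested collection.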

Why is it important to know about the anaconda metapackage? It is useful for creating environments that have all of the Anaconda Distribution packages in them, and it has strong effects on conda’s solver behavior. When you have “anaconda-2019.03-py37_0” installed, conda constrains every dependency listed in that metapackage to exactly the version specified there. This is good, because that set of packages has been tested and is known to work. However, such a highly constrained set is also limiting: a large number of constraints poses a harder satisfiability problem for conda to solve.

People sometimes wonder why conda refuses to update a particular package. The main reason is a conflict with other software in the environment. For a long time, conda’s behavior has been to return a message like “All requested specs are already satisfied,” which is frustrating when you know there’s a newer version available. We’ve improved that behavior in conda 4.7.10, so that conda does a better job of telling you which spec in the environment is preventing installation of your requested update. We hope it helps users understand the situation, but conflicts with the anaconda metapackage are inevitable and complicated. There are so many packages in the anaconda metapackage that adding almost anything ends up interacting with one or more of the core anaconda dependencies. If you include the anaconda metapackage in your environment, then all of the packages it lists are pinned, and you’ll likely get the “All requested specs are already satisfied” message until the next Anaconda Distribution release.
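The effect of an exact pin can be sketched in a few lines of Python. This is a toy model, not conda’s solver: the function `pick_version` and the list of available versions are made up for illustration.

```python
def pick_version(available, pin=None):
    """Return the newest available version allowed by an optional exact pin."""
    candidates = [v for v in available if pin is None or v == pin]
    # Compare versions numerically, component by component.
    return max(candidates, key=lambda v: tuple(int(p) for p in v.split(".")))


# Made-up index contents for a single package.
available_numpy = ["1.16.2", "1.16.4", "1.17.0"]
installed = "1.16.2"

with_pin = pick_version(available_numpy, pin="1.16.2")
without_pin = pick_version(available_numpy)

print(with_pin == installed)  # True: nothing to change while the pin is in place
print(without_pin)            # the newest version, reachable only without the pin
```

With the pin in place, the only acceptable answer is the version that is already installed, which is the situation behind the “already satisfied” message.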

This is where a special version of the anaconda metapackage comes in. Its version is “custom,” which conda considers lower than any “real” version number, such as 2019.03. This special version is what conda uses to “unlock” the packages so that you can update them after installing Anaconda Distribution. Until recently, this metapackage constrained only python, and even then only to a particular minor revision.
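The ordering described here can be illustrated with a simplified sort key. This is only a sketch under the assumption that any non-numeric component ranks below every number; `version_key` is a hypothetical helper, and conda’s real version comparison handles many more cases.

```python
def version_key(version):
    """Sort key in which a non-numeric component ranks below every number."""
    key = []
    for part in version.split("."):
        if part.isdigit():
            key.append((1, int(part), ""))
        else:
            key.append((0, 0, part))
    return tuple(key)


versions = ["2019.03", "custom", "2018.12"]
ordered = sorted(versions, key=version_key)
print(ordered)  # ['custom', '2018.12', '2019.03']
```

Because “custom” sorts lowest, a solver that prefers newer versions will pick it only when a dated release cannot satisfy the request.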

Conda 4.7 changed the way it builds up the problem to solve, and this led to catastrophic uninstallation of the other packages that actual release versions of the anaconda metapackage depend on. Why? Because conda was asked to change something that was constrained by the anaconda-2019.03 metapackage. To remove that constraint, conda chose the anaconda-custom metapackage instead. Conda then “helpfully” removed the packages that were not directly named by any currently installed package. Remember, anaconda-custom did not name any of those packages. This was a terribly unfortunate lack of anticipation on our part, and we’re sorry for the time we may have cost you. We fixed this by changing our anaconda-custom metapackage so that it lists the names, but not the versions and build strings, of all the packages from a given release. With that fix in place, conda no longer uninstalls the packages that are only part of the anaconda metapackage. Now it is merely slow, since switching from pinned constraints to entirely unpinned constraints means a much larger, harder-to-solve problem.
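The removal behavior and the fix can be modeled as a reachability question. The sketch below is a toy model rather than conda’s algorithm: `packages_to_keep` is a hypothetical helper, and the small dependency graphs are simplified assumptions.

```python
def packages_to_keep(requested, depends):
    """Return every package reachable from the explicitly requested ones."""
    keep, stack = set(), list(requested)
    while stack:
        name = stack.pop()
        if name not in keep:
            keep.add(name)
            stack.extend(depends.get(name, ()))
    return keep


installed = {"anaconda", "python", "numpy", "pandas", "scikit-learn"}
requested = ["anaconda"]

# Original anaconda-custom: names python only.
old_custom = {"anaconda": ["python"]}
# Fixed anaconda-custom: names every package, with no version pins.
new_custom = {"anaconda": ["python", "numpy", "pandas", "scikit-learn"]}

removed_before_fix = installed - packages_to_keep(requested, old_custom)
removed_after_fix = installed - packages_to_keep(requested, new_custom)

print(sorted(removed_before_fix))  # ['numpy', 'pandas', 'scikit-learn']
print(sorted(removed_after_fix))   # []
```

Listing the names keeps every package reachable from the metapackage, while leaving out versions and build strings still lets each one be updated.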

But wait, there’s more! The anaconda metapackage contains several core low-level libraries, including compression, encryption, and linear algebra libraries, as well as some GUI libraries. The GUI libraries are worth mentioning in this context, because they are the source of some friction between software collections. In particular, Qt is a GUI package that is shared among several GUI programs in Anaconda Distribution and also in packages that you can add on. The version of Qt that is included with your particular Anaconda Distribution may not be compatible with what other packages need. RStudio is one known point of friction. When mixing large software ecosystems such as Anaconda Distribution (what comes with the installer) with the RStudio packages that are available to conda install, consider creating separate environments for Anaconda Distribution (or the metapackage) and RStudio. Separate environments allow each to have its own Qt version and will save you time fighting the solver. Our documentation on environment creation is in the conda docs.
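Why separate environments help can be shown with a small set intersection. This is a sketch with placeholder data: the version labels “A” and “B” and the package name `distribution-gui-tool` are hypothetical, since the actual Qt requirements are not stated here.

```python
def common_versions(requirements):
    """Return the library versions that every package in one environment accepts."""
    accepted = [set(versions) for versions in requirements.values()]
    return set.intersection(*accepted) if accepted else set()


# Placeholder labels for the Qt versions each package can work with.
one_environment = {"distribution-gui-tool": {"A"}, "rstudio": {"B"}}
distribution_only = {"distribution-gui-tool": {"A"}}
rstudio_only = {"rstudio": {"B"}}

print(common_versions(one_environment))    # set(): no single Qt satisfies both
print(common_versions(distribution_only))  # {'A'}
print(common_versions(rstudio_only))       # {'B'}
```

An environment holds one version of a given library, so two packages with disjoint requirements can only coexist if each lives in its own environment.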

So how should you stay up to date? Some people install Anaconda Distribution and keep updating all the packages in that one base environment. As we discussed above, that can cause headaches with conflicts or long waits at the “Solving environment…” spinner. Once you have used the Anaconda Distribution or the anaconda metapackage to figure out which packages you really find useful, we recommend creating separate, focused environments that align with your workflows. The fewer specs you have, the less likely you are to encounter conflicts between dependencies, and conda will also be much faster. Think of Anaconda Distribution as a great starter toolbox and a reference set of tested software. Power users typically grow out of it and prefer to install Miniconda, which gives them only what they need to run conda and create environments from there.
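As a rough illustration of focused environments, the Python sketch below assembles one `conda create` command per workflow and prints it without running anything. The environment names and package lists are hypothetical examples; substitute the packages you actually use.

```python
def conda_create_command(env_name, specs):
    """Build (but do not run) a `conda create` command for one focused environment."""
    return ["conda", "create", "--yes", "--name", env_name, *specs]


# Hypothetical workflows, each with a short list of specs.
workflows = {
    "modeling": ["python=3.7", "pandas", "scikit-learn"],
    "plotting": ["python=3.7", "matplotlib"],
}

for env_name, specs in workflows.items():
    print(" ".join(conda_create_command(env_name, specs)))
```

Each printed line can be pasted into a terminal. Keeping the spec list short gives conda a smaller problem to solve than the full metapackage does.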

People think about platforms as the foundation of a given computing environment. Microsoft Office, Matlab, or even Linux are platforms. A lot of people seem to think that Anaconda Distribution is similar. We prefer to see conda, the package and environment manager, as well as the conda package ecosystem as the platform. While Anaconda Distribution is built on that platform, it does not define it. We hope this helps you understand the Anaconda Distribution installers and the anaconda metapackage in their context.

