Introduction

This document collects configurations, best practices, and other advice from Anaconda’s Professional Services team, compiled across hundreds of customer projects. It is not yet well structured, organized, or proofread, but it still may be helpful!

It covers advice about configuring conda, specifying environments, managing channels, and generally using conda to meet your team's needs in the context of an organization with various divisions and stakeholders holding separate responsibilities (IT, governance, developers, devops, etc.).

This guide is meant to be read alongside these other documents to cover the full range of environment management topics:

  • conda documentation, which covers usage of conda at an individual level, without being prescriptive about how users or organizations apply the functionality available to achieve their goals. In contrast, this document focuses on how to employ conda in practice, within an organization that needs to implement security policies, manage package availability over time, and generally work at a level consisting of multiple simultaneous conda installations rather than the single conda installation that is the focus of conda’s own documentation.
  • 8 Levels of Reproducibility, a blog post that covers how to capture environments for cross-platform reproducibility and deployability, including how to create Docker images when appropriate.
  • PSM Channel Policies guide, a separate guide that contains Information about configuring Package Security Manager (PSM) conda channels.
  • AE5 Environment Management Documentation, a separate guide covering environment management topics specific to the Anaconda Data Science Workbench product (AE5).

Conda Environments

A conda environment is an isolated directory that contains a defined set of conda packages and their dependencies. Each environment ensures consistent and reproducible configurations, making it easy to manage multiple, independent software stacks on the same system without conflicts.

Conda environments can be broken down into categories across two axes:

  1. Axis one: Loose vs Explicit specifications:
    1. Loose: flexible and easy to maintain, but not fully reproducible down to exact packages
    2. Explicit: fully reproducible, but brittle and effectively impossible to maintain by hand
  2. Axis two: specification or a materialized environment on disk
    1. Specification/Abstract: yaml file listing packages to install (loose or explicit)
    2. Materialized: actual conda environment on disk, installers, conda-pack archives, cloudera parcels, docker image, etc. Materialized environments can be further broken down by whether they are inspectable or opaque.
  The matrix below summarizes these categories and how each kind of spec is captured:

  Abstract (spec):
    Loose (user intent): environment YAML file
    Explicit: environment YAML lock file; conda env export; conda list --explicit (extra-brittle version)
  Concrete / materialized, inspectable (on disk):
    Loose (user intent): conda env export --from-history
    Explicit: conda list / conda env export
  Concrete / materialized, opaque:
    Loose (user intent): (not applicable)
    Explicit: conda-pack archive; installer created by constructor; Docker image; Mpack / Cloudera Parcel; archive loaded into HDFS (Spark)

Conda Project

Conda-project overcomes the disadvantages of loose and explicit specs by tracking both. It is a tool that manages a per-project environment for flexibility and isolation. For flexibility and future-proofing, conda-project captures user intent in a loose spec, recording the packages the user asked for; for reproducibility, it also creates a fully pinned lock file. During maintenance and development, a user can upgrade the environment by editing the loose spec and then asking conda-project to regenerate the explicit, fully pinned spec.
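
As a minimal sketch of that workflow (assuming conda-project's init, lock, and run subcommands; consult the conda-project documentation for your version):

conda project init                     # typically creates conda-project.yml and a loose environment.yml
# edit environment.yml to add loosely specified packages, e.g. pandas
conda project lock                     # (re)generates the fully pinned lock file
conda project run python analysis.py   # runs inside the locked, instantiated environment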

Specifying Environments

The matrix above can be distilled into four generalized types of environments that capture most of these concepts:

  • Loose specification
  • Manually pinned specification
  • Locked specification
  • Instantiated environment

In the context below, a pinned package refers to one with a specified version number, and optionally a specific build string. This contrasts with unpinned packages, for which conda places no restrictions on the version or build—it is free to select any available version that satisfies the environment’s constraints. When a package is pinned, however, conda is limited to the version (and build, if provided) specified by the user.

  • A loose specification: fully unpinned or mostly unpinned, such as an environment.yml file containing entries like pandas, matplotlib-base=3, etc. A loose environment specification tells conda which package names to install, but it doesn’t specify precisely which version. For instance, there are more than 40 different releases in Matplotlib’s 3.x series, ranging from 3.0.0 in 2018 to 3.10.0 in 2024. Moreover, for each release, there can be many builds, even for a single platform, and each such package variant has the potential to break a project or affect its results. A loose specification is a great way to indicate the package names needed in an environment, but it is not reproducible and will not work the same if shared across a team.
  • A manually pinned specification, such as an environment.yml file with handwritten entries like pandas=2.2.2, matplotlib-base=3.8.4, etc. for all the packages used directly. A fully pinned environment spec is reasonably reproducible, and with careful management the same pinned environment can be reproduced on multiple platforms, making it suitable for use across a team of Windows, Mac, and Linux users.

    Even so, such an environment is not strictly reproducible, because there can be multiple different builds of the same pandas 2.2.2 version (across different platforms, different Python versions, and different underlying package features such as CPU vs. GPU), and because such a specification cannot include pins for any package not shared across all platforms. In practice, platform differences are rare, and differences between builds rarely cause issues. Build strings matter mainly in advanced cases involving deep dependencies like BLAS, MPI, OpenMP, or GPU libraries, each of which can potentially affect numerical results.

    Full manual pinning is awkward to do and is thus a relatively rare case; more often people work with specifications that are either loose (unpinned) or fully and automatically locked (below).
  • A locked specification, such as a lockfile generated automatically by pixi, conda-lock, or anaconda-project, is fully pinned down to the build string. A locked specification identifies, for each package, a single installable package file corresponding to a single build.

    For instance, a lockfile could specify pandas=2.2.2=py311ha02d727_0, which precisely and unambiguously identifies one single build of the package, corresponding to one specific installable file in a conda repository. Lockfiles are fully reproducible but platform specific, since a given build is generally not available for every platform. A locked environment is therefore necessarily platform specific, but most lockfile formats (such as anaconda-project-lock.yml) can include multiple separate locked environments, one for each supported platform.

    One advantage of having fully specified packages (name, version, build-string) is that conda’s solving step can be skipped (if supported for that tool), which makes it vastly faster to instantiate the environment, and also fully repeatable and deterministic.
  • An instantiated environment is a fully installed conda environment that is ready to be activated and used. It may result from tools like conda-pack, anaconda-project --pack-envs, or an installer created with Constructor. These environments can be deployed on a machine without requiring any packages to be fetched from a conda repository: they contain everything needed to run out of the box.

Before an environment can be used, it must be instantiated in this way. Once instantiated, it becomes tied to one specific platform (e.g., Linux, Windows, or macOS). In some cases, such environments can be shared among multiple users—for example, by installing them on a read-only networked filesystem. However, the more typical setup involves installing the environment on a single-user machine.

Instantiated environments consume significantly more disk space than environment specifications because they include the full set of binaries and runtime code for every package and dependency. As a result, they serve primarily as runtime environments rather than lightweight, abstract definitions. Nonetheless, they can be archived for long-term use or airgapped deployment—commonly via tools like conda-pack or platform-specific installers—because they no longer rely on external package repositories.

In contrast, environment specifications—whether loose, pinned, or locked—only define which packages to include. They require access to an external server to download the actual packages and cannot function without it.
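
To make the distinction concrete, here is the same environment at two of these levels (the versions shown are illustrative):

# loose specification (environment.yml): names and, at most, major versions
name: analysis
dependencies:
  - pandas
  - matplotlib-base=3

# manually pinned specification: exact versions for the directly used packages
name: analysis
dependencies:
  - pandas=2.2.2
  - matplotlib-base=3.8.4

A locked specification would go one step further, pinning every package in the dependency tree down to the build string (e.g. pandas=2.2.2=py311ha02d727_0).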

Sharing environments

Once one or more individual users have developed reproducible environments, one should consider how to manage environments over time and across people and groups. Sharing environments makes it easier for people to exchange code, easier for package usage to be governed (e.g. to enforce security requirements), and easier to get new users started (since they no longer have to specify all of the packages they might need). However, as with any shared resource, conflicts can arise between competing needs, such as between team members who want the latest package updates and those who want to avoid disruption to their current workflows, so it's important to be deliberate about how you enforce consistency and/or support differences.

An organization can manage shared environments at any or all of the previously listed levels. A typical procedure is to start with a loose specification based on developer requirements (“I need pandas 2.2”) and then develop a separate fully pinned or locked specification based on a local installation. The pinned or locked specification can then be distributed to users for instantiation, or a fully instantiated version can be distributed as an installer or conda-pack file, or the environment can be installed on a shared disk volume for all users to access. The loose specification should also be preserved, because the looseness is what allows the environment to be updated later to take advantage of new versions; a fully locked file is essentially unchangeable, because updating any one package generally requires corresponding changes to other pinned packages, which means relaxing the locking constraints. Thus, in most cases, organizations and users should aim to preserve both a lockfile and a loose specification: the former for immediate instantiation and the latter to allow later upgrades.

In any case where an organization manages a specification rather than an actual instantiated environment, users can install the environment locally with a command like conda env create --file=URL. In most cases, the specification file will be stored in a version-controlled repository so that it can be updated and reviewed over time. Some Anaconda products like Anaconda PSM (on-cloud or on-premises) also offer features for capturing and sharing those environments, but the crucial parts of the process are the ones about how the organization selects and manages these environments over time, with dissemination only being the last step.

Managing shared environments

The variety of ways to manage environments forms a continuum that can be interpreted as a spectrum from individuality to collectivism:

  • Per-project environments: Allows arbitrary choice of packages needed for any specific project, and allows those packages to be preserved at whatever their last known working state was, ensuring complete reproducibility. Well supported by pixi and conda-project as well as the older anaconda-project package, or by using conda with an empty base environment and separate explicit environments activated per project.
  • Per-person environments: Allows arbitrary choice of packages per person, but requires the author to update code in all of their projects if needed for compatibility, whenever they update their environment over time. Typically results from running the Anaconda distribution installer and then conda installing specific packages into the base environment. Easy to share code between projects maintained by a single person, since all have a single shared environment by default, but difficult to share code between people, because updating the target’s environment to work with the source’s code is likely to break the target’s existing projects. Reproducibility is extremely difficult in this case, as it is nearly impossible to replicate the precise series of install steps over different dates that resulted in any particular individual’s conda environment.
  • Per use-case shared environments: Same as per-team shared environments below, with all the same pros and cons, but maintaining separate environments for different situations, typically corresponding to code maturity levels like dev, test, and production. The prod environment is often very small, with coding and development tools like editors omitted to reduce the attack surface and potential security vulnerabilities. The test environment is often the same as prod, but with later versions or some additional testing or validation packages. The dev environment can be much larger, but people using it need to be aware that code intended for production will only be able to use a subset of the available packages. 
  • Per-team shared environments: Centrally managed team environment that requires multiple people to accept a specific set and version of packages covering their joint needs, and they then develop their projects around the shared environment rather than separate ones. Makes it simple to share projects across the team without having to debug version compatibility issues.

    Makes it simple to onboard a new user, who is then told simply to use this pre-specified environment. It also makes it simple to provide an approval step for central governance to review for security vulnerabilities and verify open-source licenses.  However, developers with special requirements, such as wanting to use new features of a library, can often chafe at being required to use the shared environment, and unless the requirement is enforced, the benefits of sharing are limited.

    Updating a shared environment can also be problematic, since so many people and projects are depending on it. In practice, a per-team shared environment is often treated as a default, with exceptions approved on a per-project basis.
  • Per-organization shared environments: A large organization may wish to enforce a single environment across multiple divisions and teams, whether for security and governance purposes or to prioritize code sharing and collaboration. When an environment is shared at the team level, individual users are likely to be successful in lobbying for inclusion of a certain package or version, but such localized needs are likely to be set aside when the environment is shared very widely in this way. In practice, sharing across the entire organization is likely to be successful only for highly homogeneous groups of users, and exceptions will be needed across the various Python users in any organization large enough to have multiple teams.

Summary Table

Per-project environments
  Advantages: Full flexibility; supported by tools; reproducible and maintainable; easy to innovate
  Disadvantages: Not centrally managed; security admins need more experience
Per-person environments
  Advantages: Flexible; easy to get started
  Disadvantages: Brittle; non-reproducible; non-recoverable; tight coupling between a user's projects
Per use-case shared environments
  Advantages: Small number of environments to manage
  Disadvantages: Less flexible; environment changes affect all projects of the same use-case; agility is reduced
Per-team shared environments
  Advantages: Small number of environments to manage
  Disadvantages: Less flexible; environment changes affect all projects of the team; agility is reduced
Per-organization shared environments
  Advantages: Very small number of environments to centrally manage; security and governance is trivial; easy to distribute across a cluster (Spark nodes or otherwise)
  Disadvantages: Very inflexible; users are stuck using old packages; brittle, since changes affect all projects; no agility for users to access conflicting use-cases or bleeding-edge technology; updates and maintenance are impractical

Each organization or division should be careful to decide whether sharing, collaboration, and governance are more important, or whether reproducibility and individual innovation are more important, and set policies (including policies for exceptions) accordingly.

Updating shared environments

Once a stance on how environments are shared has been chosen, the next major consideration is how and how often those environments will be updated. As soon as an environment has been specified, it will be out of date, with new package versions appearing hourly in the Python ecosystem. Eventually, any such environment will need to be updated, whether for security reasons (to get the latest security patches), to get bug fixes as they are released, to add new functionality, or simply to keep up to date with the versions currently supported by the package authors (as these are often updated on a fixed schedule).

One typical approach for organizations maintaining shared environments is for them to establish a scheduled update frequency, coupled with a naming convention that encodes the release schedule into the environment name. For instance, an organization may have a data_science_202403 environment, with any new project building on that shared environment and either forever using that environment into the future, or being expected to be updated to the new version (after some delay period) once the updated environment is available. The naming convention could change depending on the update frequency, but the suggested convention covers the most common case of monthly or quarterly updates.

Once an environment has been updated, an organization typically invites a few test users to verify that it is usable on their projects, and then releases it to all users.

If a major security issue is identified in between scheduled releases, organizations will need some way to update the shared environment. Typically this would be handled using the naming convention: a new environment data_science_202403a is created (with the minimal required changes), and all projects using the old version are expected to update.
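
As a sketch of the naming convention in use (the file names are illustrative):

# scheduled quarterly release
conda env create -n data_science_202403 --file=data_science_202403.yml

# emergency security respin between scheduled releases
conda env create -n data_science_202403a --file=data_science_202403a.yml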

Selected Environment Commands

  • conda env export: Exports a fully pinned specification (package-version-buildstring)
  • conda list --explicit --sha256: Exports a fully pinned specification with channel URL and sha256 hash. Very brittle, but no solver is needed to recreate the environment and no channel configuration is needed.
  • conda env export --from-history: Exports a loose spec that attempts to capture what the user intended to install. No dependencies listed, just the core packages desired.
  • conda pack -n my_env: Packs environment my_env into my_env.tar.gz
  • conda pack -n example --format parcel --parcel-name=sklearn: Creates a Cloudera Parcel named sklearn that can be pushed across a Cloudera Hadoop cluster.
  • constructor .: See documentation. Builds an installer for Windows, Linux, or macOS from a yaml file declaring packages.
  • conda project init: Initializes a conda-project, which will track dependencies and provide both a loose specification and a fully pinned lockfile

Conda usage

This section focuses specifically on the conda CLI command, not other ways to install packages like mamba or pixi. Some of the advice applies to those other tools as well, but there will be differences in syntax and in the details of how the tools operate.

How to debug conda environment creation 

Conda’s documentation is currently sparse regarding how to resolve conflicts, solve failures, and other issues customers encounter when creating environments. The Professional Services team has extensive experience with these topics, and can assist customers with complex conda issues or even a full Environment Management Kickstart.

How best to use conda with environments in general

Conda ships with the ability to manage environments, and it may be used productively with two main types of environments: named, global environments, where conda manages the prefix, and local environments, where the prefix is chosen explicitly.

Named environments are created with the -n flag (e.g. conda create -n named) while local environments are created with the -p flag (e.g. conda create -p ./myenv). To activate named environments, only the name is needed (e.g. conda activate named) while local environments are activated by specifying the path on the filesystem (e.g. conda activate ~/user/myenv).

Named environments are useful for different types of work, across projects. For instance, someone may have an environment for data analysis tasks (e.g. containing pandas, duckdb) and another environment for web development (e.g. containing flask and django).

Local environments contained within specified directories, in contrast, are useful for individual projects. For instance, a project called stock_analysis and another project called timetracking_analysis could each contain an environment in its own ./analysis directory. Because local environments are used, the name clash that would occur with a single named environment called analysis is avoided.
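
A minimal sketch of this local-environment pattern (paths are illustrative):

cd ~/projects/stock_analysis
conda create -p ./analysis pandas duckdb
conda activate ./analysis

cd ~/projects/timetracking_analysis
conda create -p ./analysis pandas flask   # no clash with the other project's ./analysis
conda activate ./analysis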

While data scientists may create local conda environments explicitly, making sure to activate the correct environment before running analysis code (e.g. using conda run), project management tools such as conda-project, or its predecessor anaconda-project, help coordinate such tasks. In particular, these tools allow complex CLI invocations to be tracked and triggered easily; they help ensure the environment maintains a lock file over a project's lifecycle; they can contain metadata (e.g. a project description); they can track the code and notebooks involved in an analysis; and they can track (or even download) important data assets needed for a project to run.

Controlling Channel Priority

Channel priority determines which channel, and therefore which specific build of a package, conda uses when a package is available from more than one configured channel.

When using conda's default setting of channel_priority: flexible, conda prefers higher priority channels: it will not take a package from a lower priority channel merely because a newer version is available there. For example, if a newer version of pandas is in a lower priority channel, any version of pandas in a higher priority channel, even a very old one, will block conda from considering the newer version in the lower priority channel.

Conda looks in the lower priority channel when a package is not found in the higher priority channel at all, or, under flexible priority, when reaching into the lower priority channel is the only way to satisfy the requested specs and their dependencies. With channel_priority: strict, even that fallback is disabled: lower priority channels are never considered for a package name that exists in a higher priority channel.

To install a single package from a different channel, one can specify the channel explicitly in the package spec:

conda install dev-packages::<internal-package>

Note: The -c/--channel switch should be avoided, because it makes the specified channel the highest priority channel for the current conda command only, which means all dependencies, not just the requested package, will be installed from that channel. Future conda commands will then revert those packages, because channel priority reverts to what is configured in .condarc.

Reference: https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-channels.html

Definition of channel_priority:

channel_priority (ChannelPriority):

Accepts values of 'strict', 'flexible', and 'disabled'. The default value is 'flexible'. With strict channel priority, packages in lower priority channels are not considered if a package with the same name appears in a higher priority channel. With flexible channel priority, the solver may reach into lower priority channels to fulfill dependencies, rather than raising an unsatisfiable error. With channel priority disabled, package version takes precedence, and the configured priority of channels is used only to break ties. In previous versions of conda, this parameter was configured as either True or False. True is now an alias to 'flexible'.



# default
channel_priority: flexible
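
For example, a .condarc might order channels like this (the channel names are illustrative):

channels:
  - internal        # highest priority: your organization's packages
  - defaults
  - forge-mirror    # lowest priority: consulted only as described above
channel_priority: flexible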

Fake_Defaults

Fake_defaults is a helpful tool for designing working environments when packages are filtered out of Anaconda's default repository. When CVE policy filtering is applied to defaults, required dependencies are often removed. Since the packages are completely removed during filtering, conda does not know about them and cannot offer any advice to end users when it cannot install desired packages because of missing dependencies.

Fake_defaults can be used for troubleshooting: it provides conda with the repodata for the full repository in a way that doesn't provide access to the vulnerable conda packages themselves. Using this package, one configures the .condarc to use the fake_defaults channel as the lowest priority channel. Any package that conda attempts to pull from the fake_defaults channel is a required dependency that has been filtered out.

Here is an example error message that can easily be solved by fake_defaults:

conda create --dry-run -p ./test1 python=3.8 pandas=1.4.4

...

UnsatisfiableError: The following specifications were found to be incompatible with each other:
...
Package python conflicts for:
pandas=1.4.4 -> bottleneck[version='>=1.3.1'] -> python[version='>=2.7,<2.8.0a0|>=3.6,<3.7.0a0|>=3.7,<3.8.0a0']
python=3.8
pandas=1.4.4 -> python[version='>=3.10,<3.11.0a0|>=3.9,<3.10.0a0|>=3.8,<3.9.0a0']

The following specifications were found to be incompatible with your system:
  - feature:/linux-64::__glibc==2.17=0
  - feature:|@/linux-64::__glibc==2.17=0
  - pandas=1.4.4 -> libgcc-ng[version='>=11.2.0'] -> __glibc[version='>=2.17']

Your installed version is: 2.17

Fake_defaults is a conda package that is built nightly and has a full snapshot of the repodata for Anaconda’s default channel. It can be installed directly from anaconda.org using the steps below or manually downloaded if network security denies direct access.

  1. conda install services::fake_defaults
  2. Edit /opt/anaconda/.condarc (e.g. vi /opt/anaconda/.condarc) and update the channels list to look like this:
        channels:
          - defaults
          - file:///opt/anaconda/fake_defaults
  3. If you are on a Windows machine, you can install fake_defaults as above. Then find where the package is installed (most probably in \ProgramData\anacondapro) and add it to .condarc:
        notepad \Users\Administrator\.condarc
        channels:
          - defaults
          - file:///C:/ProgramData/anaconda3/fake_defaults
  4. Test installing the package in a new conda environment like this:
        conda create -p test1 --dry-run <package spec>
  5. Look for any packages coming from the fake_defaults channel. These are the packages that conda needs in order to install the desired package spec.

    Managing channels

    Conda channels allow organizations to treat large collections of conda packages as a group, so that users can select appropriate packages at an appropriate level of detail, depending on their potential vulnerability level, frequency of updating, and variety of packages available.

    See the separate PSM Channel Policies document for guidance on how to configure and use channels to implement organization and team goals.

    Working with Conda-Forge

    Conda-forge is a community project that provides a very large number of community-maintained conda packages. While it can be a good source of specialized packages, there are a few important points to keep in mind:

    1. Conda-forge is a community project maintained by a loose confederacy of contributors and open to new contributors, which results in a few downsides:
      1. While there are best efforts to catch package incompatibility, there is no guarantee that any two packages are co-installable. 
      2. Contributions are reviewed by other volunteers, so there is no guarantee that a malicious attacker hasn’t slipped hidden malicious code into a package during building.
    2. Mirroring all of conda-forge will result in a channel with terabytes of packages.  Many of these packages will shadow Anaconda packages and some packages will cause conflicts with Anaconda packages.
    3. CVE filtering and information is only partially available for conda-forge packages, with much lower levels of curation.
    4. Conda-Forge packages are not built on Anaconda’s secure infrastructure.   
    5. Anaconda Support cannot help with issues caused by conda-forge conflicts as this is a task for Tier 2 Professional Services.

    Advice for using conda-forge packages:

    1. If possible, use packages from Anaconda’s “community” channel, which are conda-forge packages that are expected to be compatible with Anaconda default channels
    2. Otherwise, keep conda-forge packages on a dedicated separate channel.
    3. Only mirror the minimal number of conda-forge packages needed to minimize risk and constrain the possibility of conflicts.
    4. Consider filtering by python version when mirroring from conda-forge to reduce the number of packages.
    5. In conda's settings (the .condarc), list your conda-forge channel last to make it the lowest priority. With conda's default flexible channel priority, conda will use this channel only when needed to fulfill dependencies.
    6. When installing something from the conda-forge channel, use the form:
      conda install conda-forge::<conda package>
      instead of using
      -c conda-forge
      or
      --channel conda-forge
      The latter forms invert channel priority by making conda-forge the highest priority channel for this one operation, which will typically completely reinstall most packages in your environment rather than simply updating the specific packages you requested.
    7. Consider contracting Anaconda Professional Services to make custom builds of essential conda packages so that they will be compatible with defaults and independent of conda-forge.  

    PyPI: Python Packages

    While PSM (on prem) can mirror packages from PyPI, it is preferable to rebuild them as conda packages, to minimize risk and increase compatibility with the Anaconda ecosystem: conda, conda-lock, conda-build, anaconda-project, AE5, PSM (on prem), and other projects.

    Additionally, while Python wheels are pre-compiled for specific platforms, other packages on PyPI sometimes require compilation at install time, which can lead to compatibility problems or security issues. Even pre-compiled wheels may or may not be compatible with conda packages: they may include binary code compiled with different options than those used for Anaconda's conda packages, or binary dependencies whose versions differ from the ones used by Anaconda's conda packages, leading to run-time incompatibilities.

    Pure-python pip packages typically have good compatibility with conda, but it is important to ensure that any binary dependencies of these packages are installed using conda wherever possible, so that the overall set of interdependent packages is as compatible as possible. At the administrator level, a way to achieve this goal is to ensure that only the required pip packages are mirrored, so that dependencies will always be satisfied using conda rather than pip. 

    Part of the complexity of using PyPI packages falls at the user level: users ordinarily need to bring in specific pip packages using pip install --no-deps, after first ensuring that all required dependencies are installed using conda. Calling pip in this way is a manual step that is not needed when using all conda packages.
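
    A minimal sketch of that workflow (the package name is hypothetical):

    # install binary dependencies with conda first
    conda install numpy pandas
    # then add the pure-python package without letting pip touch dependencies
    pip install --no-deps some-pure-python-package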

    Security

    Anaconda Package Security / How our supply chain is secured

    Our precise internal processes change over time to address evolving threats and are never made fully public, but here are the important bullet points around how packages are created and uploaded:

    1. Source code is downloaded from various upstream sources.  In rare cases, python wheels are downloaded and used as the basis of the conda package.  
    2. Most if not all upstream sources publish a cryptographic hash of the release to verify the integrity of the source code during the download process. 
    3. All of our packages are built and packaged using known-good compilers on a secure network and secure hosts. This process is deterministic and repeatable.[1] Standard IT access controls, VPN segmentation, host protection, and virus scanning protect these machines.
    4. These packages are then uploaded in a secure manner, from a secure network to prevent tampering after packaging. 
    5. Packages in the repositories (data at rest) are protected by standard IT access controls and auditing.
    6. Although we do not review packages for security vulnerabilities, we rebuild packages in a timely manner when the upstream source applies security patches.

    Note: Anaconda does not typically review upstream source code for security vulnerabilities. We simply guarantee that we build what the upstream maintainer has released.  See item 6 above.  


    [1] Not to be confused with Reproducible Builds. Anaconda's builds do not attempt to produce bitwise-identical binaries; there are no attempts to remove timestamps, local paths, and other temporal artifacts that result in slightly different binary outputs when building from the same source code.

    Anaconda Installers

    Installers are the starting point for getting conda installed onto a system and configured appropriately.

    Types of Installers

    There are three types of installers. 

    • Miniconda: minimal-sized installer with conda and python, ready to add your preferred packages
    • Anaconda Distribution: large installer with over 450 scientific packages installed all at once, out of the thousands available from Anaconda’s default channels
    • Custom: a custom-built installer with a customized list of conda packages that can be automatically configured for a low-touch installation.

    Note: Although Anaconda is a distinct type of installer, that term is also used to refer to any installation of Miniconda, Anaconda, or a custom installer, because once conda is installed, the only difference between those methods is the starting point. Regardless of the starting point, users can always install additional packages and update configuration as needed.

    Anaconda or Miniconda installer?

    How do you choose between the two stock Anaconda-distributed installers, Miniconda and Anaconda?

    Choose Anaconda if you:

    • Are new to conda or Python.
    • Like the convenience of having Python and hundreds of scientific packages automatically installed at once.
    • Have the download time and disk space—a few minutes and 3 GB.
    • Do not want to individually install each of the packages you want to use.
    • Wish to use a set of packages curated and vetted for interoperability and usability.

    Choose Miniconda if you:

    • Do not mind installing each of the packages you want to use individually.
    • Do not have time or disk space to install over 1,500 packages at once.
    • Want fast access to Python and the conda commands and you wish to sort out the other programs later.

    “Stock” Miniconda and Anaconda installers can be obtained at https://www.anaconda.com/download. PSM (on prem) also provides features for distributing installers inside your organization (e.g. behind your firewall).

    Turnkey Rollout of Anaconda

    There are a few options when it comes to automating the installation of your chosen installer across an organization to many users. These are ordered from worst to best end-user experience:

    • Stock installer + manual configuration
    • Stock installer + custom IT scripts for configuration
    • Custom installer with configuration baked into the installer
    • Custom installer with configuration and authentication tokens baked into the installer (conda-ident)

    Stock Installer + manual configuration

    This is not recommended for organizations, because the process does not scale out and incurs much higher operational and support expense helping users with bespoke installations.

    Stock installer + custom IT scripts

    If your organization has tooling and operational experience around remotely managing servers and user end-points, a valid option is using that system to distribute Miniconda or Anaconda across those machines and pushing a configuration script across the organization. This requires sophisticated tooling and experience across the platforms supported by your organization.

    Custom Installer with custom configuration

    Using the conda/constructor tool, one can create a binary installer from a specified environment of conda packages, with custom conda configuration (.condarc settings) included. The tool needs to be run for each platform required, but every setting can be configured precisely as needed. Constructor is an open-source tool available for any customer to use, but configuring it can be difficult, and we recommend working with Anaconda Professional Services to get set up initially.


    construct.yaml:

    name: Anaconda-py37
    version: 1.0.0
    company: YourCompany

    channels:
      - defaults

    specs:
      - conda >4.9.1
      - conda-repo-cli
      - python=3.7
      - ipython
      - anaconda-navigator
      - console_shortcut    # [win]
      - menuinst=1.4.18
      - powershell_shortcut # [win]

    ignore_duplicate_files: true

    post_install: post-install.bat # [win]

    # license
    license_file: eula.txt

    # Don't clear out the pkg cache
    keep_pkgs: True

    channels_remap:
      - src:  https://repo.anaconda.com/pkgs/main
        dest: https://anacondaserver.corp/api/repo/main

    extra_files:
      - caroot.pem   # Root CA certificate

    condarc:
      restore_free_channel: false
      channels:
        - defaults
      default_channels:
        - main
        - internal
      channel_alias: https://anacondaserver.corp/api/repo
      # added in extra_files above
      ssl_verify: caroot.pem
      # Don't ask to send error reports directly to Anaconda
      report_errors: false

    post-install.bat:

    :: configure PSM (on prem) login
    call conda repo config --set oauth2 true
    call conda repo config --set sites.prod.url https://anacondaserver.corp/api
    call conda repo config --set default_site prod

    :: Set conda repo to use SSL Cert
    call conda repo config --set ssl_verify %PREFIX%\caroot.pem

     

    Package Building

    Automated Building using CI/CD

    Any automated build system can be used to build conda packages with conda-build.  To upload built packages to an on-premise conda repository, follow the steps below.

    Conda Build command

    When automating builds through CI/CD, you should stop conda-build from automatically uploading the package and upload in a separate step described in the next section. The command for conda-build will look something like this:

    conda build $RECIPE_PATH --no-anaconda-upload

    Uploading to PSM (on prem)

    Interactive Method

    1. conda repo login
    2. conda repo upload -c $channel_name $package
    3. conda repo logout

    Non-Interactive Method

    1. Create a token using the PSM (on prem) documentation: https://www.anaconda.com/docs/psm/on-prem/latest/user/auth_token
    2. From the CLI, upload the package using this command: conda repo -t $token upload -c $channel_name $filename
      1. $token: the token you created in step 1
      2. $channel_name: the destination channel
      3. $filename: path and filename of your conda package
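
    Putting the pieces together, a CI job might run something like this sketch ($RECIPE_PATH, $token, and $channel_name are placeholders, as above):

    conda build $RECIPE_PATH --no-anaconda-upload
    PKG=$(conda build $RECIPE_PATH --output)   # prints the path of the built package
    conda repo -t $token upload -c $channel_name $PKG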

    Difference between build and host

    See the official documentation here: https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#requirements-section

    • Build: tools needed to build the package, such as compilers selected via the {{ compiler() }} variable.
    • Host: the environment the package is built in and linked against.

    This distinction is slightly confusing, but as a rule of thumb, place compilers and other build tools in the build section and build-time library dependencies in the host section.
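
    For illustration, a hypothetical meta.yaml fragment following that rule of thumb:

    requirements:
      build:
        - {{ compiler('c') }}   # compiler goes in build
      host:
        - python
        - numpy                 # compiled/linked against at build time
      run:
        - python
        - numpy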

    Cross-compilation from Linux for Windows

    It’s not currently possible to cross-compile for Windows from a Linux build machine.  Packages compiled on Windows must use Windows compilers.  Neither Anaconda nor conda-forge provide a self-contained build toolchain for Windows.  One would need to download compilers and platform SDK directly from Microsoft.

    Currently, the Anaconda Distribution team is using the vs2017 compiler declaration for Windows packages. In conda_build_config.yaml the declaration would look like this (add win-32 or win-64 as needed):

    c_compiler:
      - vs2017
    cxx_compiler:
      - vs2017
    target_platform:
      - win-64

     

    Noarch packages

    Pure Python packages that do not depend on the current platform can be built as noarch, which signifies to conda that the package can run on any platform: win-64, osx-64, linux-64, etc. The setting in the conda recipe is under the build heading: noarch: python.

    Building such packages without noarch would mean rebuilding them for every platform and every version of Python you want to support. Even if a package is noarch, it can still be constrained to a given Python version by setting the requirements in the run section.
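
    A sketch of the relevant recipe fragment in meta.yaml:

    build:
      noarch: python

    requirements:
      run:
        - python >=3.8   # a noarch package can still constrain the python version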

    Force Upload Duplicate Builds

    Don’t use the force upload option to upload duplicate builds to an on-premises repository like PSM (on prem).

    Rebuilding and overwriting a specific build of a package is possible, but the upload will fail unless a special force flag is used. Forcing is strongly discouraged, as it causes problems with reproducible installs: changing the package out from under people causes trouble when conda has already cached the old package and the new build has the same version and build number.

    Instead, increment the build number to distinguish rebuilds of a given version of the package. At that point, you can delete the old build if you are concerned about it being used, but there is then a risk that someone depends on that exact build number in their project.
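
    For example, to ship a rebuild of the same version, bump the build number in the recipe (a hypothetical meta.yaml fragment):

    package:
      name: mypkg          # hypothetical package
      version: "1.2.3"

    build:
      number: 1            # was 0; increment instead of force-uploading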