The Benefits of Mirroring the Anaconda Repository
Jul 07, 2021 | By Stephen Nolan
You may have heard the term “mirroring” tossed around in discussions about open-source package repositories. What exactly is mirroring, and how do you know if it’s right for you?
Let’s begin with an explanation of the term itself, using the Anaconda repository as an example. A mirror is simply a local copy of the Anaconda repository that allows users to access the packages from a centralized, on-premise location. A mirror can be a complete copy of the repository, or a partial one that includes only specific packages or types of packages.
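Conda channels describe their contents in a `repodata.json` index for each platform subdirectory (such as `linux-64` or `noarch`), with package records listed under the `packages` and `packages.conda` keys. As a rough illustration of what a partial mirror means, the sketch below keeps only the index records for an allowlist of package names. It is a minimal sketch, not a description of how Anaconda's mirroring tools work; the function name `filter_repodata` and the sample data are hypothetical.

```python
from typing import Iterable


def filter_repodata(repodata: dict, allowed_names: Iterable[str]) -> dict:
    """Return a copy of a conda repodata.json dict that keeps only the
    package records whose "name" is in allowed_names (hypothetical helper)."""
    allowed = set(allowed_names)
    # Copy top-level metadata such as "info" unchanged.
    filtered = {
        key: value
        for key, value in repodata.items()
        if key not in ("packages", "packages.conda")
    }
    # Filter both the .tar.bz2 and the .conda package sections.
    for section in ("packages", "packages.conda"):
        filtered[section] = {
            filename: record
            for filename, record in repodata.get(section, {}).items()
            if record.get("name") in allowed
        }
    return filtered


# Hypothetical sample index, for illustration only.
sample = {
    "info": {"subdir": "linux-64"},
    "packages": {
        "numpy-1.20.2-py39h0_0.tar.bz2": {"name": "numpy", "version": "1.20.2"},
        "scipy-1.6.2-py39h0_0.tar.bz2": {"name": "scipy", "version": "1.6.2"},
    },
    "packages.conda": {
        "pandas-1.2.5-py39h0_0.conda": {"name": "pandas", "version": "1.2.5"},
    },
}
partial = filter_repodata(sample, ["numpy", "pandas"])
print(sorted(partial["packages"]) + sorted(partial["packages.conda"]))
```

Note that this only filters the index. A usable partial mirror must also carry the dependencies of every selected package, otherwise environments that need them will fail to solve.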
While mirroring is not required for every organization, here are a few reasons why you may choose to mirror the Anaconda repository.
Access Behind the Firewall
For security reasons, most servers do not have general outbound internet access unless it is a core functional requirement for the server. This means that if your team works inside a company firewall, it may be unable to use Anaconda’s package repository effectively, or at all.
Mirroring resolves this issue by creating a local copy of the repository that users inside the firewall can reach. You can also create a mirror in an air-gapped environment, one with no connection to outside networks, to help improve performance and security.
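Once a mirror exists inside the firewall, clients need to be pointed at it instead of the public repository. conda reads this from its `.condarc` configuration file, where the `channel_alias`, `default_channels`, and `channels` settings are the usual place to do it. The sketch below generates such a file. The mirror URL `https://conda-mirror.example.internal` and its `/pkgs/main`, `/pkgs/r`, `/pkgs/msys2` path layout are assumptions; the real layout depends on how your mirror is served.

```python
def build_condarc(mirror_url: str) -> str:
    """Return .condarc YAML text that points conda at an internal mirror.

    The /pkgs/<channel> layout under mirror_url is an assumption; adjust it
    to match how your mirror actually serves its channels.
    """
    base = mirror_url.rstrip("/")
    lines = [
        # Short channel names (e.g. "conda-forge") resolve against this URL.
        f"channel_alias: {base}",
        # Redirect the built-in "defaults" channel to the mirrored channels.
        "default_channels:",
        f"  - {base}/pkgs/main",
        f"  - {base}/pkgs/r",
        f"  - {base}/pkgs/msys2",
        # Search only "defaults", which now resolves to the mirror.
        "channels:",
        "  - defaults",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical internal hostname, for illustration only.
    print(build_condarc("https://conda-mirror.example.internal/"), end="")
```

Writing the output to each user's `.condarc` (or distributing it through your configuration management) means ordinary `conda install` commands fetch from the mirror without any change to how people work.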
Better Bandwidth Utilization
The Anaconda repository is huge! An archive containing all packages for the Mac, Linux, and Windows platforms is more than 250 GB. If several members of your team are individually downloading packages from the public repository, the same files cross your internet connection again and again, and you will quickly eat up bandwidth. With a local copy of the repository, each package is downloaded from the public repository once, when the mirror syncs, which can significantly reduce bandwidth consumption and improve performance for your users.
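To put rough numbers on the bandwidth argument: without a mirror, every user’s downloads cross your external link; with a mirror, only the initial sync and the periodic incremental updates do. The sketch below estimates how many months a full mirror’s initial download takes to pay for itself in external traffic. All inputs are hypothetical estimates to replace with your own figures; the 250 GB initial sync simply reuses the archive size quoted above.

```python
import math


def months_to_break_even(users: int, gb_per_user_month: float,
                         initial_sync_gb: float, monthly_sync_gb: float) -> float:
    """Months until cumulative external traffic with a mirror is no greater
    than without one.

    Direct access costs users * gb_per_user_month every month. The mirror
    costs initial_sync_gb once, then monthly_sync_gb per month of updates.
    """
    direct_per_month = users * gb_per_user_month
    monthly_saving = direct_per_month - monthly_sync_gb
    if monthly_saving <= 0:
        # The mirror's own updates cost as much as the traffic it replaces.
        return math.inf
    return math.ceil(initial_sync_gb / monthly_saving)


# Hypothetical team: 40 users pulling 5 GB each per month, a 250 GB initial
# sync, and 50 GB per month of incremental mirror updates.
print(months_to_break_even(40, 5, 250, 50))  # 2
```

A partial mirror shrinks the initial sync, which shortens the break-even point further.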
Consistency Across Your Organization
The Anaconda repository is updated multiple times per day to ensure our users have access to the latest versions of the open-source packages we support. While frequent updates are certainly a benefit to our users, they can create inconsistency when members of the same team or organization are working off slightly different versions of packages.
When you create a mirror, you can ensure every member of your organization is accessing the same version of the repository. For example, you could keep a “point in time” copy that changes only when you decide to update it, and rest assured that everyone is working from the same package versions.
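One way to approximate a “point in time” copy is to publish an index that hides anything added after a chosen cutoff. Many conda package records carry a `timestamp` field (milliseconds since the Unix epoch), so the sketch below filters a `repodata.json` dict on it. This is an illustrative approach under that assumption, not a description of Anaconda’s own tooling; `snapshot_repodata` is a hypothetical name, and records without a timestamp are kept on the assumption that they are older packages.

```python
from datetime import datetime, timezone


def snapshot_repodata(repodata: dict, cutoff: datetime) -> dict:
    """Return a copy of repodata.json keeping only packages built at or
    before cutoff (pass a timezone-aware datetime)."""
    # repodata timestamps are assumed to be milliseconds since the epoch.
    cutoff_ms = int(cutoff.timestamp() * 1000)
    snapshot = {
        key: value
        for key, value in repodata.items()
        if key not in ("packages", "packages.conda")
    }
    for section in ("packages", "packages.conda"):
        snapshot[section] = {
            filename: record
            for filename, record in repodata.get(section, {}).items()
            # Records without a timestamp are treated as old and kept.
            if record.get("timestamp", 0) <= cutoff_ms
        }
    return snapshot


# Freeze the index as of 1 July 2021 (UTC); the sample records are hypothetical.
index = {
    "info": {"subdir": "noarch"},
    "packages": {
        "old-1.0-0.tar.bz2": {"name": "old", "timestamp": 1620000000000},
        "new-2.0-0.tar.bz2": {"name": "new", "timestamp": 1626000000000},
    },
}
frozen = snapshot_repodata(index, datetime(2021, 7, 1, tzinfo=timezone.utc))
print(sorted(frozen["packages"]))
```

Because every client resolves against the same frozen index, two people creating the same environment on different days get the same package versions until you move the cutoff.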
Shorter Build Times
On-premise mirrors can improve performance across your network, because you can connect the repository over fast local networks to edge machines such as desktop clients, CI/CD systems, and production servers.
For example, CI/CD build or test automation scripts may need to download dozens of conda packages, totaling several gigabytes, on each run. Configuring the CI/CD automation environment to use an on-premise mirror of the Anaconda repository eliminates the network bottleneck between your site and the public repository, enabling faster environment creation and package builds.
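A back-of-the-envelope calculation shows why this matters for CI/CD. The sketch below converts a per-run download size and an effective throughput into transfer time, using decimal units (1 GB = 8,000 megabits). The 3 GB per run, 50 runs per day, and 100 Mbit/s versus 1,000 Mbit/s throughput figures are hypothetical, and the model ignores latency, caching, and server load.

```python
def transfer_seconds(size_gb: float, throughput_mbps: float) -> float:
    """Seconds to move size_gb gigabytes at throughput_mbps megabits/second."""
    megabits = size_gb * 8000  # decimal units: 1 GB = 8,000 megabits
    return megabits / throughput_mbps


def daily_download_minutes(runs_per_day: int, size_gb: float,
                           throughput_mbps: float) -> float:
    """Total minutes per day that CI runs spend downloading packages."""
    return runs_per_day * transfer_seconds(size_gb, throughput_mbps) / 60


# Hypothetical pipeline: 3 GB of packages per run, 50 runs per day.
for label, mbps in [("public repository over WAN", 100),
                    ("on-premise mirror over LAN", 1000)]:
    print(f"{label}: {transfer_seconds(3, mbps):.0f} s per run, "
          f"{daily_download_minutes(50, 3, mbps):.0f} min per day")
```

Under these assumptions the mirror cuts package download time from about four minutes to about 24 seconds per run.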
Business Continuity and Fault Tolerance
In addition to performance gains, local mirrors let you build in fault tolerance and scale capacity to meet your needs. Although the Anaconda repository is backed by a world-class CDN that provides a high degree of scalability and uptime, a local mirror shields you from outages on the public internet, so your teams can keep installing packages without interruption. You can also deploy redundant local mirrors cost-effectively along the most critical paths in your network.
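Redundant mirrors help most when clients, or the scripts that configure them, can fall back when one is down. The sketch below tries a list of mirror base URLs in priority order and returns the first one that answers a lightweight HTTP `HEAD` request. The mirror URLs and the probed `pkgs/main/noarch/repodata.json` path are hypothetical; in practice a load balancer or DNS-based failover may serve the same purpose.

```python
import urllib.request
from typing import Callable, Sequence


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request to url succeeds with a 2xx status."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # URLError/HTTPError subclass OSError; ValueError covers malformed URLs.
        return False


def first_reachable(mirrors: Sequence[str],
                    health_path: str = "pkgs/main/noarch/repodata.json",
                    probe: Callable[[str], bool] = http_probe) -> str:
    """Return the first mirror whose health check passes, in priority order.

    health_path is a hypothetical file assumed to exist on every mirror.
    """
    for base in mirrors:
        if probe(f"{base.rstrip('/')}/{health_path}"):
            return base
    raise RuntimeError("no configured mirror is reachable")
```

Calling `first_reachable(["https://mirror-a.example.internal", "https://mirror-b.example.internal"])` at the start of a job returns the first healthy mirror, or raises `RuntimeError` if none respond. The `probe` parameter makes the selection logic easy to test without a network.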
How to Mirror the Anaconda Repository
If you’ve decided mirroring is the right choice for your organization, what’s next?
The ability to mirror the Anaconda repository is available with paid subscriptions to our commercial products (some exclusions apply). Mirroring is not permitted for users of our free software, Anaconda Distribution. If you are not an Anaconda customer but would like the ability to mirror our repository, we’d be happy to help! Contact us to learn more.