Securing the Open-Source Pipeline with Anaconda CVE Curation

Build and protect a secure repository fueled by accurate and curated CVE risk and vulnerability scores. Fully leverage open-source software for enterprise use with Anaconda’s CVE curation services.


The World of Open-Source Packages

Open-source software (OSS) leverages the power of its community to fuel innovation. Anyone around the world can use, study, change, or distribute the source code for any reason, and hundreds of thousands of packages have been published to the world of open-source software. It is the Wild West of source code. While OSS is the backbone of innovation, it is often error-prone, riddled with security risks, and unstable. Using OSS in the enterprise requires vigilance, time, and expertise to ensure fidelity and stability in your environment. To make matters more complicated, OSS packages often rely on other OSS packages. These are called package dependencies.

Visualization of package dependencies in open source.

Hundreds of thousands of OSS packages rely on hundreds of thousands of other OSS packages, producing a highly complex dependency map. A complicated dependency map means a complex package supply chain, and a complex supply chain means significantly greater exposure to vulnerabilities.

Misplaced trust in any single package in the dependency tree can let vulnerabilities spread across your entire network.
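To see why one weak link matters, it helps to walk the dependency map in reverse. The minimal sketch below assumes a small, hypothetical, and deliberately simplified dependency map (real packages have more dependencies than shown) and uses a breadth-first search to find every package that depends, directly or transitively, on a vulnerable one. The names `DEPENDS_ON` and `affected_by` are illustrative, not part of any Anaconda tool.

```python
from collections import deque

# Hypothetical, simplified dependency map: package -> packages it depends on directly.
DEPENDS_ON: dict[str, set[str]] = {
    "my-app": {"pandas", "requests"},
    "pandas": {"numpy", "python-dateutil"},
    "python-dateutil": {"six"},
    "requests": {"urllib3", "idna"},
    "numpy": set(),
    "six": set(),
    "urllib3": set(),
    "idna": set(),
}


def affected_by(vulnerable: str, graph: dict[str, set[str]]) -> set[str]:
    """Return every package that depends on `vulnerable`, directly or transitively."""
    # Invert the graph so each package points at the packages that depend on it.
    dependents: dict[str, set[str]] = {name: set() for name in graph}
    for pkg, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)

    # Breadth-first walk up the reverse edges, starting at the vulnerable package.
    seen: set[str] = set()
    queue = deque([vulnerable])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(affected_by("six", DEPENDS_ON)))
# → ['my-app', 'pandas', 'python-dateutil']
```

A flaw in a small utility like `six` reaches `my-app` even though the application never imports it directly, which is exactly why trust decisions have to cover the whole tree.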

“$350,000,000 in cryptocurrency was paid to hackers in 2020, a 311% increase from the prior year.” – Chainalysis

The Anaconda Way

Anaconda’s answer to the OSS world is Anaconda Distribution, which includes conda, a cross-platform package and environment management system maintained by Anaconda. Anaconda Distribution includes hundreds of the most popular Python and R packages for data science and machine learning, rigorously tested for compatibility so that data science practitioners can access them faster and more easily.

However, managing dependencies does not eliminate risk. Given the sheer number of Python and R packages and their dependencies, packages can still contain security vulnerabilities and exposures. Imagine the effort that goes into Wikipedia’s fact-checking!

How do we deal with risk and vulnerabilities in OSS, then?

Users can consult publicly available databases that track Common Vulnerabilities and Exposures (CVEs), such as the U.S. National Vulnerability Database (NVD), maintained by the U.S. National Institute of Standards and Technology (NIST), to learn the vulnerability status of open-source packages. Organizations can also use a CVE scanner to reveal CVEs present in their environments.
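At its simplest, a CVE scanner compares the packages installed in an environment against the affected versions listed in an advisory feed. The sketch below assumes a hypothetical feed shape (`ADVISORIES`) with placeholder CVE IDs and package names; real scanners work from richer data, such as NVD's CPE-based version ranges.

```python
# Hypothetical advisory feed: CVE ID -> (package name, affected versions).
# The IDs and package names are placeholders, not real NVD entries.
ADVISORIES: dict[str, tuple[str, set[str]]] = {
    "CVE-0000-0001": ("examplepkg", {"1.0", "1.1"}),
    "CVE-0000-0002": ("otherpkg", {"3.2"}),
}


def scan_environment(installed: dict[str, str]) -> list[str]:
    """Return findings as sorted 'name==version: CVE-ID' strings."""
    findings = []
    for cve_id, (package, affected_versions) in ADVISORIES.items():
        version = installed.get(package)
        # Flag only when the package is installed AND its version is listed.
        if version is not None and version in affected_versions:
            findings.append(f"{package}=={version}: {cve_id}")
    return sorted(findings)


environment = {"examplepkg": "1.1", "otherpkg": "3.3", "numpy": "1.26.4"}
print(scan_environment(environment))  # → ['examplepkg==1.1: CVE-0000-0001']
```

A scanner like this is only as good as the feed it reads: if the advisory lists versions that are not actually vulnerable, every one of them shows up as a finding.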

Scanning for NIST/NVD-reported CVEs, combined with a package management system that lets organizations control which packages are used in their environments, can help counter the inherent risk in OSS. However, NIST and NVD err on the side of flagging and reporting, regardless of accuracy, origin, or scope. The result is an inflated number of false positives.

For example, suppose a vulnerability is flagged in Django 2.1. NVD may then report every later version of Django as vulnerable, even if Django 2.2 fixed the problem.

If organizations rely solely on NIST/NVD-reported CVEs, many packages will fail enterprise-grade security policies, because the data suggests that no fix or update is available. Anaconda’s curation process takes a different approach, ensuring users can access the open-source packages and dependencies they need while keeping enterprise security standards at the forefront.

Anaconda’s Human CVE Curation

Anaconda’s answer to an inflated database of CVEs is to manually curate NIST/NVD-reported CVEs. Anaconda’s curation team reviews flagged packages, verifies which software each CVE actually affects, and assigns a curated CVE status and score.

Returning to the earlier example, Anaconda’s CVE curation team would update the Django CVE to clarify that it applies only to Django >=2.1,<2.2, informing users that version 2.2 and later are patched against that CVE and safe to use.
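The difference between an open-ended flag and a curated range is easy to express in code. This minimal sketch assumes purely numeric, dot-separated versions (it ignores pre-release tags and the other rules of real version schemes) and compares which Django releases from an illustrative list match an uncurated record with no known fix versus the curated range >=2.1,<2.2 from the example. The helper names are hypothetical.

```python
from typing import Optional


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.1.5' into (2, 1, 5); trailing zeros are dropped so '2.2.0' == '2.2'.

    Simplified: assumes purely numeric, dot-separated versions.
    """
    parts = [int(piece) for piece in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def is_affected(version: str, first_affected: str, first_fixed: Optional[str]) -> bool:
    """True if first_affected <= version < first_fixed; None means no known fix."""
    v = parse_version(version)
    if v < parse_version(first_affected):
        return False
    return first_fixed is None or v < parse_version(first_fixed)


releases = ["2.0.13", "2.1.5", "2.2", "3.0"]

# Uncurated record: flagged at 2.1 with no fix recorded, so every later release matches.
uncurated = [r for r in releases if is_affected(r, "2.1", None)]
# Curated record: the range is closed at the fixing release, i.e. >=2.1,<2.2.
curated = [r for r in releases if is_affected(r, "2.1", "2.2")]

print(uncurated)  # → ['2.1.5', '2.2', '3.0']
print(curated)    # → ['2.1.5']
```

Closing the range at the first fixed release is what turns two false positives (2.2 and 3.0) back into usable versions.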

CVEs are typically assigned a Common Vulnerability Scoring System (CVSS) score by NVD, ranging from 0.0 to 10.0, with 10.0 being the most severe. Anaconda’s curation lets organizations trust CVE scores and easily filter CVEs by status and CVSS score, admitting only packages that pass internal security policies into their workflows.
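CVSS v3.x also maps base scores to qualitative severity ratings: None (0.0), Low (0.1 to 3.9), Medium (4.0 to 6.9), High (7.0 to 8.9), and Critical (9.0 to 10.0). A small helper like the hypothetical `cvss_v3_rating` below is one way a policy script could bucket scores before filtering; it assumes scores carry one decimal place, as CVSS base scores do.

```python
def cvss_v3_rating(score: float) -> str:
    """Map a CVSS v3.x base score (0.0 to 10.0) to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS base scores range from 0.0 to 10.0, got {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"       # 0.1 to 3.9
    if score < 7.0:
        return "Medium"    # 4.0 to 6.9
    if score < 9.0:
        return "High"      # 7.0 to 8.9
    return "Critical"      # 9.0 to 10.0


print([cvss_v3_rating(s) for s in (0.0, 3.9, 5.3, 7.5, 9.8)])
# → ['None', 'Low', 'Medium', 'High', 'Critical']
```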

What does Anaconda’s CVE curation look like?

Each curated CVE is assigned one of the following statuses:

  • Reported (CVEs as published by NIST/NVD, not yet reviewed by Anaconda),

  • Active (vulnerabilities are still potentially active),

  • Cleared (vulnerabilities have been analyzed and determined not to be applicable),

  • Mitigated (vulnerabilities were proactively mitigated with a code patch), or

  • Disputed (the vulnerability’s legitimacy was disputed by an upstream project maintainer or other community members).
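Status and score can be combined into a simple gate. The sketch below encodes a hypothetical policy, not Anaconda’s: it treats Cleared, Mitigated, and Disputed CVEs as non-blocking and blocks any package with a Reported or Active CVE scoring 7.0 (High) or above. The class names, field names, threshold, and placeholder CVE IDs are all assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    REPORTED = "reported"
    ACTIVE = "active"
    CLEARED = "cleared"
    MITIGATED = "mitigated"
    DISPUTED = "disputed"


@dataclass(frozen=True)
class CuratedCVE:
    cve_id: str
    package: str
    status: Status
    cvss: float


# Hypothetical policy choices; tune these to your own security requirements.
NON_BLOCKING = {Status.CLEARED, Status.MITIGATED, Status.DISPUTED}
BLOCK_AT_CVSS = 7.0  # block High and Critical scores


def blocked_packages(cves: list[CuratedCVE]) -> set[str]:
    """Return the packages that have at least one CVE failing the policy."""
    return {
        cve.package
        for cve in cves
        if cve.status not in NON_BLOCKING and cve.cvss >= BLOCK_AT_CVSS
    }


cves = [
    CuratedCVE("CVE-0000-0001", "pkg-a", Status.ACTIVE, 9.8),
    CuratedCVE("CVE-0000-0002", "pkg-b", Status.MITIGATED, 9.8),
    CuratedCVE("CVE-0000-0003", "pkg-c", Status.REPORTED, 5.3),
    CuratedCVE("CVE-0000-0004", "pkg-d", Status.REPORTED, 7.5),
]
print(sorted(blocked_packages(cves)))  # → ['pkg-a', 'pkg-d']
```

Here the mitigated CVE on `pkg-b` no longer blocks the package despite its high score, which is the kind of decision a curated status makes possible.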

Altogether, Anaconda’s CVE curation provides actionable and meaningful CVE reporting so OSS can be fully leveraged at the enterprise level and data scientists can focus on building models.

A Secure Repository, Once and For All

Now that you have a repository of Anaconda-curated OSS, you’ve kept known, unaddressed risks and vulnerabilities out of your environment. But how do we maintain this secure environment?

In comes repository mirroring. Mirroring creates a copy of a repository that gives users access to packages from a centralized location, either on-premises or in the cloud. A mirror can be complete, partial, or limited to specific packages. By hosting the copy on your own server, you give users behind a firewall access to OSS while severing the online relationship between your OSS and the wider OSS network.
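Deciding what goes into a partial mirror amounts to filtering an upstream package index. The sketch below assumes a hypothetical, simplified index (`UPSTREAM_INDEX`) mapping package names to versions: `include=None` yields a complete mirror, a set of names yields a partial one, and `exclude_versions` drops individual releases, for example those blocked by a CVE policy. Real mirroring tools have their own configuration formats; this shows only the selection logic.

```python
from typing import Optional

# Hypothetical upstream index: package name -> available versions.
UPSTREAM_INDEX: dict[str, list[str]] = {
    "numpy": ["1.25.2", "1.26.4"],
    "pandas": ["2.1.4", "2.2.2"],
    "flask": ["3.0.3"],
    "requests": ["2.31.0", "2.32.3"],
}


def plan_mirror(
    index: dict[str, list[str]],
    include: Optional[set[str]] = None,
    exclude_versions: Optional[set[tuple[str, str]]] = None,
) -> list[tuple[str, str]]:
    """Return the (name, version) pairs to copy into the mirror, sorted by name."""
    excluded = exclude_versions or set()
    plan = []
    for name in sorted(index):
        if include is not None and name not in include:
            continue  # partial mirror: skip packages that were not requested
        plan.extend((name, v) for v in index[name] if (name, v) not in excluded)
    return plan


print(plan_mirror(UPSTREAM_INDEX, include={"numpy", "pandas"},
                  exclude_versions={("pandas", "2.1.4")}))
# → [('numpy', '1.25.2'), ('numpy', '1.26.4'), ('pandas', '2.2.2')]
```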

Mirrored repositories can sit behind a “middle-man” or be completely offline. Repositories with a middle-man (known as a proxy) allow inbound and outbound connections only through a designated port, which drastically reduces your attack surface. Completely offline repositories are called “air-gapped” repositories, with no inbound or outbound internet connection at all. This is the most secure type of repository, as there is virtually no network attack surface to speak of.
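On the client side, one way to keep users on the proxy or air-gapped mirror is to check every configured channel URL before use. This sketch assumes a hypothetical proxy endpoint (`mirror-proxy.internal.example` on port 8443) and treats `file://` URLs as local, air-gapped channels; anything else would bypass the proxy. It illustrates the rule only; real enforcement belongs in your firewall and network configuration.

```python
from urllib.parse import urlsplit

# Hypothetical designated proxy: the only host and port allowed to serve packages.
PROXY_ENDPOINT = ("mirror-proxy.internal.example", 8443)
DEFAULT_PORTS = {"http": 80, "https": 443}


def classify_channel(url: str) -> str:
    """Classify a channel URL as 'local', 'proxy', or 'direct'."""
    parts = urlsplit(url)
    if parts.scheme == "file":
        return "local"  # on-disk mirror, usable with no network connection at all
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    if (parts.hostname, port) == PROXY_ENDPOINT:
        return "proxy"  # traffic goes through the designated host and port
    return "direct"  # would bypass the proxy, so it should be rejected


channels = [
    "file:///opt/conda-mirror/main",
    "https://mirror-proxy.internal.example:8443/main",
    "https://mirror-proxy.internal.example/main",
]
print([classify_channel(c) for c in channels])  # → ['local', 'proxy', 'direct']
```

The third URL names the right host but falls back to the default HTTPS port, so it is classified as `direct`: the check is on the designated port, not just the hostname.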

Mirroring your repository creates a stable environment for your organization, too. A mirrored repository is a “point in time” copy, meaning all users accessing the repository will be accessing the same version of packages, ensuring consistency and compatibility across your organization.
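Because a mirror is a point-in-time copy, every environment built from it should resolve to the same package versions. A cheap way to check this, sketched below under the assumption that each environment can be summarized as a name-to-version mapping, is to hash a canonical serialization of that mapping and compare fingerprints across machines. The function name is hypothetical.

```python
import hashlib
import json


def snapshot_fingerprint(packages: dict[str, str]) -> str:
    """Return a SHA-256 hex digest of a name -> version mapping, independent of key order."""
    canonical = json.dumps(packages, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


alice = {"numpy": "1.26.4", "pandas": "2.2.2"}
bob = {"pandas": "2.2.2", "numpy": "1.26.4"}    # same versions, different order
carol = {"numpy": "1.25.2", "pandas": "2.2.2"}  # drifted numpy version

print(snapshot_fingerprint(alice) == snapshot_fingerprint(bob))    # → True
print(snapshot_fingerprint(alice) == snapshot_fingerprint(carol))  # → False
```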

Secure and stabilize packages behind your firewall.

Take advantage of Anaconda to secure your open-source pipeline so your team can spend more time building models, analyzing data, and making data-driven decisions.

Talk to an Expert

Talk to one of our financial services and banking industry experts to find solutions for your AI journey.
