Enterprise Data Science
6 Reasons Your Open-Source Data Science Pipeline Needs Attention Now
Feb 11, 2020 | By Anaconda Team
Your enterprise data scientists are almost certainly using Anaconda Distribution alongside 20 million other practitioners worldwide. The Anaconda Distribution is a package and environment manager designed for solo data scientists, and it is the most efficient and convenient way to manage thousands of open-source data science packages. Data scientists use it to download open-source Python, R, and Conda packages to analyze, explore, and visualize data and to create machine learning models.
So the question is, who is managing and governing this open-source pipeline in your organization? Our guess is probably no one, and that’s not good. Here’s why it’s important, and why you need Anaconda Team Edition to do it.
1. Your open-source data science/ML pipeline is not secure.
The Anaconda Distribution was not designed for corporate IT environments. It does not provide user access control, reporting on vulnerabilities, the ability to blacklist or whitelist packages, or visibility into what packages data scientists are using in which models. Team Edition does all these things by providing a mirror of our vast package repository onto your corporate infrastructure. With Team Edition, open-source data science and machine learning becomes governable. Administrators can filter packages based on license type and vulnerability scores within a single, consolidated view for package management.
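To make the idea of filtering by license type and vulnerability score concrete, here is a minimal sketch in Python. It is not Team Edition's API or data model: the `PackageRecord` fields, the `filter_packages` helper, the package names, and the scores are all hypothetical, chosen only to illustrate the kind of policy an administrator might express.

```python
from dataclasses import dataclass


# Hypothetical package record. Field names are illustrative only and are
# not taken from any Anaconda product schema.
@dataclass(frozen=True)
class PackageRecord:
    name: str
    version: str
    license: str
    max_cvss: float  # highest CVSS score among known CVEs, 0.0 if none


def filter_packages(packages, allowed_licenses, max_score):
    """Keep packages with an allowed license and a worst-case CVSS score
    at or below max_score."""
    return [
        p for p in packages
        if p.license in allowed_licenses and p.max_cvss <= max_score
    ]


# Made-up sample catalog (names, licenses, and scores are invented).
CATALOG = [
    PackageRecord("alpha", "1.0", "BSD-3-Clause", 0.0),
    PackageRecord("beta", "2.1", "GPL-3.0", 0.0),
    PackageRecord("gamma", "0.9", "MIT", 9.8),
    PackageRecord("delta", "3.2", "MIT", 4.3),
]

if __name__ == "__main__":
    approved = filter_packages(CATALOG, {"BSD-3-Clause", "MIT"}, max_score=7.0)
    for pkg in approved:
        print(f"{pkg.name} {pkg.version} ({pkg.license}, CVSS {pkg.max_cvss})")
```

In this sketch, "beta" is excluded by the license policy and "gamma" by the score threshold, so only "alpha" and "delta" would appear in the filtered view.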
2. Your repository is probably relying on guesswork to match Conda package vulnerabilities.
Other package-scanning services have to guess about the content of Conda packages installed on a system. Not so with Anaconda. By knowing exactly what is built into a package, we provide more accurate Common Vulnerabilities and Exposures (CVE) reporting. Beyond reporting on the vulnerabilities that have been disclosed through the National Vulnerability Database (NVD), which is maintained by NIST, we also proactively mitigate vulnerabilities in existing software by applying available patches and releasing new builds of the same software versions.
Other repositories fail to manage Conda package dependencies properly. Their packages are not updated in a timely manner, if at all. Team Edition keeps track of and displays package dependencies, making it possible to identify vulnerabilities across packages.
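Why dependency tracking matters for vulnerability reporting can be shown with a small sketch. The Python below is an illustration only, not how Team Edition is implemented: the `DEPENDENCIES` map, the package names, and the `affected_by` helper are all hypothetical. It walks a dependency graph in reverse to find every package that relies, directly or transitively, on a vulnerable one.

```python
from collections import deque

# Hypothetical dependency map: package -> its direct dependencies.
# All package names are invented for illustration.
DEPENDENCIES = {
    "app-model": ["ml-lib", "plot-lib"],
    "ml-lib": ["array-lib"],
    "plot-lib": ["array-lib", "image-lib"],
    "array-lib": [],
    "image-lib": [],
}


def affected_by(vulnerable, dependencies):
    """Return the sorted names of packages that depend on `vulnerable`,
    directly or through other packages."""
    # Invert the graph so each package maps to the packages that depend on it.
    dependents = {}
    for pkg, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)

    # Breadth-first walk upward from the vulnerable package.
    seen = set()
    queue = deque([vulnerable])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen)


if __name__ == "__main__":
    print(affected_by("image-lib", DEPENDENCIES))
```

In this made-up graph, a vulnerability in "image-lib" also touches "plot-lib" and, through it, "app-model", even though "app-model" never lists "image-lib" as a direct dependency.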
3. You aren’t getting Conda patch updates.
When we patch a Conda package, CVE scores in other repositories will not reflect the update until the next release. Knowing the exact contents of packages means that we are able to report positive confirmation when a security vulnerability has been mitigated by an updated build.
4. You’re interrupting the data science workflow.
In some highly regulated industries, such as finance, data scientists can wait up to a month to get approval to download the packages they need for a project. With Team Edition, data scientists search for the packages they need within their mirrored repository. Because packages that are there will already meet enterprise information security standards, data scientists can immediately continue with their projects without having to wait for approval.
5. Your environment isn’t transparent.
Knowing what packages your team is using to build machine learning models is essential for enforcing ethical and compliant AI and increasing transparency. With Team Edition, you can keep track of which packages each of your team members is using to build models.
6. Your data scientists are wasting time using the wrong tools.
Anaconda Team Edition is the first and only enterprise data science repository built by data scientists for data scientists. Data scientists don’t use repositories the same way most developers do, and repositories that are popular among developers are not as intuitive for them. Learning a vast new repository full of packages they’ll never use is not the best use of data scientists’ time. They can spend more time on high-impact work when they use our Conda-native platform, which is designed for managing data science and machine learning packages within channels.
Take Control of Your Open-Source Pipeline Now
Govern, secure, and manage open source in your environment with Anaconda Team Edition. Reach out to us today to be part of our Strategic Adopter program and get reduced pricing for the first year.