Enterprise Data Science
How Heavily-Regulated Industries Can Accelerate Open-Source Innovation
Nov 22, 2021 By Saundra Monroe
The open-source ecosystem is the engine that drives digital innovation; no single technology vendor can match its pace. Open-source innovation in data science, machine learning, and artificial intelligence has revolutionized many of today’s leading-edge fields: emerging models and applications that use predictive analytics, natural language processing, robotics, and other cutting-edge tools are rapidly changing the landscape.
While these powerful open-source tools have become essential for differentiation and competitiveness, the adage “with great power comes great responsibility” applies. The open internet is the reason the open-source ecosystem thrives; it is also the reason open-source software can introduce countless points of failure into an organization’s infrastructure. For heavily-regulated industries such as government, healthcare, and finance, open-source innovation may seem out of reach because security and compliance restrictions forbid exposing infrastructure to the internet.
Heavily-regulated industries can mitigate this risk by implementing “air-gapped” environments, where there is no inbound or outbound internet connection. Without an internet connection, an air-gapped environment is physically separated from other computers and networks, which removes the internet-facing attack surface.
How organizations can safely use open-source software in air-gapped environments
If an organization has no outbound internet connection, how can it benefit from the open-source packages and libraries found on the internet? Anaconda recommends two ways to set up an air-gapped environment: no internet access, or one-way access.
Air-gapped environments with no internet access
When an organization requires its air-gapped environment to have no internet access:
- The administrator accesses and downloads packages from a secure location, as provided by Anaconda. This access allows the administrator to pull down the full package repository, which includes the Common Vulnerabilities and Exposures (CVE) metadata associated with each package.
- Once the administrator has pulled all the packages and CVEs down, they will be able to export them to a physical location, determined by the organization.
Air-gapped environments with one-way access
For one-way access, the air-gapped environment has a separation between the internet and the internal network:
- In this case, the organization’s internal instance can be connected to the internet via a proxy or through a secure, unidirectional HTTP network connection, also provided by Anaconda.
- The administrator is then able to pull down the full package repository, which includes all associated package CVE metadata.
Organizations do not have to sacrifice modern innovation for security. By building air-gapped solutions into their enterprise workflows, organizations keep complete control over which packages are ported into their networks and when, via a manual transfer from an external, physical medium (such as a hard drive or USB flash drive).
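The article does not say how files are checked after a manual transfer. As one illustrative possibility only, the sketch below records a SHA-256 checksum for every exported file and re-checks those checksums once the medium has been carried across the air gap. The function names (`build_manifest`, `verify_manifest`) and the example file name are hypothetical, not part of any Anaconda or IBM tooling.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large package archives do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(export_dir: Path) -> dict:
    """Map each file's relative path to its SHA-256 digest (run before transfer)."""
    return {
        p.relative_to(export_dir).as_posix(): sha256_of(p)
        for p in sorted(export_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(import_dir: Path, manifest: dict) -> list:
    """Return relative paths that are missing or altered (run after transfer)."""
    problems = []
    for rel, expected in manifest.items():
        target = import_dir / rel
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel)
    return problems


# Demo with a temporary directory standing in for the physical medium.
with tempfile.TemporaryDirectory() as tmp:
    medium = Path(tmp)
    # Hypothetical file name, used only for illustration.
    (medium / "example-pkg-1.0.tar.bz2").write_bytes(b"package bytes")
    manifest = build_manifest(medium)
    print(verify_manifest(medium, manifest))  # no problems reported
    (medium / "example-pkg-1.0.tar.bz2").write_bytes(b"altered bytes")
    print(verify_manifest(medium, manifest))  # the altered file is reported
```

In practice the manifest would need to reach the internal network by a trusted path separate from the packages themselves; that detail is outside the scope of this sketch.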
How do organizations decide which packages get ported?
Due to the additional security and compliance requirements faced by heavily-regulated industries, IT and Information Security teams in these organizations must have the means to evaluate the integrity and potential vulnerabilities of open-source software.
IT and Information Security teams can proactively create safe workflows for their data science teams by reviewing packages and their associated CVEs. CVEs are rated with a common risk rating system, the Common Vulnerability Scoring System (CVSS), which scores each vulnerability from 0 to 10. These scores are published in government resources such as the National Vulnerability Database (NVD), which is maintained by the National Institute of Standards and Technology (NIST). Organizations in the healthcare industry, for example, may have a security standard stipulating that no package with a known vulnerability score above 7 can be used within their networks.
These vulnerability scores make it easy for IT and Information Security teams to decide which packages can be used to meet the needs of their data science teams. By combining an air-gapped structure of separation with the manual mediation of packages, organizations can create a safe and compliant open-source software pipeline and secure their networks.
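A score-based policy like the healthcare example above can be expressed in a few lines. The following is a minimal sketch, assuming each package carries a list of scores for its known CVEs; the `Package` class, the package names, and the scores are made up for illustration and do not reflect the format of any real repository metadata.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Package:
    name: str
    version: str
    # One vulnerability score (0 to 10) per known CVE; empty if none are known.
    cve_scores: List[float] = field(default_factory=list)


def max_score(pkg: Package) -> float:
    """Highest known vulnerability score for a package, or 0.0 if it has none."""
    return max(pkg.cve_scores, default=0.0)


def partition_by_threshold(
    packages: List[Package], threshold: float = 7.0
) -> Tuple[List[Package], List[Package]]:
    """Split packages into (approved, blocked).

    A package is blocked when any known score is strictly above the threshold,
    matching a policy of "no score above 7".
    """
    approved, blocked = [], []
    for pkg in packages:
        (blocked if max_score(pkg) > threshold else approved).append(pkg)
    return approved, blocked


# Hypothetical packages and scores, for illustration only.
packages = [
    Package("alpha", "1.0"),
    Package("beta", "2.3", [4.3, 6.5]),
    Package("gamma", "0.9", [7.0]),
    Package("delta", "3.1", [5.0, 9.8]),
]

approved, blocked = partition_by_threshold(packages)
print([p.name for p in approved])  # ['alpha', 'beta', 'gamma']
print([p.name for p in blocked])   # ['delta']
```

Whether a score of exactly 7 passes is a policy decision; this sketch reads “above 7” literally, so `gamma` is approved.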
IBM® Anaconda Repository for IBM Cloud Pak® for Data can be installed in air-gapped environments to provide organizations access to curated, open-source packages without connecting to the internet. Anaconda Repository allows enterprises to centralize their data science projects and confidently manage the security of their open-source packages and libraries used for AI.
Watch this on-demand webinar to learn how you can secure open-source data science in the enterprise.
Learn more about Anaconda Repository for IBM Cloud Pak for Data.