Open-source software (OSS) has emerged as a powerful force, revolutionizing the way organizations approach data science and machine learning development, collaboration, and innovation. With a wealth of benefits including transparency, cost-effectiveness, and a vast community of contributors, open-source software has garnered widespread adoption across industries. However, open source also brings new security challenges and threats every day, and how organizations respond to them can separate the victors from the vanquished.
Navigating this new game requires a strategic approach and a willingness to evolve. As adoption of open source grows, organizations need adaptable strategies that leverage the power of open-source technology, apply proprietary solutions with care, and safeguard against potential risks.
As you make decisions about the platforms and tools to include in your data science and machine learning technology stack, it’s important to consider how open-source security is managed by the authors and maintainers of the tools you choose to use.
We’ve created this guide as a handy reference on open-source security. It covers the critical considerations for a secure software supply chain and best practices for open-source security with Python and R. If you find this guide helpful, please bookmark this page and share it with colleagues.
In this guide, we will cover open-source security for Python and R.
We will define open-source software, open-source software tools, and open-source security. We will consider the security advantages and challenges of using open-source software and explore approaches for security when you are working with open-source repositories, libraries, packages, and databases. Finally, we will share best practices for open-source security when you are building data science and machine learning solutions with Anaconda, Python, and R.
Let’s start with the basics to inform those who are learning about open-source software and security considerations when building applications with Python, R, and open source.
What is open-source software?
For data science and machine learning, open-source software refers to software repositories, packages, libraries, and platforms that are freely available to the public, allowing users to access, use, modify, and distribute the source code.
Specifically, to be considered open source, software must meet these conditions:
- Free redistribution – The license does not restrict anyone from selling or giving away the software as a component of an aggregate software distribution containing programs from several different sources, and it requires no royalty or other fee for such a sale.
- Source code – The software includes source code and allows distribution both in source code and in compiled form, with a well-communicated option for obtaining the source code, preferably without charge.
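- Derived works – The license allows modifications and derived works, and allows them to be distributed under the same terms as the original software’s license.
- Integrity of the author’s source code – The license explicitly permits distribution of software built from modified source code. It may restrict distribution of modified source code only if it allows patch files to be distributed with the original source, and it may require derived works to carry a different name or version number.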
- No discrimination – The license does not discriminate against people or groups, domains, or applications.
- Distribution of license – Program rights apply to people who receive the redistributed software; no additional license is required.
- License – The license for open-source software is not specific to a product, does not put restrictions on other software distributed with it, and is not tied to any individual technology or interface style.
For more detailed information, review the open-source definition by the Open Source Initiative.
What are examples of open-source software?
Open-source solutions have become the foundation of many data science and machine learning (ML) projects due to their accessibility, flexibility, and active community support. Below are some popular open-source software tools used in data science and ML. You can access them through Anaconda.
1. Python is a widely used programming language in the data science and ML community. Its simplicity and readability make it a favorite choice for various tasks. Python works with many open-source libraries for data manipulation (e.g., Pandas), numerical computing (e.g., NumPy), machine learning (e.g., scikit-learn), deep learning (e.g., TensorFlow, PyTorch, Keras), and visualization (e.g., Matplotlib, Seaborn).
2. R is another popular programming language for statistical computing and data analysis. It provides a vast array of open-source packages for data manipulation, visualization, and statistical modeling. Some popular packages include ggplot2, dplyr, and caret.
3. Jupyter Notebook is an interactive web-based tool that allows users to create and share documents that combine code, visualizations, and narrative text. It’s widely used for exploratory data analysis, data visualization, and educational purposes.
4. scikit-learn is a versatile Python library for machine learning. It includes various algorithms for classification, regression, clustering, and more.
5. TensorFlow and PyTorch are leading open-source frameworks for machine learning and deep learning. They allow users to build and train complex neural network models for various tasks, such as image recognition, natural language processing, and reinforcement learning.
6. Dask is an open-source parallel computing library in Python. It enables parallel processing of large datasets and distributed computing on clusters using a familiar Python API.
7. Pandas is a powerful open-source library in Python for data manipulation and analysis. It provides data structures like DataFrames and Series, making it easier to work with structured data.
These are just a few examples of the vast array of open-source software available for data science and machine learning. The open-source nature of these tools allows data scientists and machine learning practitioners to leverage the collective efforts of the community and contribute to the improvement of the tools themselves.
Who uses open-source software?
Anyone can use open-source software, and organizations, academics, and individuals use it widely. Use of OSS in organizations has risen sharply in recent years and continues to rise. By one industry estimate, 97% of applications use open-source software.
This growth is visible in Anaconda’s own data: package downloads rose steadily from 672 million in 2017 to 7.5 billion in 2022. Anaconda provides centralized access to thousands of open-source tools on a single platform, with paid access for enhanced security, governance, and compliance capabilities.
Figure: Anaconda package downloads over a six-year period (2017 to 2022).
As the use of open-source software increases, so do the associated risks and vulnerabilities. Open-source security is on everyone’s mind these days because of incidents like the SolarWinds cyberattack and the Log4Shell vulnerability. Recent years have shown how small vulnerabilities in software can be exploited for massive gain, often by criminal enterprises using ransomware. Today, open source powers nearly every piece of software or technology, so maintaining its security is of the utmost importance.
What are open-source tools?
Open-source tools consist of repositories, packages, and libraries—all of which are fundamental components of the open-source software ecosystem. They play a crucial role in facilitating collaboration, knowledge sharing, and the development of various applications and projects.
An open-source repository is a collection of software packages and libraries. Repositories are maintained by individuals or organizations and offered in a central location where users can access them, often without cost. Examples include PyPI for Python and CRAN for R.
In the context of programming, a package or library is a collection of reusable code modules, functions, or classes that solve specific problems. They offer pre-written code that developers can incorporate into their projects, which saves time and effort. Packages and libraries are often distributed as open-source software, allowing developers to use, modify, and redistribute them according to the license terms. Examples include NumPy, Pandas, TensorFlow, and scikit-learn in Python, which provide functionality for numerical computing, data manipulation, machine learning, and more.
The open-source ecosystem is essential for fostering innovation and collaboration within the software development community. It allows developers to build upon each other’s work, encourages transparency, and helps democratize access to powerful tools and technologies. Open-source projects have contributed significantly to the growth and advancement of various fields, including data science, machine learning, web development, and many others.
What is open-source security?
Open-source security refers to the systems, processes, tools, roles, and risks associated with applying open-source software. As community- and volunteer-driven software, open-source is unmatched in the innovation that it can unlock within organizations and has become instrumental throughout every industry. From developers using open-source in their applications to non-technical users leveraging the technology to make their jobs easier, open-source is everywhere. With so many ways open-source can be embedded into an organization, managing and securing this software is an important function of modern IT departments.
Open-source software can be complex.
OSS often relies on other OSS, known as dependencies. Dependencies may also have dependencies of their own, known as transitive dependencies. Hundreds of thousands of OSS packages rely on hundreds of thousands of other OSS packages, resulting in a highly complex dependency graph, as shown below. A single vulnerability—even one embedded in a transitive dependency that you did not explicitly install—may compromise your entire system.
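To make the difference between direct and transitive dependencies concrete, here is a minimal sketch that walks a small dependency graph breadth-first. The package names and the graph are hypothetical. In practice, the graph would come from your package manager’s metadata. The sketch lists the dependencies you never installed explicitly, and it reconstructs the chain of packages that pulled a given one in. That chain is the question you need answered when a vulnerability turns up deep in the graph.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical dependency graph: package -> list of its direct dependencies
GRAPH: Dict[str, List[str]] = {
    "my_app": ["stats_lib", "plotting_lib"],
    "stats_lib": ["array_lib", "date_utils"],
    "plotting_lib": ["array_lib", "font_tools"],
    "date_utils": ["compat_shim"],
    "array_lib": [],
    "font_tools": [],
    "compat_shim": [],
}


def transitive_only(root: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Return packages reachable from root that root does not depend on directly."""
    direct = set(graph.get(root, []))
    reachable: Set[str] = set()
    queue = deque(graph.get(root, []))
    while queue:
        pkg = queue.popleft()
        if pkg in reachable or pkg == root:
            continue  # already visited, or a cycle back to the root
        reachable.add(pkg)
        queue.extend(graph.get(pkg, []))
    return reachable - direct


def dependency_path(root: str, target: str, graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Breadth-first search for the chain of packages that pulls target into root."""
    parent: Dict[str, Optional[str]] = {root: None}
    queue = deque([root])
    while queue:
        pkg = queue.popleft()
        if pkg == target:
            path: List[str] = []
            node: Optional[str] = pkg
            while node is not None:  # walk parent links back to the root
                path.append(node)
                node = parent[node]
            return path[::-1]
        for dep in graph.get(pkg, []):
            if dep not in parent:
                parent[dep] = pkg
                queue.append(dep)
    return None  # target is not in root's dependency graph


print(sorted(transitive_only("my_app", GRAPH)))
# ['array_lib', 'compat_shim', 'date_utils', 'font_tools']
print(dependency_path("my_app", "compat_shim", GRAPH))
# ['my_app', 'stats_lib', 'date_utils', 'compat_shim']
```

Note that array_lib is reached through both direct dependencies. A vulnerability there would reach the application by two paths, even though the application never asked for it.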
Further, you must proactively manage vulnerabilities as they arise. One way to do this is vulnerability scanning: using tools that scan your codebase and its dependencies for known vulnerabilities.
For greater security, vulnerability scanning must be combined with other capabilities, such as secure authentication, policy filters, and user access controls, which enable data science and IT leaders to remove access to known vulnerable software packages and repositories.
While open-source software can be secure, it is essential to understand that no software is entirely immune to vulnerabilities. There are many common cybersecurity attacks on open-source software.
Regular security audits, staying informed about security advisories, and applying security patches promptly will help mitigate potential risks. Additionally, you can complement the security of open-source software by following industry security standards and best practices, which we cover in the next section of this guide.
Many security tools on the market do not adequately serve the needs of Python developers, especially in data science and machine learning. For example, many scanners have high rates of false positives (i.e., spurious alerts) and false negatives (i.e., failure to detect vulnerabilities). Additionally, many scanning solutions do not provide mitigation or remediation suggestions. Using OSS across an enterprise organization requires commitment, time, and expertise to ensure the fidelity and stability of your security posture.
Is open source secure?
This is a complex topic and should be considered holistically in the context of the overall OSS ecosystem.
As with any software, open source can have security vulnerabilities. A software vulnerability is a security flaw, glitch, or weakness found in code. More than 70 new vulnerabilities are currently reported per day in the NIST National Vulnerability Database, which makes them a significant risk factor to consider. For example, Equifax agreed to pay up to $700 million in a settlement with the Federal Trade Commission and other authorities over a breach that exposed the sensitive records of millions of consumers. Attackers caused the breach by exploiting vulnerability CVE-2017-5638 in Apache Struts, a popular open-source web application framework.
Supply chain attacks are the other major consideration. This risk is not unique to open-source software, but the open nature and interconnectedness of the OSS ecosystem amplify it. The barrier to entry is very low. That is good, because it encourages participation and fosters innovation, but it also makes keeping malicious actors out very challenging.
Supply chain attacks are characterized by the injection of malicious code into a software package to compromise dependent systems further down in the chain. Typosquatting and dependency confusion are two examples of such attacks, where malicious code masquerades as legitimate packages; these attacks sometimes also use techniques such as obfuscation and encryption to evade detection. You can learn more in this blog article: 5 Common Cybersecurity Attacks on Open-Source Software.
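Typosquatting relies on package names that are easy to mistype. The sketch below shows one simple screening heuristic. It assumes you maintain a list of legitimate, popular package names (the list here is hypothetical). It then flags candidate names that closely resemble a known name without matching it, using the similarity ratio from Python’s standard difflib module. Names are first normalized as described in PEP 503. The 0.75 cutoff is an illustrative assumption. String similarity alone produces both false positives and false negatives, so treat a hit as a prompt for human review rather than a verdict.

```python
import difflib
import re
from typing import Dict, Iterable, List

# Hypothetical list of popular, legitimate package names
POPULAR: List[str] = ["numpy", "pandas", "requests", "scikit-learn", "matplotlib"]


def normalize(name: str) -> str:
    """PEP 503 normalization: lowercase, runs of '-', '_' and '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def flag_possible_typosquats(candidates: Iterable[str], popular: List[str],
                             cutoff: float = 0.75) -> Dict[str, str]:
    """Map each suspicious candidate name to the popular name it closely resembles."""
    known = sorted({normalize(p) for p in popular})
    flagged: Dict[str, str] = {}
    for name in candidates:
        norm = normalize(name)
        if norm in known:
            continue  # exact (normalized) match: the legitimate package itself
        matches = difflib.get_close_matches(norm, known, n=1, cutoff=cutoff)
        if matches:
            flagged[name] = matches[0]
    return flagged


print(flag_possible_typosquats(["nunpy", "Pandas", "reqeusts", "my-internal-tool"], POPULAR))
# {'nunpy': 'numpy', 'reqeusts': 'requests'}
```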
Supply chain attacks are on the rise. For example, in May 2023 PyPI temporarily suspended the registration of new users and projects because of the overwhelming volume of this malicious activity.
As the OSS ecosystem expands and usage grows, so does the surface area for potential cyberattacks. Popularity is precisely what makes OSS attractive to attackers: a single compromised package can reach a huge number of users, so the reward-to-effort ratio favors the attacker. Even as cyberattacks become more prevalent and sophisticated, organizations can mitigate risk and implement OSS securely with the right measures in place. Attack and defense techniques coevolve, so organizations must stay vigilant and proactively adopt current best practices and tools.
Is open source more secure than proprietary software?
Open source is not inherently more or less secure than proprietary software, as they both have advantages and disadvantages in terms of security. Also bear in mind that open source is a vast ecosystem (e.g., PyPI alone has more than 478,000 projects as of September 2023), and security can vary from project to project.
Factors influencing security include the project’s development and testing practices, community involvement, quality of the code, responsiveness to security issues, direct and transitive dependencies, and resources dedicated to security. Both open-source and proprietary software can be secure if developed, maintained, and implemented with security best practices.
Open Source Security
Advantages
- Transparency: Open-source projects are transparent, allowing anyone to review the source code and identify potential security vulnerabilities. This transparency can lead to quicker identification and resolution of security issues by a larger community of developers and security researchers.
- Community auditing: The large and diverse community of contributors can actively participate in code reviews and security audits, which can improve the overall security of the software.
- Rapid patching: When vulnerabilities are discovered, the open-source community can respond quickly with patches and updates, reducing the window of exposure to potential attacks.
Disadvantages
- Lack of resources: Some open-source projects may have limited resources for security testing and maintenance, which could impact their ability to address security issues promptly. Many people can review open source, but what matters is how much time knowledgeable people actually spend auditing and testing it.
- Code quality: The quality of code in open-source projects can vary significantly. Some projects may have high-quality code with thorough security measures, while others may have less secure code due to a lack of expertise or time constraints.
- Update mechanism: Many open-source projects are libraries or utilities rather than entire applications with a built-in update mechanism, so it is up to users to follow security news and update their software accordingly.
Proprietary Software Security
Advantages
- Controlled development: Proprietary software is developed and maintained by a specific organization, which can provide dedicated security teams and resources to address security concerns.
- Controlled access: Proprietary code is not publicly available, which may reduce the risk of attackers finding and exploiting vulnerabilities.
- Notification and delivery: Proprietary software vendors often implement a patch or installation mechanism to facilitate the delivery of updates.
Disadvantages
- Limited code review: Proprietary software is typically not open to public scrutiny, which limits the number of people who can review the code for security flaws.
- Slower response: Proprietary software may take longer to address security vulnerabilities and release updates due to the need for coordination with the development team and the company’s release schedules.
Organizations should carefully evaluate the security practices of any software they use and develop, regardless of whether it is open-source or proprietary. Regular updates, security audits, vulnerability scanning, and adherence to security best practices are essential for maintaining the security of any software solution.
Approaches to Open-Source Security
In general, there are three approaches to open-source security:
1. No oversight
Organizations that allow unregulated use of OSS are at the greatest risk from its vulnerabilities. An open approach to OSS implicitly trusts the community of open-source contributors, developers, and security staff to take sufficient measures against vulnerability risks. This approach greatly increases risk. Users across the organization download packages in many different versions, and the vulnerabilities those packages carry remain invisible to the organization.
Vulnerabilities enter the software supply chain more often and issues are more likely to wreak havoc. This results in vulnerabilities being handled in a reactive way, rather than preventing them from entering the organization in the first place.
2. Manual processes
Establishing processes and protocols for updating the operating systems and packages used in a company is a step in the right direction. However, given the volume of package and vulnerability metadata, manually keeping every package up to date can quickly become overwhelming.
3. Automated processes
In an automated approach, tools synchronize with a vulnerability database to track, manage, prevent, and remediate vulnerabilities. An automated approach to open-source security can be the most reliable solution, but only if it is properly configured and relies on trustworthy data with minimal false negatives. False negatives occur when the database or tooling fails to detect known threats. Keep in mind that overly restrictive policies can quickly erode confidence in automated tools and lead people to bypass or ignore legitimate warnings.
The most effective way to approach open-source security is with a combination of automated and manual processes, which we describe in greater detail in the next section.
Best Practices in Open-Source Security: Python, R, and Anaconda
For over a decade, Anaconda has been providing secure access to thousands of open-source packages for Python and R. Anaconda gives data science practitioners easy access to the tools they need to innovate and collaborate, along with control over their environments and the resolution of dependencies. Anaconda also provides security teams with the IT governance, auditing, and enhanced security capabilities they need, even in the most sensitive and critical environments.
In light of that experience, our security team has compiled these best practices to use when working with open-source packages for Python, R, and Anaconda.
- Leverage a private repository: While it is possible to download packages directly from the internet, enterprise organizations benefit from private repositories because this layer of indirection enables additional security screening, auditing, and governance. Private repositories (for your organization’s dedicated use) also let you control your own uptime, and they can be deployed in the cloud (SaaS), on premises, or behind air-gapped networks.
- Leverage channels to enable governance: The conda ecosystem, including Anaconda, organizes packages into channels for better management. Organize packages from different source channels into different channels in your private repository. Do not mix packages from different source channels into a single channel as this may introduce dependency and security complications. Channels may also be used to govern access if different teams or use cases within your organization have different policies (e.g., experimental code running on synthetic data in a sandbox may have less stringent security requirements than production code).
- Standardize on a common base and mirror only what you need: For instance, mirror from public repositories only the specific version(s) of Python you intend to use. Being selective reduces the surface area for potential vulnerabilities, the amount of security review you will need to do, and the infrastructure load. Additionally, migrating code between library versions—especially for complex enterprise applications and models—can be a time-consuming and expensive undertaking, so be intentional about version selection.
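The channel guidance above lends itself to a simple automated check. The sketch below assumes a hypothetical policy that maps each team to the private-repository channels it may use. It also assumes environment records with name, version, and channel fields, loosely modeled on `conda list --json` output; treat the exact field names as an assumption. The check reports every package that came from a channel the team is not approved for.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical policy: which private-repository channels each team may install from
CHANNEL_POLICY: Dict[str, Set[str]] = {
    "production": {"internal-main"},
    "research-sandbox": {"internal-main", "internal-conda-forge"},
}


def audit_channels(records: List[dict], team: str,
                   policy: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (package, channel) pairs whose channel is not approved for the team."""
    approved = policy.get(team, set())  # unknown team: nothing is approved
    return [(r["name"], r["channel"]) for r in records if r["channel"] not in approved]


# Hypothetical environment records (field names are an assumption)
ENV = [
    {"name": "numpy", "version": "1.26.4", "channel": "internal-main"},
    {"name": "example-plugin", "version": "0.3.0", "channel": "internal-conda-forge"},
]

print(audit_channels(ENV, "production", CHANNEL_POLICY))
# [('example-plugin', 'internal-conda-forge')]
print(audit_channels(ENV, "research-sandbox", CHANNEL_POLICY))
# []
```

Returning the offending channel alongside the package name makes it easy to tell whether a violation is a one-off or a whole channel that was never approved for that team.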
Each channel has different goals and constraints. Your different teams and projects may have different requirements. In choosing OSS package sources, Anaconda’s general guidance is to use the channel that best meets your needs and constraints. Here’s a quick overview of how you can think of some of the common channels.
- Anaconda’s package repository: We aim to provide a set of packages useful to data science and machine learning practitioners. Among the many open-source projects that we review, we choose to package and bring to our users those that are both trusted and stable. Anaconda has a dedicated team of engineers to maintain and improve this repository. We also provide commercial support for these packages.
- conda-forge: A community-supported repository that focuses on offering a wide range of packages. Often the open-source teams that created the code maintain their packages here. Community volunteers provide support on a best-effort basis.
- Astropy, bioconda, and PyTorch: These are specialty community-maintained channels focusing on specific use cases. Sometimes using them requires Anaconda’s default packages to satisfy the missing dependencies and sometimes they are fully self-contained. These are also mostly supported by volunteers.
- Anaconda Business is an enterprise-grade package repository with features such as access control, auditing, and content trust. Anaconda Business also enables organizations to create allow lists and block lists based on various criteria. For example, you can create rules that filter out packages with Common Vulnerability Scoring System (CVSS) scores that exceed your acceptable threshold, or exclude packages with licenses that do not comply with your legal requirements.
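Anaconda Business configures these rules through its own interface. The sketch below is not its API; it is only a hypothetical illustration of the filtering logic. It assumes each package record carries an SPDX license identifier and the CVSS base scores of its known vulnerabilities. A package is blocked if its worst score exceeds the threshold or its license is not on the allow list. The package names, scores, and threshold are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class PackageInfo:
    name: str
    version: str
    license: str  # SPDX identifier
    cvss_scores: List[float] = field(default_factory=list)  # scores of known, unfixed CVEs


def apply_filters(packages: List[PackageInfo], max_cvss: float,
                  allowed_licenses: Set[str]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split packages into allowed names and (name, reason) pairs for blocked ones."""
    allowed: List[str] = []
    blocked: List[Tuple[str, str]] = []
    for pkg in packages:
        worst = max(pkg.cvss_scores, default=0.0)  # judge a package by its worst CVE
        if worst > max_cvss:
            blocked.append((pkg.name, f"CVSS {worst} exceeds threshold {max_cvss}"))
        elif pkg.license not in allowed_licenses:
            blocked.append((pkg.name, f"license {pkg.license} not on allow list"))
        else:
            allowed.append(pkg.name)
    return allowed, blocked


# Hypothetical catalog entries
CATALOG = [
    PackageInfo("alpha", "1.0.0", "BSD-3-Clause", [5.3]),
    PackageInfo("beta", "2.1.0", "MIT", [9.8]),
    PackageInfo("gamma", "0.4.0", "AGPL-3.0-only"),
]
allowed, blocked = apply_filters(CATALOG, max_cvss=7.0,
                                 allowed_licenses={"BSD-3-Clause", "MIT", "Apache-2.0"})
print(allowed)  # ['alpha']
print(blocked)  # [('beta', 'CVSS 9.8 exceeds threshold 7.0'), ('gamma', 'license AGPL-3.0-only not on allow list')]
```

Recording a reason for each blocked package matters in practice, because users who cannot find a package will ask why.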
Many open-source projects, such as CPython (the reference Python interpreter), Django, and Nginx, maintain several release lines in parallel. Different versions of the same OSS package come with different capabilities, performance, security updates, and backward compatibility.
When choosing a package version or considering package updates, we recommend evaluating the following criteria, which are often documented in the package’s changelog:
Support Schedule
Consider the support schedule of the package. Deprecated or EOL (end-of-life) versions no longer receive bug fixes or security updates, at least not from the community free of charge. Bear in mind that the support schedules of different packages may not align neatly with each other. For key packages such as Python or NumPy, plan your sunset/migration strategy well ahead of time.
Breaking Changes
A breaking change means that code developed using an older version of a package does not work properly with updated versions of that package.
If there are breaking changes, our recommendation is to understand the impact of those changes on your existing code. Sometimes your applications won’t use the changed parts of the library, and sometimes the updates will completely break your core functionality. Generally, it is best to choose the latest version that doesn’t contain changes that will affect your code, but there are situations where you will need to upgrade anyway and plan for additional testing and rework.
Security and Bug Fixes
New releases often include security or critical bug fixes, which makes them a great source for determining the lower bound of the package version you need. If later versions contain further security fixes, consider raising that lower bound so that you always get those fixes as well.
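These criteria can be combined into a simple decision rule. The sketch assumes you have already reviewed the changelog and recorded two facts for each release (the releases here are hypothetical): whether it contains a change that breaks your code, and whether it contains a security fix. Versions are plain integer tuples for simplicity. The function picks the newest release reachable without a breaking change. It reports when the security floor forces an upgrade past a breaking change, which is the case that needs extra testing and rework.

```python
from typing import List, NamedTuple, Tuple

Version = Tuple[int, ...]


class Release(NamedTuple):
    version: Version
    breaks_our_code: bool  # changelog review found a change affecting code we use
    security_fix: bool     # release fixes a security or critical bug


def choose_version(current: Version, releases: List[Release]) -> Tuple[Version, bool]:
    """Return (version to target, whether extra testing and rework are needed)."""
    # Security floor: the newest release that carries a fix we must not miss
    floor = max((r.version for r in releases if r.security_fix), default=current)
    # Walk forward from the current version until the first breaking release
    safe = current
    for release in sorted(r for r in releases if r.version > current):
        if release.breaks_our_code:
            break  # breaking changes carry forward into every later release
        safe = release.version
    if safe >= floor:
        return safe, False
    # Staying compatible would leave a known fix out: upgrade anyway and plan rework
    return floor, True


# Hypothetical release history for one package
RELEASES = [
    Release((1, 4, 0), False, False),
    Release((1, 4, 2), False, True),
    Release((1, 5, 0), False, False),
    Release((2, 0, 0), True, False),
    Release((2, 1, 0), False, True),
]
print(choose_version((1, 4, 0), RELEASES))      # ((2, 1, 0), True)
print(choose_version((1, 4, 0), RELEASES[:4]))  # ((1, 5, 0), False)
```

In the first call, the security fix in 2.1.0 lies beyond the breaking change in 2.0.0, so the sketch recommends upgrading anyway and flags the need for rework. Without that fix, staying on the 1.x line at 1.5.0 is enough.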
Interoperability with Other Packages
As you start to build out an environment, you’ll notice that there are many interdependent packages. It is not unusual for one package to need a specific version of a dependency and another package to require a different version. It can be nontrivial to find a set of compatible packages satisfying complex interdependencies. The best way to achieve this goal is to leverage conda to find a solution. Once conda successfully creates an environment according to your specifications, develop test cases to ensure your code works properly in this environment.
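To see why finding a compatible set is nontrivial, consider a deliberately tiny backtracking resolver. It is only a sketch. The index format (package, then version, then the allowed versions of each dependency) is a hypothetical simplification, and conda uses a far more capable SAT-based solver. Still, the sketch shows the core difficulty. Preferring the newest version of one package can make another package unsatisfiable, so the search has to back up and try an older version.

```python
from typing import Dict, List, Optional, Set, Tuple

Version = Tuple[int, ...]
# Hypothetical index: package -> version -> {dependency: set of acceptable versions}
Index = Dict[str, Dict[Version, Dict[str, Set[Version]]]]


def resolve(requested: List[str], index: Index) -> Optional[Dict[str, Version]]:
    """Pick one version per package so every dependency constraint holds, or return None."""

    def search(pending: List[str], chosen: Dict[str, Version],
               allowed: Dict[str, Set[Version]]) -> Optional[Dict[str, Version]]:
        if not pending:
            return chosen
        name, rest = pending[0], pending[1:]
        if name in chosen:
            # Already pinned; conflicts were rejected when constraints were added
            return search(rest, chosen, allowed)
        for version in sorted(index[name], reverse=True):  # prefer the newest version
            if name in allowed and version not in allowed[name]:
                continue
            deps = index[name][version]
            # Tighten constraints by intersecting with this version's requirements
            new_allowed = dict(allowed)
            for dep, versions in deps.items():
                new_allowed[dep] = new_allowed.get(dep, versions) & versions
            new_chosen = {**chosen, name: version}
            # Reject this version if it conflicts with a package already pinned
            if any(v not in new_allowed.get(p, {v}) for p, v in new_chosen.items()):
                continue
            result = search(rest + list(deps), new_chosen, new_allowed)
            if result is not None:
                return result
        return None  # no version works on this branch: the caller backtracks

    return search(list(requested), {}, {})


# Hypothetical index: legacy_lib only works with the older array_lib
INDEX: Index = {
    "app": {(1, 0): {"array_lib": {(1, 26), (2, 0)}, "legacy_lib": {(3, 1)}}},
    "legacy_lib": {(3, 1): {"array_lib": {(1, 26)}}},
    "array_lib": {(2, 0): {}, (1, 26): {}},
}
print(resolve(["app"], INDEX))
# {'app': (1, 0), 'array_lib': (1, 26), 'legacy_lib': (3, 1)}
```

The resolver first tries array_lib 2.0, discovers that legacy_lib rules it out, and backtracks to 1.26. With real package counts, this search space explodes, which is why a dedicated solver is worth relying on.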
No Need to Chase the Latest Versions
The OSS ecosystem evolves fast, with new packages and package updates available every day. It is not necessary, and often undesirable, to have the latest version of every package in your environment. Trying to have the latest version of everything frequently leads to package conflicts and environments that cannot be resolved. We recommend setting minimum acceptable versions in your environment and allowing conda to flexibly find the most compatible solution.
Aim for an environment that is capable, secure, and stable enough to conduct your work.
Working with Environments
Instead of having one giant environment for all of your projects, Anaconda advises its customers to create project- or even task-specific environments to avoid dependency issues between projects. Smaller environments minimize your surface area for potential security attacks and are also faster to create and easier to manage.
Conda makes environment management easy, and Anaconda recommends that users adopt conda as their environment manager. Conda works well with both Anaconda and community repositories. With conda, users can easily create, switch between, and manage environments. To learn more about conda, check out this free conda basics course.
To make it easy for users to get conda, Anaconda provides Miniconda, which is a small bootstrap installer that only includes conda, Python, and their dependencies. The best practice here is to use Miniconda and then install additional packages only as needed.
When upgrading or modifying environments, perform regression testing to ensure environment changes do not introduce unexpected changes to your application or model. You can clone environments or create them from snapshot files to create before and after environments that you can run in parallel for comparison.
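Before running the old and new environments side by side, it helps to know exactly what changed. The sketch below compares two snapshots. It assumes they were written with `conda list --export`, whose non-comment lines take the form name=version=build. The snapshot contents here are hypothetical. Packages reported as added or changed are the first candidates for targeted regression tests.

```python
from typing import Dict, Tuple


def parse_export(text: str) -> Dict[str, str]:
    """Map package name -> version from name=version=build lines, skipping comments."""
    packages: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, version = line.split("=")[:2]
        packages[name] = version
    return packages


def diff_environments(before: Dict[str, str], after: Dict[str, str]
                      ) -> Tuple[Dict[str, str], Dict[str, str], Dict[str, Tuple[str, str]]]:
    """Return (added, removed, changed) between two name -> version snapshots."""
    added = {n: after[n] for n in after.keys() - before.keys()}
    removed = {n: before[n] for n in before.keys() - after.keys()}
    changed = {n: (before[n], after[n])
               for n in before.keys() & after.keys() if before[n] != after[n]}
    return added, removed, changed


# Hypothetical snapshots taken before and after an upgrade
BEFORE = """# platform: linux-64
numpy=1.26.4=py311_0
pandas=2.1.4=py311_0
python=3.11.7=h0_0
"""
AFTER = """# platform: linux-64
numpy=1.26.4=py311_0
pandas=2.2.0=py311_0
pyarrow=15.0.0=py311_0
python=3.11.7=h0_0
"""
print(diff_environments(parse_export(BEFORE), parse_export(AFTER)))
# ({'pyarrow': '15.0.0'}, {}, {'pandas': ('2.1.4', '2.2.0')})
```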
It typically takes some iterating to find just the right set of packages for each project. When you are satisfied with an environment, export it to a file, and save this snapshot in your version control system. Besides serving as backup, this mechanism also facilitates reproducibility and collaboration with colleagues.
Dependencies are an essential part of software development, and it’s important to carefully consider what dependencies to include in your project. Here are some guidelines for avoiding unnecessary dependencies to reduce your chances of vulnerability exposure:
- Only include dependencies that you need. Avoid adding dependencies to your projects unnecessarily or for trivial tasks. Such dependencies can bloat your projects and introduce unnecessary risks. Weigh the benefits of having your own implementation of the functionalities you need against the development and maintenance costs, and make informed decisions.
- Take dependencies that improve the security or quality of your code. Some security-critical areas like cryptography, input parsers, and the like benefit from specialized knowledge and higher levels of scrutiny that are hard to replicate locally. It is almost always better to choose a well-vetted and maintained library than try to roll your own cryptography or input processing. Numerical libraries are another case where sticking to a well-vetted version is superior to a roll-your-own approach.
- Leverage language features. Some programming languages, like Python, have a “batteries included” philosophy and provide rich built-in tools and functions. Explore the language’s capabilities and standard library before opting for additional dependencies. When you do need to look beyond the language, look for external systems that offer the missing batteries in a coherent fashion.
- Watch out for transitive dependencies. Keep in mind that dependencies that you include in your projects can have their own dependencies, which become the transitive dependencies of your projects. Because of this, a package may have a large dependency footprint. When deciding between multiple packages offering similar functionalities, use the size of the dependency footprint as an additional decision criterion.
By following these best practices, you can minimize the introduction of unnecessary dependencies and reduce the associated risks, while still leveraging external libraries when they provide significant value or address specialized requirements, such as cryptography.
As the proverb goes, prevention is better than cure. Defend against supply chain attacks (such as typosquatting, starjacking, and dependency confusion) by mirroring from trusted sources. Shift left and address issues as early as possible by preventing malicious packages from entering your organization in the first place.
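Dependency confusion exploits internal package names. If an attacker publishes a package under the same name on a public index, a client that consults both sources may install the public one. Mirroring only from trusted sources closes that gap. As an additional check, this minimal sketch flags internal names that also exist publicly, comparing them after PEP 503 name normalization. Both name lists are hypothetical. In practice, the public list would come from the index you mirror.

```python
import re
from typing import Iterable, List


def normalize(name: str) -> str:
    """PEP 503 normalization: lowercase, runs of '-', '_' and '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_confusable_names(internal: Iterable[str], public: Iterable[str]) -> List[str]:
    """Return internal package names that are also registered on a public index."""
    public_names = {normalize(n) for n in public}
    return sorted(n for n in internal if normalize(n) in public_names)


# Hypothetical name lists
INTERNAL = ["acme-etl", "acme_auth", "acme-utils"]
PUBLIC = ["numpy", "acme-auth", "requests"]
print(find_confusable_names(INTERNAL, PUBLIC))  # ['acme_auth']
```

Normalization matters here because acme_auth and acme-auth refer to the same project name on a PEP 503 index, even though the strings differ.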
Blocking packages according to their security exposure, such as a vulnerability score exceeding your organization’s acceptable threshold can also be an effective vulnerability mitigation measure. Anaconda Business offers powerful filtering logic to help you screen packages. End users will not be able to access packages that you filter out.
Keep in mind that automatic filtering may potentially cause disruption when business-critical packages are filtered out. It is important to evaluate your organization’s risk based on how you actually use the software. There may be situations where the most appropriate course of action is to override your general filter rules and make a security exception. If so, it is important to properly manage exceptions (e.g., by setting a review date).
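The exception-management advice can be sketched as follows. The record layout, package names, and placeholder CVE IDs are all hypothetical. The point is that every exception carries a justification and a review date. Once an exception is past its review date, it stops overriding the filter, so the package is blocked again until someone re-reviews it. For brevity, this sketch matches exceptions on package name only. A stricter version would also require the CVE IDs to match.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class SecurityException:
    package: str
    cve_id: str          # vulnerability the exception covers
    justification: str   # why the risk is acceptable in your context
    review_by: date      # after this date, the exception lapses until re-reviewed


def split_exceptions(exceptions: Iterable[SecurityException],
                     today: date) -> Tuple[List[SecurityException], List[SecurityException]]:
    """Separate exceptions still in force from those past their review date."""
    active: List[SecurityException] = []
    overdue: List[SecurityException] = []
    for exc in exceptions:
        if exc.review_by >= today:
            active.append(exc)
        else:
            overdue.append(exc)
    return active, overdue


def is_blocked(package: str, filtered_out: Set[str],
               active: Iterable[SecurityException]) -> bool:
    """A filtered package stays blocked unless an unexpired exception covers it."""
    return package in filtered_out and package not in {e.package for e in active}


# Hypothetical exception registry with placeholder CVE IDs
EXCEPTIONS = [
    SecurityException("legacy-parser", "CVE-0000-0001", "only parses trusted internal files", date(2024, 6, 30)),
    SecurityException("old-plotlib", "CVE-0000-0002", "sandbox use only", date(2024, 1, 15)),
]
FILTERED_OUT = {"legacy-parser", "old-plotlib"}

active, overdue = split_exceptions(EXCEPTIONS, today=date(2024, 3, 1))
print([e.package for e in overdue])                           # ['old-plotlib']
print(is_blocked("legacy-parser", FILTERED_OUT, active))      # False
print(is_blocked("old-plotlib", FILTERED_OUT, active))        # True
```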
Monitoring and Matching
Securing open-source software is a dynamic and ongoing process, as newly discovered vulnerabilities may crop up at any time. It is therefore crucial to stay up to date with the evolving threat landscape and understand how it affects your organization’s particular implementation and usage of OSS.
The National Institute of Standards and Technology (NIST) maintains and publishes the National Vulnerability Database (NVD), which contains entries on security vulnerabilities including those that may affect OSS. Each vulnerability entry in NVD contains important information including Common Vulnerability and Exposure (CVE) identifier (ID), description, severity, known affected software configurations, and more.
The process of determining which components of the OSS you use are affected by which known vulnerabilities is called vulnerability matching. Matching can be a laborious process unless you have automated tools, either developed internally or provided by a third party, to handle this critical task. Organizations with subscriptions to Anaconda’s Business tier or above have always-on vulnerability monitoring and matching performed on OSS packages built by Anaconda. Anaconda is also in the process of expanding vulnerability monitoring and matching to conda-forge packages.
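At its simplest, matching compares each installed package version against the affected version range in an advisory. The sketch below assumes a hypothetical advisory format with an inclusive introduced version and an exclusive fixed version, and it only understands purely numeric dotted versions. Production matchers must also reconcile package names across ecosystems, pre-release and local version tags, and build variants, which is a large part of why matching is laborious.

```python
from typing import NamedTuple


class Advisory(NamedTuple):
    """Hypothetical advisory record: affected if introduced <= version < fixed."""
    cve_id: str
    package: str
    introduced: str
    fixed: str


def parse_version(v: str) -> tuple[int, ...]:
    """Naive parser for purely numeric dotted versions."""
    parts = [int(p) for p in v.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()  # drop trailing zeros so '2.4.0' and '2.4' compare equal
    return tuple(parts)


def match(installed: dict[str, str],
          advisories: list[Advisory]) -> list[tuple[str, str, str]]:
    """Return (package, installed version, CVE ID) for each affected package."""
    hits = []
    for adv in advisories:
        version = installed.get(adv.package)
        if version is None:
            continue  # package is not in this environment
        if parse_version(adv.introduced) <= parse_version(version) < parse_version(adv.fixed):
            hits.append((adv.package, version, adv.cve_id))
    return hits
```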
Curating and Interpreting
When security vulnerability monitoring and matching reports that a package is affected by one or more known vulnerabilities, there are two critical steps that follow:
- Ensure that the vulnerability information is accurate
A false positive match can lead to unnecessary subsequent mitigation measures, causing wasted efforts and disruption to your work, such as the removal of an essential package from your package repository.
Keep in mind that inaccurate security vulnerability entries in the NVD are not uncommon, and they are outside the control of OSS security scanners. To address this issue, Anaconda provides CVE curation to subscribers of the Anaconda Business tier and above. In CVE curation, Anaconda’s experts research each security vulnerability matched against Anaconda packages and correct any inaccuracies they identify. Equipped with accurate and trustworthy vulnerability information, customers can interpret the security implications of vulnerabilities with confidence.
- Interpret security vulnerabilities
Based on curated, accurate security vulnerability information, an organization can then holistically assess how the matched vulnerabilities may affect it by combining that information with knowledge of how it uses the affected packages. Each vulnerability is situational, and interpreting its security implications is essential to helping organizations determine the right mitigation measures.
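One concrete way to couple an advisory with your own usage is to check whether your code calls the affected function at all. The following minimal sketch uses Python's standard `ast` module to find such calls; `examplelib.unsafe_parse` is a hypothetical stand-in for whatever function an advisory names. It only recognizes direct calls through a top-level module name or a `from ... import`, so a negative result is a hint for further review, not proof that the vulnerability is unreachable.

```python
import ast


def find_calls(source: str, module: str, function: str) -> list[int]:
    """Return sorted line numbers where `module.function(...)` is called.

    Recognizes `import module [as alias]` and `from module import function
    [as alias]` for a top-level module. Dynamic access (getattr, importlib)
    and dotted submodule paths are not detected.
    """
    tree = ast.parse(source)
    module_aliases: set[str] = set()
    function_aliases: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == module:
                    module_aliases.add(alias.asname or alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module == module:
            for alias in node.names:
                if alias.name == function:
                    function_aliases.add(alias.asname or alias.name)

    lines = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Calls like `alias.function(...)`
        if (isinstance(func, ast.Attribute) and func.attr == function
                and isinstance(func.value, ast.Name) and func.value.id in module_aliases):
            lines.append(node.lineno)
        # Calls like `function(...)` after a from-import
        elif isinstance(func, ast.Name) and func.id in function_aliases:
            lines.append(node.lineno)
    return sorted(lines)
```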
There are various ways to mitigate a security vulnerability. An organization might need different mitigation measures in different situations.
OSS projects may publish newer versions of their packages with security updates that fix known vulnerabilities. Updating your environment to these newer versions is one way to mitigate vulnerability exposure.
Package update, as a way to mitigate vulnerabilities, has its own complexities:
- Possible breaking changes
When a newer version of a package contains more than security fixes, such as new features, it risks introducing breaking changes.
Breaking changes can make code that worked with an older version of the package stop working. When this happens, the code needs to be refactored and tested before it is redeployed to the updated environment with the newer package versions.
- Potential ripple impact on the environment
If a package that needs a security update has its own dependencies (transitive dependencies from your project’s point of view), some of them may need to be upgraded to newer versions for the security update to work. Updating those transitive dependencies can, in turn, trigger further package updates.
Conda, as a package and environment management tool, is capable of managing the rippling effect described above. However, it is worth pointing out that a seemingly simple “one-package update” can bring significant changes to the entire environment. It is thus essential to test the updated environment thoroughly before redeploying it.
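Before testing, it helps to see exactly how far a "one-package update" rippled. A minimal sketch, assuming you saved the output of `conda list --export` (lines of the form `name=version=build`, with `#` comment lines) before and after the update, is to diff the two listings. The function names are illustrative.

```python
def parse_export(text: str) -> dict[str, str]:
    """Map package name -> 'version=build' from `conda list --export` output."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        name, _, rest = line.partition("=")
        packages[name] = rest
    return packages


def diff_environments(before: str, after: str) -> dict[str, list]:
    """Report packages added, removed, or changed between two exports."""
    old, new = parse_export(before), parse_export(after)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(
            (name, old[name], new[name])
            for name in set(old) & set(new)
            if old[name] != new[name]
        ),
    }
```

Anything in the "changed" or "added" lists beyond the package you meant to update is part of the ripple and belongs in the test plan.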
Package backporting means porting a security fix released by OSS project maintainers to earlier versions of a package that the fix does not cover. A package that receives a security backport provides exactly the same functionality, with the security vulnerabilities addressed. Backporting is typically performed by a third party rather than the project maintainers, and it is applied to older package versions.
Package backporting does not introduce breaking changes or ripple effects through transitive dependencies. It gives users a more stable environment, so they do not need to resort to more disruptive mitigation measures, or it buys them more time to evaluate and prepare for those disruptive changes.
Despite these benefits, package backporting does have one major limitation—package backporting relies on the availability of security fixes from OSS project maintainers.
- Although typically not performed by OSS maintainers, backporting needs the security fixes to be available in the newer versions of the OSS packages, so that these security fixes can be backported to older versions.
- OSS project maintainers’ promptness in responding to security vulnerabilities varies, so how quickly a backport becomes available will also vary.
The private preview program of Anaconda’s security backporting service will be available to existing Anaconda customers in the latter half of 2023. Interested customers can contact their Anaconda representative to inquire how to enroll.
User Code Modification
Modifying user code can be another effective way to mitigate OSS security vulnerabilities. Depending on how the security exploits in an OSS package can be blocked, users can:
- Implement constraints on how your code uses the package, as this may effectively block the security exploits. Sometimes it is also possible to disable insecure portions, for example by restricting the algorithms OpenSSL uses to a known safe subset.
- Replace the package with security vulnerabilities with an alternative OSS package or custom code delivering the same or similar functionalities, and update user code accordingly. When replacing a vulnerable package with alternatives or custom code, pay close attention not to introduce other security vulnerabilities or bugs.
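Restricting the algorithms OpenSSL uses can be illustrated with Python's standard `ssl` module, which is built on OpenSSL. This minimal sketch limits client connections made through the returned context to TLS 1.2 or newer and, for TLS 1.2, to ECDHE key exchange with AES-GCM. The function name and the specific cipher string are illustrative choices, not a recommendation, and the restriction applies only to connections that use this context.

```python
import ssl


def restricted_client_context() -> ssl.SSLContext:
    """Client TLS context limited to TLS 1.2+ and ECDHE/AES-GCM suites for TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Refuse TLS 1.0/1.1 and older protocols.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # OpenSSL cipher-list syntax; governs TLS 1.2 and earlier.
    context.set_ciphers("ECDHE+AESGCM")
    return context
```

Note that `set_ciphers()` does not control TLS 1.3 suites, which OpenSSL configures separately, so a context like this narrows TLS 1.2 negotiation while leaving TLS 1.3 at its defaults.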
As with package updates, user code modification needs to be extensively tested before it is deployed.
Train Your Developers and Adopt Secure Coding Practices
No security tool can fully replace the role of people and processes in security. Security should be part of the coding process, not an afterthought. Make sure your developers are aware of the potential risks associated with third-party packages, are well versed in secure coding practices, and make security best practices part of how they write code. Be proactive about software security rather than reactive. Check out the secure software development guidelines and training materials below, and adopt or adapt them for your organization.
Questions to Ask Your Open-Source Security Provider
Ask potential technology providers these questions to better understand their approach to open-source security capabilities, IT enablement and governance, and securing of the software supply chain.
Platform Security and Capabilities
- 1. What security features does your package repository provide?
- 2. Does your platform provide administrative monitoring, so I can track users, projects and deployments?
- 3. Are your security controls cloud-native?
- 4. How do you identify vulnerabilities? Is it automated or manual—or some combination of both? Describe that process.
- 5. How frequently do you release security updates and patches? What is the process for notifying customers and assisting them with applying these updates?
- 6. What measures are in place to prevent and detect unauthorized access or activities on the platform?
IT Enablement and Governance
- 7. Are there role-based user access controls? Can I limit access by stage of development; for example, can I control who is able to deploy projects?
- 8. Do you offer publishing permissions?
- 9. Are there approval workflows for certain actions or deployments, allowing for a more controlled development and deployment process?
- 10. Can you provide access to audit logs and detailed logs of user activities on the platform?
Securing of the Software Supply Chain
- 11. How do you ensure the authenticity and integrity of the open-source packages available on your platform?
- 12. Are there features to scan for vulnerabilities and potential malware in the packages used within projects?
- 13. Can I use your platform to create audit logs?
- 14. What kind of disaster recovery support do the platform and your team provide?
- 15. Can you provide information on the platform’s historical uptime and any downtime incidents, along with the measures taken to minimize service disruptions?
- 16. Do you contribute to the open-source community? Describe how your organization is involved in the open-source ecosystem.
With Anaconda, you can secure your software supply chain from the start. We give you secure, trusted packages for your Python and R developers, with capabilities to keep malicious packages out of your pipeline. We also provide the security controls you need to block risky software and the governance capabilities and support needed for even the largest teams.
- CVE curation: Anaconda’s human-curated approach to CVE refinement provides accurate reports, eliminating the burdens of false alerts.
- Policy filters: Our policy controls allow your teams to begin their coding with security measures already in place, rather than reacting to threats after the fact.
- User access controls: Utilize our token system to control access to private packages and channels, ensuring only specific individuals and groups have access.
- Software bill of materials (SBOM): Anaconda provides an inventory of your software components.
- Enterprise-grade support: Get support from the Python and R experts, from troubleshooting operational errors to building custom conda packages.
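To give a sense of what an inventory of software components contains, here is a minimal sketch that lists the Python distributions visible to `importlib.metadata` in the current environment and emits them in a CycloneDX-style structure. It is not how Anaconda generates SBOMs: it covers only Python distributions (conda packages that are not Python distributions, such as compiled C libraries, will not appear), and the fields are a small subset of the CycloneDX specification.

```python
from importlib import metadata


def python_inventory() -> dict:
    """Minimal CycloneDX-style inventory of installed Python distributions."""
    components = []
    for dist in metadata.distributions():
        meta = dist.metadata
        if meta is None:
            continue  # broken installation without a metadata file
        name, version = meta.get("Name"), meta.get("Version")
        if not name or not version:
            continue  # skip distributions with incomplete metadata
        components.append({
            "type": "library",
            "name": name,
            "version": version,
            # Package URL for PyPI: lowercase name, underscores become dashes.
            "purl": f"pkg:pypi/{name.lower().replace('_', '-')}@{version}",
        })
    components.sort(key=lambda c: (c["name"].lower(), c["version"]))
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": components,
    }
```

Serializing the result with `json.dumps(python_inventory(), indent=2)` would give a document you could store alongside a release.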
Frequently Asked Questions
The security of open-source software is a complex topic that depends on various factors. While open-source software has inherent advantages that can contribute to its security, it is not inherently more or less secure than closed-source (proprietary) software. Its security depends on the community’s involvement, the project’s development practices, and the resources available for security auditing and maintenance.
Security-conscious organizations should carefully evaluate any software, whether open-source or proprietary, and consider the reputation of the project, the responsiveness of the development team, and the security practices in place. Regular updates, proper configuration, and adherence to security best practices are essential regardless of the software’s source.
Open source is not necessarily more secure than proprietary software; however, it does have advantages. Open-source software is often considered more secure for several reasons, primarily due to its transparency, community involvement, and rapid response to security issues.
Reasons OSS is often viewed as more secure:
- OSS provides access to source code, allowing anyone to inspect and review it.
- OSS is supported by a large and diverse community of contributors.
- Security vulnerabilities can be addressed quickly with patches and updates.
- OSS allows users to customize their code to fit specific security requirements.
- Security discussions in open-source projects are generally conducted openly, with community involvement.
- Open-source software benefits from continuous improvement through feedback, bug reports, and contributions from the community.
However, it’s important to note that the security of open-source software is not guaranteed solely by its openness. There are also challenges and potential risks associated with using open-source software, such as vulnerabilities introduced through dependencies, code quality issues, and the possibility of malicious actors exploiting popular packages.
The security of open-source projects compared to proprietary ones is not inherently better or worse; it depends on various factors and the specific context of each software project. Both open-source and proprietary software can be developed and maintained with varying levels of security, and each has its advantages and challenges.
Open-source projects are more transparent, open to inspection by the community, and supported by the community’s rapid patching and updates. However, many open-source projects have limited resources for security testing and maintenance, and code quality varies from project to project.
Proprietary software is developed and maintained by an organization, so it can have dedicated security teams and controlled access: the code is not publicly available, which can obscure its attack surface. However, proprietary code receives limited outside review because it is not open to public scrutiny. Further, addressing security vulnerabilities requires coordination with the vendor’s development team, which means responses may be slow.
Open Source Intelligence (OSINT) in cybersecurity refers to the process of collecting and analyzing information from publicly available sources to gain insights into potential security threats, vulnerabilities, or risks. OSINT is a valuable tool for cybersecurity professionals and threat intelligence analysts, as it helps them gather relevant data about potential adversaries, vulnerabilities, or security incidents. The sources of OSINT data can include publicly accessible websites, social media platforms, online forums, public databases, news articles, and more.
Key aspects of open source intelligence in cybersecurity include threat intelligence, vulnerability research, digital footprint analysis, threat actor profiling, reputation monitoring, incident response and forensics, and compliance and regulatory intelligence.
The Open Source Security Testing Methodology Manual (OSSTMM) is a framework for security testing and analysis of computer systems, networks, and applications. It is a comprehensive and practical guide that provides a structured approach to evaluating security measures and identifying potential vulnerabilities in various IT environments. OSSTMM is an open-source project developed and maintained by the Institute for Security and Open Methodologies (ISECOM).
OSSTMM is designed to be adaptable and flexible, allowing security testers to tailor the methodology to specific environments and testing scenarios. It emphasizes the importance of using multiple testing techniques and tools to gain a comprehensive understanding of the target’s security posture.
Security professionals, penetration testers, and organizations use OSSTMM as a reference and guide when conducting security assessments and penetration testing exercises. By following the OSSTMM methodology, they can enhance the security of their systems and applications by identifying and addressing potential security weaknesses before they are exploited by malicious actors.
The Securing Open Source Software Act of 2023 (S.917) is a bill introduced in the United States Senate in March 2023. The primary goal of the bill is to improve the security of open-source software used in Federal agencies and departments. Key objectives of the Securing Open Source Software Act include inventory of open-source software, security review and guidelines, identification of vulnerabilities, and coordination and reporting.
Ensuring open-source container security involves implementing a comprehensive set of best practices and security measures to protect containerized applications and the infrastructure they run on. Containers have become a popular way to package and deploy applications due to their portability and scalability. However, they can also introduce security challenges if not properly managed.
Critical steps to ensure open-source container security include:
- Using trusted base images
- Keeping containers updated
- Enabling image signing
- Securing the container registry
- Limiting privileges
- Implementing network segmentation
- Monitoring container activity
- Implementing Center for Internet Security (CIS) benchmarks
- Isolating sensitive data
- Conducting continuous security testing
- Implementing runtime protection
- Securing container orchestration (e.g., Kubernetes)
- Educating developers and operations teams
- Having a well-defined incident response plan
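A few of these steps, namely using trusted base images and limiting privileges, can be partly checked before an image is ever built. The following is a hypothetical, deliberately small lint that flags `FROM` lines whose image is not pinned by a `sha256` digest and a final build stage that may run as root. It ignores line continuations and treats references to earlier build stages as unpinned, so a dedicated Dockerfile linter or the CIS Docker Benchmark is the better reference for real use.

```python
import re

# Matches an image reference pinned to an immutable sha256 digest.
DIGEST_PIN = re.compile(r"@sha256:[0-9a-f]{64}\b")
ROOT_USERS = {"root", "0"}


def lint_dockerfile(text: str) -> list[str]:
    """Flag unpinned base images and a final stage that may run as root."""
    findings = []
    stage_user = None  # USER in effect for the current build stage
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        instruction, _, args = line.partition(" ")
        instruction = instruction.upper()
        if instruction == "FROM":
            stage_user = None  # each stage starts from the base image's user
            # Skip flags such as --platform=... to find the image reference.
            image = next((t for t in args.split() if not t.startswith("--")), "")
            if not DIGEST_PIN.search(image):
                findings.append(f"line {lineno}: base image '{image}' is not pinned by digest")
        elif instruction == "USER":
            stage_user = args.strip()
    # No USER in the final stage means the base image's default, often root.
    if stage_user is None or stage_user.split(":")[0] in ROOT_USERS:
        findings.append("final stage may run as root; add a non-root USER")
    return findings
```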
Yes. The Open Source Security Foundation (OpenSSF) is a collaborative initiative that aims to improve the security of open-source software by consolidating industry efforts and resources. It was launched in August 2020 under the Linux Foundation.
Other helpful open-source resources include:
An open-source security scanner is a software tool that helps identify security vulnerabilities and weaknesses in computer systems, applications, networks, or codebases. Some of these scanners are open source and freely available to the public, and their source code can be accessed and reviewed by anyone. Others target open-source software but are themselves not OSS.
Security scanners are valuable for developers, system administrators, and security professionals as they aid in discovering and mitigating potential security risks within their software or infrastructure. However, security scanning alone is not enough. Teams must combine scanning for vulnerabilities with manual practices that support a high level of security.
Anaconda, an open-source software and enhanced security provider, uses a combination of security and IT governance capabilities, as well as automated scanning and human curation to help users quickly address vulnerabilities. Anaconda users can exclude packages and repositories that contain vulnerabilities using Anaconda’s data, curated from the National Vulnerability Database (NVD), reports from the community, and Anaconda’s dedicated team of Python and R experts.
Open-source software offers many advantages, including transparency, community collaboration, and cost-effectiveness. However, it also comes with certain security risks, which teams should be aware of and address proactively rather than reactively.
Some of the key open-source security risks include dependency vulnerabilities, lack of support and updates, code quality issues, supply-chain risks, complexity and large attack surface from extensive codebases, lack of accountability for issues, license compliance, misconfiguration, delayed patches, and phishing and author impersonation. Learn more in this blog article, 5 Common Cybersecurity Attacks on Open-Source Software.