Selecting an Enterprise Platform for Python and Open Source: A Checklist for Buyers

Updated June 30, 2023



Introduction


Marc Andreessen famously opened an August 2011 blog article with this provocative sentence: “Software is eating the world.” His prediction was that software development would disrupt traditional industries. Indeed, companies like Airbnb, Netflix, and Uber emerged as just a few of many winners in the “on-demand” economy that disrupted industries like travel, entertainment, and shopping in significant and lasting ways.

About a year later, in October 2012, Harvard Business Review reported that data scientist was the “sexiest job of the 21st century,” promising professionals who could “coax treasure out of unstructured data.” And the race to structured data began, with organizations taking a closer look at their messy data and finding ways to make it more consumable by machines.

Fast-forward seven years to October 2019, and McKinsey Global Institute offered exciting and cautionary words about the “coming of AI spring.” Their research showed hundreds of business cases that, combined, had the potential to create between $3.5 trillion and $5.8 trillion in value annually. As organizations applied artificial intelligence, they found it could yield outsized business value. Data science capabilities emerged as a prerequisite for high-performing AI, so organizations increased their investments in technologies, data science teams, and techniques like machine learning and deep learning.

In August 2022, Stable Diffusion rocked the visual arts world with its text-to-image model built with Python and deep learning that could generate detailed images based on text prompts. It stoked the world’s fascination with AI and unleashed its next wave: generative AI.

Finally, three months later, in November 2022, OpenAI released another generative AI tool: ChatGPT, a chatbot built on OpenAI’s GPT-3.5 large language model (LLM), and later GPT-4, that generates text based on prompts from the user. In a short time, LLMs have taken many industries by storm, with new products and capabilities that make it possible for programmers to write and debug code alongside a machine and for writers to work with AI to produce content, to name just two of a growing number of use cases.

Finding an Enterprise Platform

With all of this progress, it would seem that every organization should somehow incorporate these new technologies into their research, products, and operations. However, applying these techniques to their fullest potential requires a set of fully featured tools, clean and structured data, expert teams, and the power of open-source software, backed by an engaged community of makers and maintainers.

Leveraging the power of open-source software across an enterprise organization requires capabilities for building and deploying secure Python solutions. There is a growing number of options for stitching together tools that enable teams to collaborate and build powerful applications with data science and machine learning. But bolting together tools to deploy predictive models into production is not the best way to create a platform that your organization can rely on to deliver excellent outcomes. And building your own can be expensive and complex, because you’ll need to maintain the platform you create.

Finding an enterprise platform that can provide the open-source packages you need, the managed environments that allow you to reproduce and scale models in production, and the security tools to protect your organization from bad code and bad actors can be a tough challenge. That’s what this guide is all about—exploring what to consider when you are selecting an enterprise platform to use with Python and open-source software to achieve your organization’s development goals. 

What makes a platform?

One popular description of a platform comes from Microsoft co-founder Bill Gates, as paraphrased by Chamath Palihapitiya: “A platform is when the economic value of everybody that uses it exceeds the value of the company that creates it.” As you evaluate platforms, consider these basic characteristics that will help you leverage the innovation of the community as your teams develop and deploy applications using open source with Python:

  • Number of individual users: The more users, the more opportunities there are to discover new techniques shared by others, identify and address security risks faster, and benefit from a rich community of software makers and maintainers.
  • Number of enterprise users: The more enterprise users, the more the platform has been tested at scale. Number of users may be expressed as a percentage of a total group of organizations or businesses, such as the Fortune 500.
  • Years of experience: The longer an organization has been working to develop their platform, the more expertise their team is likely to have across tools, techniques, and use cases.
  • Cross-industry customers: The more industries in which a platform has been applied, the more integrations, use cases, and data types the platform and supporting team have likely encountered.

Python, Open Source, Data Science, and the Enterprise

Data science has revolutionized the way businesses operate. Today, it seems that everyone is working with data in some capacity, whether it’s analyzing customer behavior, building predictive models, or creating generative models. As the demand for data-driven insights continues to grow, Python has emerged as the go-to language for data science work.

In fact, Python has long been the gold standard for data science work, thanks in large part to its simplicity and versatility. Unlike other languages, Python allows users to easily manipulate and analyze data, making it an ideal choice for everything from data visualization to machine learning. Additionally, the availability of numerous open-source libraries and frameworks ensures that Python remains a popular choice for data scientists.

Open-source software provides developers access to a global network of contributors who are constantly updating and improving code, making it possible for companies to create applications much faster and more efficiently than ever before. The vast majority (96%) of code bases contain open-source software, according to the Synopsys 2023 Open Source Security and Risk Analysis (OSSRA) Report.

The widespread application of OSS makes sense; open source not only helps companies save on licensing costs, but also allows them to leverage the collective knowledge of the open-source community to create customized solutions to meet their specific business needs. As a result, open source has become an essential strategic tool for organizations looking to stay ahead in the fast-paced world of technology.

However, managing Python development in enterprise organizations has become more complex and difficult over the past few years. This is due in part to the rapid pace of development within the Python community, which has led to the release of new tools and technologies on a regular basis. Some of these tools are proprietary, and some are open source. While this is ultimately a positive development for teams that work with data, it can make it challenging to keep up with the newest techniques and best practices.

Despite these challenges, Python remains one of the most powerful and versatile tools available for data science work. As the industry continues to evolve, Python will remain a critical component of any successful data science team’s toolbox.

Enterprise Python Challenges


At Anaconda, we speak with organizations around the world who are working with Python. We find that most of these teams are experiencing similar challenges, and they are attempting to solve them in similar ways.

1. Package Management and Build Environments

Figure: Common Python challenges for enterprise teams include package management, collaboration and deployment, and governance and security.

For busy enterprise teams, managing packages and build environments is a significant challenge. Many teams manage packages manually, which has the advantage of giving them control over each package and the customization of environments. However, this is time-consuming and error-prone. It can also lead to inconsistent environments and a lack of oversight for data protection and governance of resources.
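To make the "inconsistent environments" problem concrete, here is a minimal sketch that compares an environment against a hand-maintained set of exact version pins. The function name `find_drift` and the sample package versions are assumptions made for illustration; this is not how any particular platform performs the check.

```python
from importlib import metadata


def find_drift(pinned, installed=None):
    """Return {name: (expected, actual)} for packages that miss their pins.

    `pinned` maps package names to exact version strings. `installed` is an
    optional name -> version mapping; when omitted, the live environment is
    inspected. Names are only lowercased, not fully normalized.
    """
    if installed is None:
        installed = {}
        for dist in metadata.distributions():
            name = dist.metadata.get("Name")
            if name:
                installed[name] = dist.version
    installed = {name.lower(): version for name, version in installed.items()}

    drift = {}
    for name, expected in pinned.items():
        actual = installed.get(name.lower())  # None means "not installed"
        if actual != expected:
            drift[name] = (expected, actual)
    return drift


# Hypothetical pins and a hypothetical teammate's environment.
pins = {"numpy": "1.24.3", "pandas": "2.0.2"}
teammate_env = {"NumPy": "1.24.3", "pandas": "1.5.3"}
print(find_drift(pins, teammate_env))  # {'pandas': ('2.0.2', '1.5.3')}
```

Calling `find_drift(pins)` with no second argument checks the interpreter the script is running in. Running a check like this by hand on every machine is the kind of manual effort the paragraph above describes.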

Other teams use proprietary third-party package management tools, which can streamline package management and provide off-the-shelf functionality. However, these tools are not suited to Python workflows. They offer limited customization and force you to rely on vendors to build out the tool to meet your business needs. 

2. Collaboration and Deployment

Project collaboration is an important part of building and scaling great models, and it depends on reproducibility, which is a formidable challenge, especially for large teams. Most teams work in a fractured way, with models living on individual machines, leading to the often-heard phrase among data scientists and data engineers: “It works on my machine.”
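One lightweight way to spot an "it works on my machine" mismatch is to reduce each environment to a short fingerprint and compare the values. The sketch below is an illustration under stated assumptions: the function name `environment_fingerprint`, the 16-character truncation, and the sample package versions are all invented here, and a matching fingerprint says nothing about operating system libraries or hardware.

```python
import hashlib
import platform
from importlib import metadata


def environment_fingerprint(packages=None):
    """Return a short SHA-256 digest of the Python version and package set.

    `packages` is an optional name -> version mapping; when omitted, the
    packages installed in the running interpreter are used.
    """
    if packages is None:
        packages = {}
        for dist in metadata.distributions():
            name = dist.metadata.get("Name")
            if name:
                packages[name] = dist.version

    # Sort so that the result does not depend on discovery order.
    lines = [f"python=={platform.python_version()}"]
    lines += sorted(f"{name.lower()}=={version}" for name, version in packages.items())
    digest = hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()
    return digest[:16]


# Hypothetical package sets for two machines.
laptop = environment_fingerprint({"numpy": "1.24.3", "pandas": "2.0.2"})
server = environment_fingerprint({"numpy": "1.24.3", "pandas": "1.5.3"})
print(laptop == server)  # False: the pandas versions differ
```

Two teammates can each run `environment_fingerprint()` and compare the strings before debugging a result that only reproduces on one machine.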

When it comes to deployment, manual processes give you more control over your pipeline but, like manual package management, are time-consuming and prone to errors and scalability issues. Building your own infrastructure for deployment allows you to customize and also gives you more control, but you may see lower return on investment due to high development and maintenance costs.

There are easy-to-use machine learning platforms with off-the-shelf functionality and some support, but these can be highly restrictive compared to open-source software, with limited customization options. They also can be quite expensive.

3. Governance and Securing the Open-Source Pipeline

A trusted source for your open-source packages has never been more important. The March 2023 National Cybersecurity Strategy and frameworks from the National Institute of Standards and Technology (NIST) show that the burden of security is shifting to organizations and individuals who develop software.

Manual security audits can help you meet minimum regulatory requirements and identify some security risks. However, they, too, are time-consuming and resource intensive, and they put your organization in a reactive position. In-house security training can increase awareness and promote good practices, but its effectiveness is limited and it is insufficient on its own.

Third-party scanning tools are often easy to use and, like some machine learning platforms, offer off-the-shelf functionality and some support. However, these tools are not suited to Python workflows, throw a high rate of false positives, and can mishandle compiled packages.
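To show what a basic vulnerability check involves, here is a minimal sketch that matches installed versions against a list of advisories. Everything in it is hypothetical: the advisory IDs, the package name `examplepkg`, the `fixed_in` field, and the function names are invented for illustration and do not reflect any real advisory feed or scanner. It matches on name and version only, so it knows nothing about how a package was built or patched.

```python
def parse_version(text, width=4):
    """Turn '1.2' into (1, 2, 0, 0); only plain dotted integers are handled."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def find_vulnerable(installed, advisories):
    """Return (package, version, advisory_id) for versions below the fix."""
    findings = []
    for advisory in advisories:
        version = installed.get(advisory["package"])
        if version is None:
            continue  # package is not installed, so the advisory does not apply
        if parse_version(version) < parse_version(advisory["fixed_in"]):
            findings.append((advisory["package"], version, advisory["id"]))
    return findings


# Hypothetical data, not real advisories.
advisories = [
    {"id": "EXAMPLE-0001", "package": "examplepkg", "fixed_in": "1.2.1"},
    {"id": "EXAMPLE-0002", "package": "otherpkg", "fixed_in": "3.0"},
]
installed = {"examplepkg": "1.2", "otherpkg": "3.0.0"}
print(find_vulnerable(installed, advisories))
# [('examplepkg', '1.2', 'EXAMPLE-0001')]
```

Real version strings can include pre-release and build tags that this parser would reject, which is one reason production scanners need ecosystem-aware version handling.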

The Top Features to Look for in an Enterprise Python Platform: A Buyer’s Checklist


An enterprise platform should be flexible enough to meet your needs today and powerful enough to withstand the demands of your future workloads and projects. You can use this checklist as you evaluate enterprise platforms for Python and open-source software.

FUNDAMENTAL CAPABILITIES

1. Data Integration

Other vendor(s):
Integration is possible | Integration is possible | Integration is not possible
Code repositories (Git, Bitbucket)
Data lake support 
Filesystems 
Hadoop (Cloudera, Hortonworks, EMR)
IoT/sensor data 
Monitoring solutions (log shipping)
NoSQL
Proprietary databases (SAS, Teradata)
SQL
Web data integration

2. Infrastructure and Hardware

Other vendor(s):
Supported, and air gapped is an option | Supported, and air gapped is an option | Supported but not air gapped | Not supported
AWS SageMaker
Azure
Domino Data Lab MLOps
Google
Microsoft Azure
Oracle Cloud Infrastructure (OCI)
Snowpark for Python
On premises (vSphere)
On premises (bare metal)
Air gapped
GPU and CPU support

3. Machine Learning Capabilities 

Other vendor(s):
Supported | Supported | Not supported
Classification & regression
Deep learning
Generative adversarial networks (GANs)
Pre-trained large language models (LLMs)
Reinforcement learning
Support vector machines (SVMs)
Testing strategies (A/B, multi-armed bandit, sensitivity analysis)
Text-to-image models
Text & image analytics and processing
Time-series analysis

4. Collaboration and Deployment

Other vendor(s):
Available | Available | Not available
Centralized project hub
Deploy with one click
Deploy REST API
Deploy webapp
Governance controls for collaboration and deployment
Job scheduler / automation
Version control
Visualizations and dashboards

5. Support

Other vendor(s):
Available | Available | Not available
Dedicated support contacts
Guaranteed uptime SLA
Access tokens
Advanced troubleshooting support
Assistance with Anaconda package management
Custom conda package builds
Custom installer builds
Environment management issues
Learning: Live and on-demand
Repository access during high demand
Severity response, Level 1: 12 hours (standard); 1 hour (premium)
Severity response, Level 2: 24 hours (standard); 12 hours (premium)
Technical support

6. Security and Governance

Other vendor(s):
Included | Included | Can integrate | Not possible
Administrative monitoring (track users, projects, deployments)
Audit logs
Cloud-native security controls
Disaster recovery
End-to-end encryption
Package signature verification
Publishing permissions
Role-based user access controls
Scanning for common vulnerabilities and exposures (CVEs)
Secure package repository
Software bill of materials (SBOM)

COLLABORATION AND TOOLS

1. Notebooks and Integrated Development Environments (IDEs)

Other vendor(s):
Tool | Purpose | Included | Included | Not included
Jupyter Notebook | Creating and sharing computational documents
JupyterLab | Web-based interface for Jupyter
PyCharm | IDE for programming in Python
RStudio | IDE tools for Python and R
Spyder | Scientific Python development environment
Visual Studio Code (VS Code) | Source-code editor for debugging, snippets, code refactoring, and more

2. Data Visualization Capabilities

Other vendor(s):
Supported | Supported | Not supported
Allows users to choose their favorite plotting library (e.g., Bokeh, hvPlot, Matplotlib, Plotly)
Supports fully interactive visualizations
Supports visualizing very large (i.e., petabyte) datasets
Supports visualization in Jupyter or as stand-alone applications

3. Data Science and Machine Learning Libraries

Anaconda gives you access to thousands of libraries. We name just a few of the most common libraries below to help you compare your options.

Other vendor(s):
Library | Purpose | Accessible | Accessible | Not accessible
Dask | Parallel and distributed computing
Django | Python web framework for design
Flask | Model deployment
Keras | Deep-learning framework (API for TensorFlow)
Kubeflow | ML workflows on Kubernetes
MLflow | Experiment tracking
NumPy | Mathematical operations on arrays
Pandas | Work with data sets: analyzing, cleaning, exploring, and manipulating data
Prophet | Time-series forecasting in Python
PyTorch | Develop and train deep learning models
SciPy | Scientific and technical computing (built on NumPy)
Scikit-learn | ML library for classification, regression, and clustering algorithms
TensorFlow | Develop and train ML models
Theano | Mathematical expressions involving multi-dimensional arrays (built on NumPy)
XGBoost | Distributed gradient boosting library

4. Model Deployment and Management

Other vendor(s):
Yes | Yes | No
Deployment from QA
Deployment to production
One-click deployment to pre-provisioned resources
Refine models in production
Reproducibility—rollback to older models
Centralized administration of deployed apps

Anaconda’s Platform Makes Innovation Possible


For more than a decade, industry leaders have been using Anaconda’s platform to build some of the world’s most innovative predictions, products, and experiences. Data science and machine learning teams count on our trusted packages and capabilities to centralize open-source software access and empower consistent, reproducible workflows.

Figure: Anaconda’s platform has capabilities that address three major challenges when working with Python in the enterprise. Build: provide trusted packages, centralize your distribution process, and enable consistent, reproducible environments. Deploy: enable project collaboration, centralize workflows, and make deployment easy. Secure: enable IT visibility and access controls, and keep license and vulnerability risks out of your software supply chain.

Enterprise practitioners use our platform to collaborate across users and teams, centralize workflows for better reproducibility and scalability, and deploy models into production with just one click.

IT administrators and security teams choose Anaconda because it is the only platform in the Python ecosystem with access to thousands of packages that—unlike those from community package providers—are privately hosted, built from source, and free from malicious packages.

Finally, you can deploy Anaconda in the cloud or on premises—with private cloud, managed hosting, and air-gapped options—making Anaconda the platform of choice for those working in highly regulated industries and/or with sensitive or protected data. With Anaconda, peace of mind evolves from fantasy to reality. 

Ready to learn more about how Anaconda can help your teams build and deploy secure Python solutions, faster? Book time with one of our experts to discuss your organization’s requirements.

Reviewed and maintained by:

Christian Capdeville, Director of Product Marketing, Anaconda
Saundra Monroe, Director of Product Management, Anaconda
Jim Bednar, Director of Custom Services, Anaconda