Technical Notes

Self-Service Open Data Science: Custom Anaconda Parcels for Cloudera CDH

  • By Anaconda Team
  • February 14, 2025

This post refers to Anaconda Enterprise 4. To generate custom parcels in Anaconda Enterprise 5, see here.

Earlier this year, as part of our partnership with Cloudera, we announced a freely available Anaconda parcel for Cloudera CDH based on Python 2.7 and the Anaconda Distribution. The Anaconda parcel has been very well received by both Anaconda and Cloudera users by making it easier for data scientists and analysts to use libraries from Anaconda that they know and love with Hadoop and Spark along with Cloudera CDH.

Since then, we’ve had significant interest from Anaconda Enterprise users asking how they can create and use custom Anaconda parcels with Cloudera CDH. Our users want to deploy Anaconda with different versions of Python and custom conda packages that are not included in the freely available Anaconda parcel. Using parcels to manage multiple Anaconda installations across a Cloudera CDH cluster is convenient, because it works natively with Cloudera Manager without the need to install additional software or services on the cluster nodes.

We’re excited to announce a new self-service feature of the Anaconda platform that can be used to generate custom Anaconda parcels and installers. This functionality is now available in the Anaconda platform as part of the Anaconda Scale and Anaconda Repository platform components.

Deploying multiple custom versions of Anaconda on a Cloudera CDH cluster with Hadoop and Spark has never been easier! Let’s take a closer look at how we can create and install a custom Anaconda parcel using Anaconda Repository and Cloudera Manager.

Generating Custom Anaconda Parcels and Installers

For this example, we’ve installed Anaconda Repository (which is part of the Anaconda subscription plan) and created an on-premises mirror of more than 600 conda packages that are available in the Anaconda distribution. We’ve also installed Cloudera CDH 5.8.2 with Spark on a cluster.

In Anaconda Repository, we can see a new feature for Installers, which can be used to generate custom Anaconda parcels for Cloudera CDH or standalone Anaconda installers.

The Installers page gives an overview of how to get started with custom Anaconda installers and parcels, and it describes how we can create custom Anaconda parcels that are served directly from Anaconda Repository from a Remote Parcel Repository URL.

After choosing Create new installer, we can then specify packages to include in our custom Anaconda parcel, which we’ve named anaconda_plus.

First, we specify the latest version of Anaconda (4.2.0) and Python 2.7. We’ve added the anaconda package to include all of the conda packages that are included by default in the Anaconda installer. Specifying the anaconda package is optional, but it’s a great way to supercharge your custom Anaconda parcel with more than 200 of the most popular Open Data Science packages, including NumPy, Pandas, SciPy, matplotlib, scikit-learn and more.

We also specified additional conda packages to include in the custom Anaconda parcel, including libraries for natural language processing, visualization, data I/O and other data analytics libraries: azure, bcolz, boto3, datashader, distributed, gensim, hdfs3, holoviews, impyla, seaborn, spacy, tensorflow and xarray.

We also could have included conda packages from other channels in our on-premises installation of Anaconda Repository, including community-built packages from conda-forge or other custom-built conda packages from different users within our organization.

After creating a custom Anaconda parcel, we see a list of parcel files that were generated for all of the Linux distributions supported by Cloudera Manager.

Additionally, Anaconda Repository has already updated the manifest file used by Cloudera Manager with the new parcel information at the existing Remote Parcel Repository URL. Now, we’re ready to install the newly created custom Anaconda parcel using Cloudera Manager.
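To make the manifest step more concrete, here is a minimal sketch of how one might list the parcel files in a manifest, grouped by Linux distribution. It assumes the usual Cloudera parcel repository layout, where the manifest is a JSON document with a parcels list and each file is named name-version-distro.parcel. The sample manifest, its 4.2.0 version string, the placeholder hashes and the parcels_by_distro helper are all made up for illustration; the real manifest is generated by Anaconda Repository.

```python
import json

# Made-up manifest for illustration only; hashes and version are placeholders.
SAMPLE_MANIFEST = """
{
  "lastUpdated": 1476000000000,
  "parcels": [
    {"parcelName": "anaconda_plus-4.2.0-el6.parcel", "hash": "0000", "components": []},
    {"parcelName": "anaconda_plus-4.2.0-el7.parcel", "hash": "0000", "components": []},
    {"parcelName": "anaconda_plus-4.2.0-trusty.parcel", "hash": "0000", "components": []}
  ]
}
"""


def parcels_by_distro(manifest_text):
    """Map each distribution suffix (el6, el7, ...) to its parcel file name."""
    manifest = json.loads(manifest_text)
    found = {}
    for entry in manifest["parcels"]:
        file_name = entry["parcelName"]
        if file_name.endswith(".parcel"):
            stem = file_name[:-len(".parcel")]
        else:
            stem = file_name
        # Assumed naming scheme: <name>-<version>-<distro>.parcel
        distro = stem.rpartition("-")[2]
        found[distro] = file_name
    return found


for distro, file_name in sorted(parcels_by_distro(SAMPLE_MANIFEST).items()):
    print(distro, file_name)
```

Running the same helper against the manifest served at your own Remote Parcel Repository URL is a quick way to confirm that a parcel exists for the distribution your cluster nodes run.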

Installing Custom Anaconda Parcels Using Cloudera Manager

Now that we’ve generated a custom Anaconda parcel, we can install it on our Cloudera CDH cluster and make it available to all of the cluster users for PySpark and SparkR jobs.

From the Cloudera Manager Admin Console, click the Parcels indicator in the top navigation bar.

Click the Configuration button on the top right of the Parcels page.

Click the plus symbol in the Remote Parcel Repository URLs section, and add the repository URL that was provided from Anaconda Repository.

Finally, we can download, distribute and activate the custom Anaconda parcel.

And we’re done! The custom-generated Anaconda parcel is now activated and ready to use with Spark or other distributed frameworks on our Cloudera CDH cluster.
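The download, distribute and activate steps can also be scripted rather than clicked through. The sketch below only builds the request paths and sends nothing. It assumes the Cloudera Manager REST API exposes parcel commands named startDownload, startDistribution and activate under a cluster's parcels resource, which is our reading of that API and should be checked against the API documentation for your Cloudera Manager version. The cluster name, the parcel version and the v13 API version are hypothetical values.

```python
from urllib.parse import quote

# Assumed command names, in the order Cloudera Manager expects them to run.
PARCEL_COMMANDS = ("startDownload", "startDistribution", "activate")


def parcel_command_paths(cluster, product, version, api_version="v13"):
    """Return the POST paths for downloading, distributing and activating a parcel."""
    base = "/api/{}/clusters/{}/parcels/products/{}/versions/{}/commands".format(
        api_version,
        quote(cluster, safe=""),   # cluster names may contain spaces
        quote(product, safe=""),
        quote(version, safe=""),
    )
    return [base + "/" + command for command in PARCEL_COMMANDS]


# Hypothetical cluster name and parcel version, for illustration only.
paths = parcel_command_paths("Cluster 1", "anaconda_plus", "4.2.0")
for path in paths:
    print("POST", path)
```

Each command should finish before the next one is issued, so a real script would check the parcel's state between calls instead of posting all three at once.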

Using the Custom Anaconda Parcel

Now that we’ve generated, installed and activated a custom Anaconda parcel, we can use libraries from our custom Anaconda parcel with PySpark.

You can use spark-submit along with the PYSPARK_PYTHON environment variable to run Spark jobs that use libraries from the Anaconda parcel, for example:

$ PYSPARK_PYTHON=/opt/cloudera/parcels/anaconda_plus/bin/python spark-submit pyspark_script.py
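When several parcels are activated side by side, it can help to wrap this command in a small launcher that picks the interpreter per job. The following is a minimal sketch: the spark_submit_command helper and the PARCEL_ROOT constant are our own names, and the path layout is taken from the command above. It prints the command instead of running it, so it can be tried on any machine.

```python
import os
import posixpath
import shlex

PARCEL_ROOT = "/opt/cloudera/parcels"  # parcel directory used in the command above


def spark_submit_command(parcel_name, script, extra_args=()):
    """Build the environment and argument list for one spark-submit run."""
    python_bin = posixpath.join(PARCEL_ROOT, parcel_name, "bin", "python")
    # Copy the current environment and point PySpark at the parcel's Python.
    env = dict(os.environ, PYSPARK_PYTHON=python_bin)
    argv = ["spark-submit"] + list(extra_args) + [script]
    return env, argv


env, argv = spark_submit_command("anaconda_plus", "pyspark_script.py")
print("PYSPARK_PYTHON=" + env["PYSPARK_PYTHON"],
      " ".join(shlex.quote(arg) for arg in argv))

# On a cluster gateway node, the job could then be launched with:
#   import subprocess
#   subprocess.run(argv, env=env, check=True)
```

Passing a different parcel name to spark_submit_command selects a different Anaconda installation for that job without changing any cluster-wide setting.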

Or, to work with Spark interactively on the Cloudera CDH cluster, we can use Jupyter Notebooks via Anaconda Enterprise Notebooks, which is a multi-user notebook server with collaboration and support for enterprise authentication. You can configure Anaconda Enterprise Notebooks to use different Anaconda parcel installations on a per-job basis.

Get Started with Custom Anaconda Parcels in Your Enterprise

If you’re interested in generating custom Anaconda installers and parcels for Cloudera Manager, we can help! Get a demo for more information about this functionality and pricing options.

The features of the Anaconda platform, including the distributed functionality in Anaconda Scale and on-premises functionality of Anaconda Repository, are certified by Cloudera for use with Cloudera CDH 5.x.
