Generate Custom Parcels for Cloudera CDH with Anaconda Enterprise 5


As part of our partnership with Cloudera, we offer a freely available Anaconda Python parcel for Cloudera CDH based on the Anaconda Distribution. The Anaconda parcel has been well received by both Anaconda and Cloudera users because it makes it easier for data scientists and analysts to use the Anaconda libraries they know and love with Hadoop and Spark on Cloudera CDH.

Anaconda Enterprise 5 offers data scientists and administrators self-service generation of custom Anaconda parcels and installers. With this, users can deploy Anaconda with different versions of Python and custom conda packages that are not included in the freely available Anaconda parcel. Using parcels to manage multiple Anaconda installations across a Cloudera CDH cluster is convenient because it works natively with Cloudera Manager without the need to install additional software or services on the cluster nodes.

Deploying multiple custom versions of Python/R with Anaconda on a Cloudera CDH cluster with Hadoop and Spark has never been easier! Let’s take a closer look at how we can create and install a custom Anaconda parcel using Anaconda Enterprise and Cloudera Manager.

Generating Custom Anaconda Parcels

For this example, we’ve installed Anaconda Enterprise 5.1.2 and mirrored more than 1,000 conda packages from the Anaconda distribution into the on-premises repository. We’ve also installed Cloudera CDH 5.14 with Spark on a separate cluster.

First, we log into our Anaconda Enterprise instance. We then navigate to the Packages section of Anaconda Enterprise and select Advanced.

This shows us the Environments tab, where we can view our existing package environments as well as generate new ones.

To generate a parcel, we can create a new environment. We click the + button to begin building our environment.

The environment page provides an overview of how you can create custom environments. To create a new environment, we first select the anaconda channel in the Select Channels section. Then, under Select Packages, we begin choosing the packages we want to install in our custom Anaconda parcel.

In this example, we’ll create an environment with Anaconda 5.1.0. You can search for packages by name or scroll directly to the package you want. To add packages, simply select the checkbox next to each package name.

For this example, we’ve chosen all of the packages in Anaconda. We also could have included conda packages from other channels in our on-premises installation of Anaconda Enterprise, including R conda packages from MRO, community-built packages from conda-forge, or other custom-built conda packages from different users within our organization.

We name the environment “anaconda_parcel”, then click Resolve and Save to generate our new environment.

Depending on the packages selected, the build process might take a few minutes. Once completed, you should see your new environment displayed on the All Environments page.

If this is the first environment you have created, you will now see a new button: CREATE. Select CREATE and choose the appropriate option from the drop-down (installer, parcel, or management pack). In our case, we will select Parcel. Note that this process is identical for generating custom Anaconda installers or management packs.

Once we select Create Parcel, Anaconda Enterprise will generate a custom Anaconda parcel from our environment. As before, this will take a few moments, and then you should see your custom Anaconda parcel on the All Environments page. When the build finishes, we see a list of the parcel files that were generated, one for each of the Linux distributions supported by Cloudera Manager.
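To make the list of generated files easier to read, here is a minimal sketch that splits parcel file names into their parts. It assumes the usual Cloudera naming convention of NAME-VERSION-DISTRO.parcel, where the name contains no hyphen. The file names, the version "1.0.0", and the helper parse_parcel_filename are hypothetical and only illustrate the idea; the names Anaconda Enterprise actually produces may differ.

```python
from typing import NamedTuple


class ParcelFile(NamedTuple):
    name: str      # product name, e.g. "anaconda_parcel"
    version: str   # version string; may itself contain hyphens
    distro: str    # distribution suffix, e.g. "el7"


def parse_parcel_filename(filename: str) -> ParcelFile:
    """Split a NAME-VERSION-DISTRO.parcel file name into its three parts."""
    suffix = ".parcel"
    if not filename.endswith(suffix):
        raise ValueError(f"not a parcel file: {filename!r}")
    stem = filename[: -len(suffix)]
    # Assumption: the name has no hyphen, so the first hyphen ends the name
    # and the last hyphen starts the distribution suffix.
    name, sep1, rest = stem.partition("-")
    version, sep2, distro = rest.rpartition("-")
    if not (name and sep1 and version and sep2 and distro):
        raise ValueError(f"unexpected parcel file name: {filename!r}")
    return ParcelFile(name, version, distro)


# Hypothetical file names, for illustration only.
example_files = [
    "anaconda_parcel-1.0.0-el6.parcel",
    "anaconda_parcel-1.0.0-el7.parcel",
    "anaconda_parcel-1.0.0-xenial.parcel",
]
parsed = [parse_parcel_filename(f) for f in example_files]
distros = sorted(p.distro for p in parsed)
```

Here distros collects the distribution suffixes, which is a quick way to confirm that a parcel exists for the operating system running on your cluster nodes.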

Additionally, Anaconda Enterprise has already updated the manifest file used by Cloudera Manager with the new parcel information at the existing Remote Parcel Repository URL. Now, we’re ready to install the newly created custom Anaconda parcel using Cloudera Manager.
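For readers curious about what the manifest contains, the following sketch reads a small manifest and lists the parcels it advertises. It assumes the general shape of a Cloudera parcel repository manifest, a JSON document with a "parcels" array whose entries carry "parcelName" and "hash" fields. The sample contents, including the timestamp and the placeholder hashes, are hypothetical, and list_parcels is an illustrative helper rather than part of either product.

```python
import json

# Hypothetical manifest contents; the hash values are placeholders,
# not real digests, and the timestamp is made up.
manifest_text = """
{
  "lastUpdated": 1520000000000,
  "parcels": [
    {"parcelName": "anaconda_parcel-1.0.0-el7.parcel",
     "hash": "<hash-of-el7-parcel>"},
    {"parcelName": "anaconda_parcel-1.0.0-xenial.parcel",
     "hash": "<hash-of-xenial-parcel>"}
  ]
}
"""


def list_parcels(manifest: dict) -> list:
    """Return the parcel file names advertised by a parsed manifest."""
    return [entry["parcelName"] for entry in manifest.get("parcels", [])]


manifest = json.loads(manifest_text)
parcel_names = list_parcels(manifest)
for parcel_name in parcel_names:
    print(parcel_name)
```

Cloudera Manager reads this file from the repository URL to discover which parcels are available, which is why no manual editing is needed after Anaconda Enterprise updates it.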

Installing Custom Anaconda Parcels Using Cloudera Manager

Now that we’ve generated a custom Anaconda parcel, we can install it on our Cloudera CDH cluster and make it available to all of the cluster users for PySpark and SparkR jobs.

From the Cloudera Manager Admin Console, click the Parcels indicator in the top navigation bar.

Click the Configuration button on the top right of the Parcels page.

Click the plus symbol in the Remote Parcel Repository URLs section, and add the repository URL provided by Anaconda Repository.
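If you prefer to script this step, the same setting can in principle be changed through the Cloudera Manager REST API. The sketch below only builds the JSON body and does not contact a server. It assumes the setting is exposed as the REMOTE_PARCEL_REPO_URLS configuration item with a comma-separated value, and that the body would be sent with a PUT request to the Cloudera Manager configuration endpoint. Both URLs and the helper build_repo_config_payload are hypothetical, so check the API documentation for your Cloudera Manager version before relying on it.

```python
import json


def build_repo_config_payload(existing_urls, new_url):
    """Build a config update body that appends new_url to the repository list."""
    urls = [u.strip() for u in existing_urls if u.strip()]
    if new_url not in urls:
        urls.append(new_url)  # avoid adding the same repository twice
    # Assumption: the setting is stored as one comma-separated string.
    return {"items": [{"name": "REMOTE_PARCEL_REPO_URLS",
                       "value": ",".join(urls)}]}


# Hypothetical URLs, for illustration only.
current = ["https://archive.example.com/cdh5/parcels/"]
payload = build_repo_config_payload(
    current, "https://anaconda.example.com/repository/parcels/"
)
body = json.dumps(payload)
```

Keeping the existing URLs in the list matters because the configuration item holds the complete list, so sending only the new URL would replace the repositories already configured.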

Once Cloudera Manager lists the new parcel, download it, select Distribute to copy it to the nodes of your CDH cluster, and then activate it. Congratulations, we are done! The custom-generated Anaconda parcel is now activated and ready to use with Spark or other distributed frameworks on our Cloudera CDH cluster.
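As a rough illustration of using the activated parcel with PySpark, the sketch below builds the environment and the spark-submit command that point Spark at the parcel's Python interpreter. It assumes the default parcel directory /opt/cloudera/parcels and that the parcel directory is named anaconda_parcel after our environment. The directory name, the script name my_job.py, and the helpers are assumptions for illustration, so adjust them to match your cluster.

```python
import posixpath


def parcel_python(parcel_dir_name, parcels_root="/opt/cloudera/parcels"):
    """Return the path of the Python interpreter inside an activated parcel."""
    return posixpath.join(parcels_root, parcel_dir_name, "bin", "python")


def spark_submit_command(script, python_path):
    """Build a spark-submit command that runs on YARN with the parcel's Python."""
    return [
        "spark-submit",
        "--master", "yarn",
        # Also set the interpreter for the YARN application master.
        "--conf", f"spark.yarn.appMasterEnv.PYSPARK_PYTHON={python_path}",
        script,
    ]


# Assumed parcel directory name; check /opt/cloudera/parcels on a node.
python_path = parcel_python("anaconda_parcel")
env = {"PYSPARK_PYTHON": python_path}
command = spark_submit_command("my_job.py", python_path)

# To launch the job from a cluster gateway node, something like:
#   import os, subprocess
#   subprocess.run(command, env={**os.environ, **env}, check=True)
```

Because parcels are distributed to every node at the same path, each executor finds the same interpreter and packages without any per-node setup.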

Get Started with Custom Anaconda Parcels in Your Enterprise

If you’re interested in generating custom Anaconda installers and parcels for Cloudera Manager, we can help! Get in touch through our contact us page to learn more about this functionality and the other features of Anaconda Enterprise.

If you’d like to test-drive Anaconda Enterprise on a bare-metal, on-premises, or cloud-based cluster, get in touch with us at [email protected].

The enterprise features of Anaconda Enterprise are certified by Cloudera for use with Cloudera CDH 5.x.


