General
When was the general availability release of Anaconda Enterprise v5?
Our GA release was August 31, 2017 (version 5.0.3). Our most recent version was released March 10, 2022 (version 5.5.2).

Which notebooks or editors does Anaconda Enterprise support?
Anaconda Enterprise supports the use of Jupyter Notebooks and JupyterLab, which are the most popular integrated data science environments for working with Python notebooks. In version 5.2.2 we added support for Apache Zeppelin, a web-based notebook that enables data-driven, interactive data analytics and collaborative documents with interpreters for Python, R, Spark, Hive, HDFS, SQL, and more.

Can I deploy multiple data science applications to Anaconda Enterprise?
Yes, you can deploy multiple data science applications and languages across an Anaconda Enterprise cluster. Each data science application runs in a secure and isolated environment with all of the dependencies from Anaconda that it requires. A single node can run multiple applications, depending on the compute resources (CPU and RAM) available on that node. Anaconda Enterprise handles all of the resource allocation and application scheduling for you.

Does Anaconda Enterprise support high availability deployments?
Partially. Some of the Anaconda Enterprise services and user-deployed apps are automatically configured for high availability when installed on three or more nodes. Anaconda Enterprise provides several automatic mechanisms for fault tolerance and service continuity, including automatic restarts, health checks, and service migration. For more information, see Fault tolerance in Anaconda Enterprise.

Which identity management and authentication protocols does Anaconda Enterprise support?
Anaconda Enterprise comes with out-of-the-box support for the following:
- LDAP / AD
- SAML
- Kerberos
System requirements
What operating systems are supported for Anaconda Enterprise?
Please see the operating system requirements. Linux distributions other than those listed in the documentation can be supported on request.
Installation
How do I install Anaconda Enterprise?
The Anaconda Enterprise installer is a single tarball that includes Docker, Kubernetes, system dependencies, and all of the components and images necessary to run Anaconda Enterprise. The system administrator runs one command on each node.

Can Anaconda Enterprise be installed on-premises?
Yes, including air gapped environments.

Can Anaconda Enterprise be installed on cloud environments?
Yes, including Amazon AWS, Microsoft Azure, and Google Cloud Platform.

Does Anaconda Enterprise support air gapped (offline) environments?
Yes, the Anaconda Enterprise installer includes Docker, Kubernetes, system dependencies, and all of the components and images necessary to run Anaconda Enterprise on-premises or on a private cloud, with or without internet connectivity. We can deliver the installer to you on a USB drive.

Can I build Docker images for the install of Anaconda Enterprise?
No. The installation of Anaconda Enterprise is supported only by using the single-file installer, which includes Docker, Kubernetes, system dependencies, and all of the components and images necessary for Anaconda Enterprise.

Can I install Anaconda Enterprise on my own instance of Kubernetes?
Yes, please refer to our Kubernetes installation guide.

Can I get the AE installer packaged as a virtual machine (VM), Amazon Machine Image (AMI), or other installation package?
No. The installation of Anaconda Enterprise is supported only by using the single-file installer.

Which ports are externally accessible from Anaconda Enterprise?
Please see the network requirements.

Can I use Anaconda Enterprise to connect to my Hadoop/Spark cluster?
Yes. Anaconda Enterprise supports connectivity from notebooks to local or remote Spark clusters by using the Sparkmagic client and a Livy REST API server. Anaconda Enterprise provides Sparkmagic, which includes Spark, PySpark, and SparkR notebook kernels for deployment (a minimal Sparkmagic configuration sketch follows the node list below).

How can I manage Anaconda packages on my Hadoop/Spark cluster?
An administrator can use Anaconda Enterprise to generate custom Anaconda parcels for Cloudera CDH or custom Anaconda management packs for Hortonworks HDP. A data scientist can then use these Anaconda libraries from a notebook as part of a Spark job.

On how many nodes can I install Anaconda Enterprise?
You can install Anaconda Enterprise in the following configurations during the initial installation:
- One node (one master node)
- Two nodes (one master node, one worker node)
- Three nodes (one master node, two worker nodes)
- Four nodes (one master node, three worker nodes)
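Regarding the Spark connectivity described above: Sparkmagic reads its Livy endpoint from a JSON configuration file, commonly ~/.sparkmagic/config.json. A minimal sketch, assuming a Livy server reachable at http://livy.example.com:8998 (the hostname and port are placeholders for your own Livy endpoint):

```json
{
  "kernel_python_credentials": {
    "username": "",
    "password": "",
    "url": "http://livy.example.com:8998",
    "auth": "None"
  },
  "kernel_r_credentials": {
    "username": "",
    "password": "",
    "url": "http://livy.example.com:8998"
  }
}
```

With this in place, the PySpark and SparkR kernels provided by Sparkmagic submit code to the remote cluster through Livy rather than running Spark locally.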
To replace the built-in certificates with temporary self-signed certificates:

1. Generate self-signed temporary certificates. On the master node, run the certificate generation command, replacing DESIRED_FQDN with the fully qualified domain name of the cluster on which you are installing Anaconda Enterprise. Saving the resulting secrets.yaml file as /var/lib/gravity/planet/share/secrets.yaml on the Anaconda Enterprise master node makes it accessible as /ext/share/secrets.yaml within the Anaconda Enterprise environment, which can be entered with the command sudo gravity enter.
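The certificate generation command itself is not reproduced here. As a minimal sketch, one common way to create a self-signed certificate and key with OpenSSL (the output file names and 365-day validity are assumptions; the exact format expected in secrets.yaml should follow the product documentation):

```bash
# Generate a self-signed certificate and private key for DESIRED_FQDN
# (illustrative only; file names and validity period are assumptions)
openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
  -keyout anaconda.key -out anaconda.crt \
  -subj "/CN=DESIRED_FQDN"
```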
2. Update the certs secret. Replace the built-in certs secret with the contents of secrets.yaml. Enter the Anaconda Enterprise environment and run the commands to replace the secret.
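The exact commands are not reproduced here. A minimal sketch, assuming certs is a standard Kubernetes secret that can be replaced from the file made available at /ext/share/secrets.yaml:

```bash
# Enter the Anaconda Enterprise environment on the master node
sudo gravity enter

# Replace the built-in certs secret with the contents of secrets.yaml
# (assumes the secret manifest in secrets.yaml targets the correct namespace)
kubectl replace -f /ext/share/secrets.yaml
```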
GPU Support
How can I make GPUs available to my team of data scientists?
If your data science team plans to use version 5.2 of the Anaconda Enterprise AI enablement platform, here are a few approaches to consider when planning your GPU cluster:
- Build a dedicated GPU-only cluster. If GPUs will be used by specific teams only, creating a separate cluster allows you to more carefully control GPU access.
- Build a heterogeneous cluster. Not all projects require GPUs, so a cluster containing a mix of worker nodes—with and without GPUs—can serve a variety of use cases in a cost-effective way.
- Add GPU nodes to an existing cluster. If your team’s resource requirements aren’t clearly defined, you can start with a CPU-only cluster, and add GPU nodes to create a heterogeneous cluster when the need arises.
Anaconda provides GPU-enabled builds of several packages, including:
- Keras (keras-gpu)
- TensorFlow (tensorflow-gpu)
- Caffe (caffe-gpu)
- PyTorch (pytorch)
- MXNet (mxnet-gpu)
- XGBoost (py-xgboost-gpu)
- CuPy (cupy)
- Numba (numba)
Unless a package has been specifically optimized for GPUs (by the authors) and built by Anaconda with GPU support, it will not be GPU-accelerated, even if the hardware is present.
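As a quick way to confirm that a GPU-enabled build can actually see a CUDA device from within a notebook session, a minimal sketch (it assumes the numba and pytorch packages listed above are installed in the project environment):

```python
# Quick checks that GPU-enabled builds can see a CUDA device.
# Run only the checks for packages installed in your project environment.
from numba import cuda
print("Numba sees a CUDA GPU:", cuda.is_available())

import torch
print("PyTorch sees a CUDA GPU:", torch.cuda.is_available())
```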
A small GPU-enabled cluster on AWS might consist of, for example, one node running on an m4.4xlarge instance and one GPU worker node running on a p3.2xlarge instance. More users will require more worker nodes, and possibly a mix of CPU and GPU worker nodes.
See Installation requirements for the baseline hardware requirements for Anaconda Enterprise.
How many GPUs does my cluster need?
A best practice for machine learning is for each user to have exclusive use of their GPU(s) while their project is running. This ensures they have sufficient GPU memory available for training, and provides more consistent performance.
When an Anaconda Enterprise user launches a notebook session or deployment that requires GPUs, those resources are reserved for as long as the project is running. When the notebook session or deployment is stopped, the GPUs are returned to the available pool for another user to claim.
The number of GPUs required in the cluster can therefore be determined by the number of concurrently running notebook sessions and deployments that are expected. Adding nodes to an Anaconda Enterprise cluster is straightforward, so organizations can start with a conservative number of GPUs and grow as demand increases.
To get more out of your GPU resources, Anaconda Enterprise supports scheduling and running unattended jobs. This enables you to execute periodic retraining tasks—or other resource-intensive tasks—after regular business hours, or at times GPUs would otherwise be idle.
What kind of GPUs should I use?
Although the Anaconda Distribution supports a wide range of NVIDIA GPUs, enterprise deployments for data science teams developing models should use one of the following GPUs:
- Tesla V100 (recommended)
- Tesla P100 (adequate)
Anaconda Project
What operating systems and Python versions are supported for Anaconda Project?
Anaconda Project supports Windows, macOS, and Linux, and tracks the latest Anaconda releases with Python 2.7, 3.5, 3.6, and 3.7.

How is encapsulation with Anaconda Project different from creating a workspace or project in Spyder, PyCharm, or other IDEs?
A workspace or project in an IDE is a directory of files on your desktop. Anaconda Project encapsulates those files, but also includes additional parameters that describe how to run the project with its dependencies. Anaconda Project is portable and allows users to run, share, and deploy applications across different operating systems.

What types of projects can I deploy?
Anaconda Project is very flexible and can deploy many types of projects with conda or pip dependencies. Deployable projects include:
- Notebooks (Python)
- Bokeh applications and dashboards
- REST APIs in Python (including machine learning scoring and predictions)
- Python scripts
- Third-party apps, web frameworks, and visualization tools such as Tensorboard, Flask, Falcon, deck.gl, plot.ly Dash, and more.
Each project's dependencies, commands, and other settings are specified in its anaconda-project.yml file.
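As an illustration of that workflow (the package names and versions below are arbitrary examples), a project can be created and its anaconda-project.yml populated from the command line with the anaconda-project tool:

```bash
# Create a new project (writes an anaconda-project.yml in the current directory)
anaconda-project init

# Record dependencies in the project file
anaconda-project add-packages python=3.6 pandas bokeh

# Run the project's default command locally
anaconda-project run
```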
If you were using modified template anaconda-project.yml files for Python 2.7, 3.5, or 3.6, it is best to leave the package list empty in the env_specs section and add your required packages and their versions to the global package list.

Here's an example using the Python 3.6 template anaconda-project.yml file from AE version 5.3.1, where the package list has been removed from the env_specs section and the required packages have been added to the global list.
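The original template file is not reproduced here; as an illustrative sketch of the layout described above (the project name, packages, and command are assumptions), an anaconda-project.yml structured this way might look like:

```yaml
name: my-project
# Required packages and their versions go in the global package list
packages:
  - python=3.6
  - pandas
  - bokeh
channels:
  - defaults
env_specs:
  default:
    # Package list intentionally left empty; the global list above is used
    packages: []
    channels: []
commands:
  default:
    notebook: my-notebook.ipynb
```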