Integrating Databricks with Anaconda Package Security Manager (On-prem) enables organizations to maintain security and compliance while leveraging the power of both platforms.

For data science teams working in regulated environments, this integration provides essential security controls over Python package usage. Your organization can enforce security policies and maintain consistent environments across development and production. This helps prevent the use of unauthorized or vulnerable packages while providing comprehensive audit trails of package usage across your Databricks workspaces.

This guide explains how to set up a secure, customized Python environment in Databricks using packages from Anaconda’s Package Security Manager (On-prem).

Prerequisites

Before starting, ensure you have:

  • Administrator permissions for your Package Security Manager (On-prem) instance
  • Docker installed on your local machine
  • A Databricks workspace with admin privileges

Setup and configuration

Step 1: Create a policy

  1. Log into your Package Security Manager (On-prem) instance.
  2. Select Policies in the left-hand navigation.
  3. Click Create Policy.
  4. Name your policy databricks, provide a brief description, and then click Next.
  5. Set your policy to allow packages with noarch or linux-64 architectures only, then click Next.
  6. Skip setting CVE rules and exclusions by clicking Next to progress the policy configuration.
  7. Click Create Policy to save your policy.

For more information on policies, see Policies.

Step 2: Create a channel for Anaconda's packages

  1. Select My Channels in the left-hand navigation.
  2. Click Add Channel.
  3. Name your channel databricks-anaconda.
  4. Set your channel to Private.
  5. Open the Assign Policy dropdown and select your databricks policy.
  6. Click Submit.

For more information on channels, see Channels.

Step 3: Create a mirror for Anaconda's packages

  1. Open the Manage dropdown menu and select Freeze, then click Freeze to confirm.

  2. Open the Manage dropdown menu again and select Create Mirror.

  3. Complete the Create mirror form with the following configurations:

    • Name: databricks-anaconda-packages
    • External Source URL: https://repo.anaconda.com/pkgs/main
    • Mirror Type: conda
  4. Verify that the Run now checkbox is selected.

  5. Verify that the Assign Mirror Policy toggle reads APPLY CHANNEL POLICY.

  6. Allow the mirror time to complete.

  7. Open the Manage dropdown and select Unfreeze, then click Unfreeze to confirm.

For more information on mirroring, see Mirrors.

Step 4: Create a channel for conda-forge packages

  1. Select My Channels in the left-hand navigation.
  2. Click Add Channel.
  3. Name your channel databricks-conda-forge.
  4. Set your channel to Private.
  5. Open the Assign Policy dropdown and select your databricks policy.
  6. Click Submit.

Step 5: Create a mirror for conda-forge packages

  1. Open the Manage dropdown menu and select Freeze, then click Freeze to confirm.
  2. Open the Manage dropdown menu again and select Create Mirror.
  3. Complete the Create mirror form with the following configurations:
    • Name: databricks-conda-forge-packages
    • External Source URL: https://conda.anaconda.org/conda-forge
    • Mirror Type: conda
  4. Verify that the Run now checkbox is selected.
  5. Verify that the Assign Mirror Policy toggle reads APPLY CHANNEL POLICY.
  6. Allow the mirror time to complete.
  7. Open the Manage dropdown and select Unfreeze, then click Unfreeze to confirm.

Step 6: Generate a token for your channels

  1. Select Tokens from the left-hand navigation.
  2. Click Generate Token.
  3. Name your token databricks-token.
  4. Set an expiration date for your token that makes sense for your use case.
  5. Open the Type dropdown menu and select Resources.
  6. Open the Channel dropdown menu and select your databricks-anaconda channel.
  7. Open the Permission dropdown menu and select Read.
  8. Click Add Resource.
  9. Open the Channel dropdown menu and select your databricks-conda-forge channel.
  10. Open the Permission dropdown menu and select Read.
  11. Click Create.
  12. Your token displays in the upper-right corner of the page. Copy the token and save it in a secure location.

For more information on tokens, see Tokens.
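
Optionally, confirm that the token grants read access to your channels before baking it into a Docker image. The following is a minimal sketch using curl; it assumes your repository serves the standard conda repodata.json index under the <FQDN>/api/repo/t/<TOKEN>/<channel> URL pattern used later in this guide:

    # Replace <FQDN> and <TOKEN> with your values.
    # An HTTP 200 response means the token can read the channel.
    curl -sI https://<FQDN>/api/repo/t/<TOKEN>/databricks-anaconda/noarch/repodata.json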

Step 7: Build a custom Docker image

To create a secure Python environment in Databricks, you’ll need to build a custom Docker image using Databricks Container Service. This image will contain your conda-based environment and can be used when launching your Databricks cluster.

For more information, see Customize containers with Databricks Container Service and GitHub - databricks/containers.

  1. Create a directory on your local machine called dcs-conda by running the following command:

    mkdir dcs-conda
    
  2. Enter your new dcs-conda directory and create a Dockerfile inside it:

    cd dcs-conda
    vi Dockerfile
    
  3. Add the following content to the Dockerfile:

    # Replace `<FQDN>` with the fully qualified domain name of your Package Security Manager (On-prem) instance
    # Replace `<TOKEN>` with the resource token you just generated
    
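    # Stage 1: builder image with build tools and a Miniconda installation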
    FROM ubuntu:22.04 AS builder
    RUN apt-get update && apt-get install --yes \
        wget \
        libdigest-sha-perl \
        bzip2 \
        gcc \
        python3-dev \
        libpq-dev \
        libcairo2-dev \
        libdbus-1-dev \
        libgirepository1.0-dev \
        libsnappy-dev \
        git \
        maven && \
        apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
    RUN wget -q https://repo.anaconda.com/miniconda/Miniconda3-py311_25.1.1-2-Linux-x86_64.sh -O miniconda.sh && \
        /bin/bash miniconda.sh -b -p /databricks/conda && \
        rm miniconda.sh
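
    # Stage 2: copy the conda installation into the minimal Databricks runtime image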
    FROM databricksruntime/minimal:15.4-LTS
    RUN apt-get update && apt-get install --yes git && \
        apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
    COPY --from=builder /databricks/conda /databricks/conda
    COPY env.yml /databricks/.conda-env-def/env.yml
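    # Point conda at the Package Security Manager channels and remove the default public channels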
    RUN /databricks/conda/bin/conda config --system --prepend channels https://<FQDN>/api/repo/t/<TOKEN>/databricks-anaconda && \
        /databricks/conda/bin/conda config --system --append channels https://<FQDN>/api/repo/t/<TOKEN>/databricks-conda-forge && \
        /databricks/conda/bin/conda config --system --remove channels https://repo.anaconda.com/pkgs/main && \
        /databricks/conda/bin/conda config --system --remove channels https://repo.anaconda.com/pkgs/r
    RUN /databricks/conda/bin/conda env create --file /databricks/.conda-env-def/env.yml && \
        ln -s /databricks/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh
    RUN /databricks/conda/bin/conda config --system --set channel_priority strict && \
        /databricks/conda/bin/conda config --system --set always_yes True
    RUN rm -f /root/.condarc
    ENV GIT_PYTHON_GIT_EXECUTABLE=/usr/bin/git
    ENV DEFAULT_DATABRICKS_ROOT_CONDA_ENV="dcs-conda"
    ENV DATABRICKS_ROOT_CONDA_ENV="dcs-conda"
    
  4. Create an env.yml file inside the dcs-conda directory:

    vi env.yml
    
  5. Add the following content to the env.yml file:

    # Replace `<FQDN>` with the fully qualified domain name of your Package Security Manager (On-prem) instance
    # Replace `<TOKEN>` with the resource token you generated earlier
    
    name: dcs-conda
    channels:
      - https://<FQDN>/api/repo/t/<TOKEN>/databricks-anaconda
      - https://<FQDN>/api/repo/t/<TOKEN>/databricks-conda-forge
    dependencies:
      - python=3.11
      - databricks-sdk
      - grpcio
      - grpcio-status
      - ipykernel
      - ipython
      - jedi
      - jinja2
      - matplotlib
      - nomkl
      - numpy
      - pandas
      - pip
      - pyarrow
      - pyccolo
      - setuptools
      - six
      - traitlets
      - wheel
    

    Please check the recommended package versions in the System environment section of the Databricks Runtime release notes and compatibility documentation.

  6. Build the Docker image:

    docker build -t dcs-conda:15.4-psm .
    
  7. Tag and push your custom image to a Docker registry by running the following commands, replacing the anaconda namespace with your own registry as needed (an optional local check of the image follows these steps):

    docker tag dcs-conda:15.4-psm anaconda/dcs-conda:15.4-psm
    docker push anaconda/dcs-conda:15.4-psm
    
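Optionally, sanity-check the image locally before launching a cluster with it. This is a minimal sketch; it only verifies that the dcs-conda environment was created inside the image:

    # List the conda environments baked into the image; dcs-conda should appear
    docker run --rm dcs-conda:15.4-psm /databricks/conda/bin/conda env list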

Step 8: Launch a cluster using Databricks Container Service

Clients must be authorized to access Databricks resources using a Databricks account with appropriate permissions. Without proper access, CLI commands and REST API calls will fail. Permissions can be configured by a workspace administrator.

Databricks recommends using OAuth for authorization instead of Personal Access Tokens (PATs). OAuth tokens refresh automatically and reduce security risks associated with token leaks or misuse. For more information, see Authorizing access to Databricks resources.
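
If you prefer to create the cluster programmatically rather than through the UI steps below, the following is a hypothetical sketch using the Databricks CLI with OAuth already configured. The cluster name, node type, and worker count are illustrative placeholders; adjust them, and the image URL, for your workspace:

    # Hypothetical example; the values below are placeholders
    databricks clusters create --json '{
      "cluster_name": "dcs-conda-cluster",
      "spark_version": "15.4.x-scala2.12",
      "node_type_id": "i3.xlarge",
      "num_workers": 1,
      "spark_conf": {
        "spark.databricks.isv.product": "anaconda-psm",
        "spark.databricks.driverNfs.enabled": "false"
      },
      "docker_image": {
        "url": "anaconda/dcs-conda:15.4-psm"
      }
    }'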

  1. Open your Databricks workspace.

  2. Select Compute from the left-hand navigation, then click Create compute.

  3. On the New compute page, specify the Cluster Name.

  4. Under Performance, set the Databricks Runtime Version to a version that supports Databricks Container Service, for example Runtime: 15.4 LTS.

    This version is under long-term support (LTS). For more information, see Databricks support lifecycles.


    Databricks Runtime for Machine Learning does not support Databricks Container Service.

  5. Open the Advanced options dropdown and select the Spark tab.

  6. Add the following Spark configurations. The spark.databricks.isv.product setting must always be added, regardless of other settings:

    spark.databricks.isv.product anaconda-psm
    spark.databricks.driverNfs.enabled false
    

    To access volumes on Databricks Container Service, add the following configuration to the compute’s Spark config field as well: spark.databricks.unityCatalog.volumes.enabled true.

  7. Select the Docker tab.

  8. Select the Use your own Docker container checkbox.

  9. Enter your custom Docker image in the Docker Image URL field.

  10. Open the Authentication dropdown and select an authentication method.

  11. Click Create compute.

Step 9: Create a notebook and connect it to your cluster

  1. Click New in the top-left corner, then click Notebook.

  2. Specify a name for the notebook.

  3. Click Connect, then select your cluster from the resource list.

Step 10: Verify your conda installation

  1. In your notebook, run one of the following commands to check that conda is installed:

    !conda --help
    
    %sh conda --help
    

    Both commands run shell code from the notebook: !conda --help uses the IPython shell escape to run a single command, while %sh conda --help runs the entire cell as a shell script, which is useful for multi-line scripts but might not have the same environment or PATH.

  2. In your notebook, run the following command to check your source channels:

    !conda config --show channels
    
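    If your image was built correctly, the output should list only your Package Security Manager channels (with your token embedded in the URLs), similar to:

    channels:
      - https://<FQDN>/api/repo/t/<TOKEN>/databricks-anaconda
      - https://<FQDN>/api/repo/t/<TOKEN>/databricks-conda-forge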

Step 11: Install MLflow from your Anaconda organization channel

MLflow is available through your Anaconda organization channel for use in your Databricks environment.

  1. In your notebook, install MLflow from your Anaconda organization channel:

    !conda install mlflow
    

    This command installs MLflow and all of its dependencies from your Package Security Manager channel.

  2. In your notebook, verify the installation:

    import mlflow
    print("MLflow: " + mlflow.__version__)