The anaconda-ai package exposes both a CLI and a Python SDK for working with Anaconda AI Navigator. With it, you can:

  • Download and manage quantized LLMs
  • Launch and manage inference servers
  • Integrate with popular frameworks like LangChain, LlamaIndex, DSPy, and more

Installation

Install the package into your conda environment:

conda install conda-forge::anaconda-ai

Configuration

Configuration settings are defined in ~/.anaconda/config.toml under [plugin.ai].

Configurable parameters

Parameter | Environment Variable | Description | Default
stop_server_on_exit | ANACONDA_AI_STOP_SERVER_ON_EXIT | Automatically stop servers when the Python interpreter exits | true

For example:

[plugin.ai]
stop_server_on_exit = true

Model reference format

Quantized models follow this reference format:

<AUTHOR>/<MODEL>/<QUANT>.<EXT>
  • <AUTHOR>: Model’s publisher name (optional)
  • <MODEL>: Model name
  • <QUANT>: Quantization method (Q4_K_M, Q5_K_M, Q6_K, Q8_0)
  • <EXT>: File extension, usually .gguf (optional)

The model name and quantization method must be separated by either / or _.
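
For example, OpenHermes-2.5-Mistral-7B/Q4_K_M and meta-llama/llama-2-7b-chat-hf_Q4_K_M.gguf are both valid model references.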

AI Navigator CLI

The CLI provides functionality for managing models and servers using the anaconda ai command.

Subcommand | Description
create-table | Create a table in the vector database
download | Download a model
drop-table | Drop a table from the vector database
launch | Launch an inference server for a model
launch-vectordb | Start a vector database
list-tables | List all tables in the vector database
models | List model information
remove | Remove a downloaded model
servers | List running servers
stop | Stop a running server
stop-vectordb | Stop the vector database

Append --help to any subcommand for more information.

AI Navigator SDK

The AI Navigator SDK provides a Python interface for managing models and inference servers. Use it to list and download quantized models, configure and launch API servers, and integrate directly into your application’s workflows.

Client initialization

from anaconda_ai import get_default_client
client = get_default_client()

Initializing the client exposes:

  • .models - Model listing, metadata retrieval, and download
  • .servers - Server creation and control

.models

The .models accessor provides methods for listing and downloading models.

Method | Return Type | Description
.list() | List[ModelSummary] | List all available and downloaded models
.download('<MODEL>/<QUANT>') | None | Download a quantized model file
.get('<MODEL>') | ModelSummary | Fetch model metadata
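
For example, a minimal sketch that lists every model the client can see and prints its ID and name, using only the .list() method above and the ModelSummary attributes documented below:

from anaconda_ai import get_default_client

client = get_default_client()

# Print the ID and name of every available or downloaded model
for model in client.models.list():
    print(model.id, model.name)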

ModelSummary

Attribute/Method | Return | Description
.id | str | Model ID in format <author>/<model-name>
.name | str | Model name
.metadata | ModelMetadata | Model metadata and quantization files

ModelMetadata

Attribute/Method | Return | Description
.numParameters | int | Number of model parameters
.contextWindowSize | int | Context window length
.trainedFor | str | Training purpose ('sentence-similarity' or 'text-generation')
.description | str | Model description
.files | List[ModelQuantization] | Available quantization files
.get_quantization('<QUANT>') | ModelQuantization | Get metadata for a specific quantization

ModelQuantization

Attribute/Method | Return | Description
.download() | None | Download the quantization file
.id | str | SHA256 checksum of the model file
.modelFileName | str | File name on disk
.method | str | Quantization method
.sizeBytes | int | File size in bytes
.maxRamUsage | int | Required RAM in bytes
.isDownloaded | bool | Download status
.localPath | str | Local file path (if downloaded)
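
To see how these classes fit together, here is a small sketch that inspects the quantization files available for a model; it uses only the attributes listed in the tables above:

model = client.models.get('OpenHermes-2.5-Mistral-7B')

# Report the quantization method, file size, and download status of each file
for quant in model.metadata.files:
    print(quant.method, quant.sizeBytes, quant.isDownloaded)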

Downloading models

Download a quantized model file using one of the following approaches:

model = client.models.get('OpenHermes-2.5-Mistral-7B')
quantization = model.metadata.get_quantization('Q4_K_M')
quantization.download()
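
Alternatively, the .models accessor's .download() method (see the table above) accepts a model reference directly:

client.models.download('OpenHermes-2.5-Mistral-7B/Q4_K_M')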

.servers

The .servers accessor provides methods for creating, listing, starting, and stopping servers.

Method | Return | Description
.list() | List[Server] | List running servers
.match() | Server | Find a running server matching a configuration
.create() | Server | Create a new server configuration
.start('<server-id>') | None | Start the API server
.status('<server-id>') | str | Get server status
.stop('<server-id>') | None | Stop a running server
.delete('<server-id>') | None | Remove a server configuration
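
As a quick illustration, the sketch below lists the currently running servers and prints each server's URL (the .url attribute is described under Server attributes below):

for server in client.servers.list():
    print(server.url)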

Creating servers

The .create method creates a new server configuration. By default, creating a server configuration downloads the model file (if it is not already downloaded) and selects a random, unused port for the server. For example:

from anaconda_ai import get_default_client

client = get_default_client()
server = client.servers.create(
  'OpenHermes-2.5-Mistral-7B/Q4_K_M',
)

If a server with the specified configuration is already running, the existing configuration is returned, and no new server is created.

Server configuration parameters

The optional parameters listed below can be passed either as dictionaries or as the parameter classes shown below. To skip the automatic download of the model file, pass download_if_needed=False.

Parameters set to None are omitted from the server launch command and fall back to backend-defined defaults.

Parameter | Type | Description
api_params | APIParams or dict | Parameters for how the server is configured
load_params | LoadParams or dict | Parameters for how the model is loaded
infer_params | InferParams or dict | Parameters for inference configuration

These parameter classes are defined as follows:

class APIParams(BaseModel, extra="forbid"):
    host: str = "127.0.0.1"
    port: int = 0            # 0 means find a random unused port
    api_key: str | None = None
    log_disable: bool | None = None
    mmproj: str | None = None
    timeout: int | None = None
    verbose: bool | None = None
    n_gpu_layers: int | None = None
    main_gpu: int | None = None
    metrics: bool | None = None

class LoadParams(BaseModel, extra="forbid"):
    batch_size: int | None = None
    cont_batching: bool | None = None
    ctx_size: int | None = None
    main_gpu: int | None = None
    memory_f32: bool | None = None
    mlock: bool | None = None
    n_gpu_layers: int | None = None
    rope_freq_base: int | None = None
    rope_freq_scale: int | None = None
    seed: int | None = None
    tensor_split: list[int] | None = None
    use_mmap: bool | None = None
    embedding: bool | None = None

class InferParams(BaseModel, extra="forbid"):
    threads: int | None = None
    n_predict: int | None = None
    top_k: int | None = None
    top_p: float | None = None
    min_p: float | None = None
    repeat_last: int | None = None
    repeat_penalty: float | None = None
    temp: float | None = None
    parallel: int | None = None

For example, to create a server configuration that uses a specific port; customizes the context size, number of GPU layers, and temperature; and avoids downloading the model file, your code might look like this:

server = client.servers.create(
  'OpenHermes-2.5-Mistral-7B/Q4_K_M',
  api_params={"main_gpu": 1, "port": 9999},
  load_params={"ctx_size": 512, "n_gpu_layers": 10},
  infer_params={"temp": 0.1},
  download_if_needed=False
)
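
The same configuration can also be expressed with the parameter classes instead of dictionaries. Here is a sketch under the assumption that APIParams, LoadParams, and InferParams can be imported from the anaconda_ai package; the exact import path may differ in your installation:

# Hypothetical import path; adjust to wherever these classes are exposed in your install
from anaconda_ai import APIParams, LoadParams, InferParams

server = client.servers.create(
  'OpenHermes-2.5-Mistral-7B/Q4_K_M',
  api_params=APIParams(main_gpu=1, port=9999),
  load_params=LoadParams(ctx_size=512, n_gpu_layers=10),
  infer_params=InferParams(temp=0.1),
)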

Managing servers

New servers are not automatically started when their configuration is created. You can start or stop a server using the following methods:

server.start()
server.stop()

Server attributes

Attribute | Description
.url | Full server URL (example: http://127.0.0.1:8000)
.openai_url | URL with /v1 for OpenAI compatibility
.openai_client() | Pre-configured OpenAI client
.openai_async_client() | Pre-configured async OpenAI client
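
For instance, here is a sketch that starts a server and sends a chat request through the pre-configured OpenAI client; the model string passed to the OpenAI API is an assumption and may need to be adjusted for your setup:

server.start()

openai_client = server.openai_client()

# The model name accepted by the local server is assumed here; adjust as needed
response = openai_client.chat.completions.create(
    model='OpenHermes-2.5-Mistral-7B/Q4_K_M',
    messages=[{'role': 'user', 'content': 'what is pi?'}],
)
print(response.choices[0].message.content)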

Framework integrations

The SDK provides integrations with several popular AI frameworks:

LLM

Install the llm package in your environment:

conda install conda-forge::llm

To list the Anaconda AI models, run:

llm models list -q anaconda

When you invoke a model, llm first ensures that the model has been downloaded, then starts the server using AI Navigator. Both standard OpenAI parameters and the SDK’s server parameters are supported. For example:

llm -m 'anaconda:meta-llama/llama-2-7b-chat-hf_Q4_K_M.gguf' -o temperature 0.1 'what is pi?'

To view server parameters, run:

llm models list -q anaconda --options

For more information on using the llm package, see the official documentation.

LangChain

The LangChain integration provides Chat and Embedding classes that automatically manage downloading models and starting servers.

Install langchain-openai in your environment:

pip install langchain-openai

Only pip install packages in your conda environment once all other packages and their dependencies have been installed. For more information on installing pip packages in your conda environment, see Installing pip packages.

Here is a minimal setup example for using LangChain with Anaconda’s models:

from langchain.prompts import ChatPromptTemplate
from anaconda_ai.integrations.langchain import AnacondaQuantizedModelChat, AnacondaQuantizedModelEmbeddings

prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")
model = AnacondaQuantizedModelChat(model_name='meta-llama/llama-2-7b-chat-hf_Q4_K_M.gguf')

chain = prompt | model

message = chain.invoke({'topic': 'python'})

The following keyword arguments are supported:

  • api_params: Dict or APIParams class above
  • load_params: Dict or LoadParams class above
  • infer_params: Dict or InferParams class above (not supported by AnacondaQuantizedModelEmbeddings)

Both LangChain classes accept the SDK’s server parameter classes, with the exception that AnacondaQuantizedModelEmbeddings does not accept infer_params.
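
The embeddings class imported above works the same way. Here is a minimal sketch, assuming it follows LangChain’s standard Embeddings interface (embed_query) and accepts the same model_name keyword as the chat class:

# Placeholder model reference; use a sentence-similarity model available to you
embeddings = AnacondaQuantizedModelEmbeddings(model_name='<MODEL>/<QUANT>')
vector = embeddings.embed_query('What is pi?')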

For more information on using the langchain-openai package, see the official documentation.

LlamaIndex

Install the llama-index-llms-openai package:

pip install llama-index-llms-openai

Only pip install packages in your conda environment once all other packages and their dependencies have been installed. For more information on installing pip packages in your conda environment, see Installing pip packages.

Here is a minimal setup example for using LlamaIndex with Anaconda AI Navigator:

from anaconda_ai.integrations.llama_index import AnacondaModel

llm = AnacondaModel(
    model='OpenHermes-2.5-Mistral-7B_q4_k_m'
)

The AnacondaModel class supports the following arguments:

Parameter | Type | Description | Default
model | str | Model name | Required
system_prompt | str | System prompt | None
temperature | float | Sampling temperature | 0.1
max_tokens | int | Max tokens to predict | Model default
api_params | dict or APIParams | Server configuration | None
load_params | dict or LoadParams | Model loading | None
infer_params | dict or InferParams | Inference config | None
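
To sanity-check the setup, you can then call the model as usual; this sketch assumes AnacondaModel implements LlamaIndex’s standard LLM interface (for example, the complete method):

response = llm.complete('What is pi?')
print(response)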

For more information on using the llama-index-llms-openai package, see the official documentation.

LiteLLM

This integration provides a CustomLLM provider for use with litellm. Because litellm does not currently support entrypoints for registering providers, you must import the integration module before use:

import litellm
import anaconda_ai.integrations.litellm

response = litellm.completion(
    'anaconda/openhermes-2.5-mistral-7b/q4_k_m',
    messages=[{'role': 'user', 'content': 'what is pi?'}]
)

This integration supports litellm.completion() for both standard and streamed completions (stream=True). Most OpenAI-compatible inference parameters are available and behave as expected, with the exception of the n parameter (for multiple completions), which is not currently supported.

You can also configure server behavior using the optional_params argument. This accepts a dictionary with api_params, load_params, and infer_params keys, matching the parameter schemas defined in the SDK:

optional_params = {
    "api_params": {"main_gpu": 1, "port": 9999},
    "load_params": {"ctx_size": 512, "n_gpu_layers": 10},
    "infer_params": {"temp": 0.1},
}

For more information on using the litellm package, see the official documentation.

DSPy

The SDK integrates with DSPy using litellm, allowing you to use quantized local models with any DSPy module that relies on the LM interface.

Install the dspy package:

pip install -U dspy

Only pip install packages in your conda environment once all other packages and their dependencies have been installed. For more information on installing pip packages in your conda environment, see Installing pip packages.

Here is an example of how to use DSPy with Anaconda’s models:

import dspy
import anaconda_ai.integrations.litellm

lm = dspy.LM('anaconda/openhermes-2.5-mistral-7b/q4_k_m')
dspy.configure(lm=lm)

chain = dspy.ChainOfThought("question -> answer")
chain(question="Who are you?")

For more information on using the dspy package, see the official documentation.

Panel

Use Panel’s ChatInterface callback to build a chatbot that uses one of Anaconda’s models by serving it through the SDK.

The ChatInterface callback requires panel, httpx, and numpy to be installed in your environment:

conda install panel httpx numpy

Here’s an example Panel chatbot application:

import panel as pn
from anaconda_ai.integrations.panel import AnacondaModelHandler

pn.extension('echarts', 'tabulator', 'terminal')

llm = AnacondaModelHandler('TinyLlama/TinyLlama-1.1B-Chat-v1.0_Q4_K_M.gguf', display_throughput=True)

chat = pn.chat.ChatInterface(
    callback=llm.callback,
    show_button_name=False)

chat.send(
    "I am your assistant. How can I help you?",
    user=llm.model_id, avatar=llm.avatar, respond=False
)
chat.servable()

AnacondaModelHandler supports the following keyword arguments:

Parameter | Description
display_throughput | Show a speed dial next to the response. Default is False
system_message | Default system message applied to all responses
client_options | Optional dict passed as keyword arguments to chat.completions.create
api_params | Optional dict or APIParams object
load_params | Optional dict or LoadParams object
infer_params | Optional dict or InferParams object
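
For example, a handler configured with a default system message and extra completion options might look like the sketch below; the specific values are illustrative only:

llm = AnacondaModelHandler(
    'TinyLlama/TinyLlama-1.1B-Chat-v1.0_Q4_K_M.gguf',
    system_message='You are a helpful assistant.',
    client_options={'temperature': 0.1},  # forwarded to chat.completions.create
)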

For more information on using Panel, see the official documentation.