The anaconda-ai package exposes both a CLI and a Python SDK for working with Anaconda AI Navigator. With it, you can:

  • Download and manage quantized LLMs
  • Launch and manage inference servers
  • Integrate with popular frameworks like LangChain, LlamaIndex, DSPy, and more

Installation

Install the package into your conda environment:

conda install conda-forge::anaconda-ai

Configuration

Configuration settings are defined in ~/.anaconda/config.toml under [plugin.ai].

Configurable parameters

Parameter | Environment Variable | Description | Default
stop_server_on_exit | ANACONDA_AI_STOP_SERVER_ON_EXIT | Automatically stop servers when the Python interpreter exits | true

For example:

[plugin.ai]
stop_server_on_exit = true

Model reference format

Quantized models follow this reference format:

<AUTHOR>/<MODEL>/<QUANT>.<EXT>
  • <AUTHOR>: Model’s publisher name (optional)
  • <MODEL>: Model name
  • <QUANT>: Quantization method (Q4_K_M, Q5_K_M, Q6_K, Q8_0)
  • <EXT>: File extension, usually .gguf (optional)

The model name and quantization method must be separated by either / or _.
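
For example, OpenHermes-2.5-Mistral-7B/Q4_K_M and meta-llama/llama-2-7b-chat-hf_Q4_K_M.gguf are both valid model references.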

AI Navigator CLI

The CLI provides functionality for managing models and servers using the anaconda ai command.

Subcommand | Description
create-table | Create a table in the vector database
download | Download a model
drop-table | Drop a table from the vector database
launch | Launch an inference server for a model
launch-vectordb | Start a vector database
list-tables | List all tables in the vector database
models | List model information
remove | Remove a downloaded model
servers | List running servers
stop | Stop a running server
stop-vectordb | Stop the vector database

Append --help to any subcommand for more information.

AI Navigator SDK

The AI Navigator SDK provides a Python interface for managing models and inference servers. Use it to list and download quantized models, configure and launch API servers, and integrate directly into your application’s workflows.

Client initialization

from anaconda_ai import get_default_client
client = get_default_client()

Initializing the client exposes:

  • .models - Model listing, metadata retrieval, and download
  • .servers - Server creation and control

.models

The .models accessor provides methods for listing and downloading models.

Method | Return Type | Description
.list() | List[ModelSummary] | List all available and downloaded models
.download('<MODEL>/<QUANT>') | None | Download a quantized model file
.get('<MODEL>') | ModelSummary | Fetch model metadata
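
For example, a minimal sketch that lists every model the client can see and prints its ID and name, using only the .list() method above and the ModelSummary attributes documented below:

from anaconda_ai import get_default_client

client = get_default_client()

# Print the ID and name of every available or downloaded model
for model in client.models.list():
    print(model.id, model.name)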

ModelSummary

Attribute/Method | Return | Description
.id | str | Model ID in format <author>/<model-name>
.name | str | Model name
.metadata | ModelMetadata | Model metadata and quantization files

ModelMetadata

Attribute/Method | Return | Description
.numParameters | int | Number of model parameters
.contextWindowSize | int | Context window length
.trainedFor | str | Training purpose ('sentence-similarity' or 'text-generation')
.description | str | Model description
.files | List[ModelQuantization] | Available quantization files
.get_quantization('<QUANT>') | ModelQuantization | Get metadata for a specific quantization

ModelQuantization

Attribute/Method | Return | Description
.download() | None | Download the quantization file
.id | str | SHA256 checksum of the model file
.modelFileName | str | File name on disk
.method | str | Quantization method
.sizeBytes | int | File size in bytes
.maxRamUsage | int | Required RAM in bytes
.isDownloaded | bool | Download status
.localPath | str | Local file path (if downloaded)
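
To see how these classes fit together, here is a small sketch that inspects the quantization files available for a model; it uses only the attributes listed in the tables above:

model = client.models.get('OpenHermes-2.5-Mistral-7B')

# Report the quantization method, file size, and download status of each file
for quant in model.metadata.files:
    print(quant.method, quant.sizeBytes, quant.isDownloaded)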

Downloading models

Download a quantized model file using one of the following approaches:

model = client.models.get('OpenHermes-2.5-Mistral-7B')
quantization = model.metadata.get_quantization('Q4_K_M')
quantization.download()
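
Alternatively, the .models accessor's .download() method (see the table above) accepts a model reference directly:

client.models.download('OpenHermes-2.5-Mistral-7B/Q4_K_M')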

.servers

The .servers accessor provides methods for creating, listing, starting, and stopping servers.

Method | Return | Description
.list() | List[Server] | List running servers
.match() | Server | Find a running server matching a configuration
.create() | Server | Create a new server configuration
.start('<server-id>') | None | Start the API server
.status('<server-id>') | str | Get server status
.stop('<server-id>') | None | Stop a running server
.delete('<server-id>') | None | Remove a server configuration
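
As a quick illustration, the sketch below lists the currently running servers and prints each server's URL (the .url attribute is described under Server attributes below):

for server in client.servers.list():
    print(server.url)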

Creating servers

The .create method creates a new server configuration. By default, creating a server configuration downloads the model file (if it is not already downloaded) and selects a random, unused port for the server. For example:

from anaconda_ai import get_default_client

client = get_default_client()
server = client.servers.create(
  'OpenHermes-2.5-Mistral-7B/Q4_K_M',
)

If a server with the specified configuration is already running, the existing configuration is returned, and no new server is created.

Server configuration parameters

The optional parameters listed below can be passed either as dictionaries or as the parameter classes shown below. To skip the automatic download of the model file, pass download_if_needed=False.

Parameters set to None are omitted from the server launch command and fall back to backend-defined defaults.

Parameter | Type | Description
api_params | APIParams or dict | Parameters for how the server is configured
load_params | LoadParams or dict | Parameters for how the model is loaded
infer_params | InferParams or dict | Parameters for inference configuration

These parameter classes are defined as follows:

class APIParams(BaseModel, extra="forbid"):
    host: str = "127.0.0.1"
    port: int = 0            # 0 means find a random unused port
    api_key: str | None = None
    log_disable: bool | None = None
    mmproj: str | None = None
    timeout: int | None = None
    verbose: bool | None = None
    n_gpu_layers: int | None = None
    main_gpu: int | None = None
    metrics: bool | None = None

class LoadParams(BaseModel, extra="forbid"):
    batch_size: int | None = None
    cont_batching: bool | None = None
    ctx_size: int | None = None
    main_gpu: int | None = None
    memory_f32: bool | None = None
    mlock: bool | None = None
    n_gpu_layers: int | None = None
    rope_freq_base: int | None = None
    rope_freq_scale: int | None = None
    seed: int | None = None
    tensor_split: list[int] | None = None
    use_mmap: bool | None = None
    embedding: bool | None = None

class InferParams(BaseModel, extra="forbid"):
    threads: int | None = None
    n_predict: int | None = None
    top_k: int | None = None
    top_p: float | None = None
    min_p: float | None = None
    repeat_last: int | None = None
    repeat_penalty: float | None = None
    temp: float | None = None
    parallel: int | None = None

For example, to create a server configuration that uses a specific port; customizes the context size, number of GPU layers, and temperature; and avoids downloading the model file, your code might look like this:

server = client.servers.create(
  'OpenHermes-2.5-Mistral-7B/Q4_K_M',
  api_params={"main_gpu": 1, "port": 9999},
  load_params={"ctx_size": 512, "n_gpu_layers": 10},
  infer_params={"temp": 0.1},
  download_if_needed=False
)
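
The same configuration can also be expressed with the parameter classes instead of dictionaries. Here is a sketch under the assumption that APIParams, LoadParams, and InferParams can be imported from the anaconda_ai package; the exact import path may differ in your installation:

# Hypothetical import path; adjust to wherever these classes are exposed in your install
from anaconda_ai import APIParams, LoadParams, InferParams

server = client.servers.create(
  'OpenHermes-2.5-Mistral-7B/Q4_K_M',
  api_params=APIParams(main_gpu=1, port=9999),
  load_params=LoadParams(ctx_size=512, n_gpu_layers=10),
  infer_params=InferParams(temp=0.1),
)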

Managing servers

New servers are not automatically started when their configuration is created. You can start or stop a server using the following methods:

server.start()
server.stop()

Server attributes

Attribute | Description
.url | Full server URL (example: http://127.0.0.1:8000)
.openai_url | URL with /v1 for OpenAI compatibility
.openai_client() | Pre-configured OpenAI client
.openai_async_client() | Pre-configured async OpenAI client
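
For instance, here is a sketch that starts a server and sends a chat request through the pre-configured OpenAI client; the model string passed to the OpenAI API is an assumption and may need to be adjusted for your setup:

server.start()

openai_client = server.openai_client()

# The model name accepted by the local server is assumed here; adjust as needed
response = openai_client.chat.completions.create(
    model='OpenHermes-2.5-Mistral-7B/Q4_K_M',
    messages=[{'role': 'user', 'content': 'what is pi?'}],
)
print(response.choices[0].message.content)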

Framework integrations

The SDK provides integrations with several popular AI frameworks:

LLM

Install the llm package in your environment:

conda install conda-forge::llm

To list the Anaconda AI models, run:

llm models list -q anaconda

When you invoke a model, llm first ensures that the model has been downloaded, then starts the server using AI Navigator. Both standard OpenAI parameters and the SDK’s server parameters are supported. For example:

llm -m 'anaconda:meta-llama/llama-2-7b-chat-hf_Q4_K_M.gguf' -o temperature 0.1 'what is pi?'

To view server parameters, run:

llm models list -q anaconda --options

For more information on using the llm package, see the official documentation.

LangChain

The LangChain integration provides Chat and Embedding classes that automatically manage downloading models and starting servers.

Install langchain-openai in your environment:

pip install langchain-openai

Only pip install packages in your conda environment once all other packages and their dependencies have been installed. For more information on installing pip packages in your conda environment, see Installing pip packages.

Here is a minimal setup example for using LangChain with Anaconda’s models:

from langchain.prompts import ChatPromptTemplate
from anaconda_ai.integrations.langchain import AnacondaQuantizedModelChat, AnacondaQuantizedModelEmbeddings

prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")
model = AnacondaQuantizedModelChat(model_name='meta-llama/llama-2-7b-chat-hf_Q4_K_M.gguf')

chain = prompt | model

message = chain.invoke({'topic': 'python'})

The following keyword arguments are supported:

  • api_params: Dict or APIParams class above
  • load_params: Dict or LoadParams class above
  • infer_params: Dict or InferParams class above (not supported by AnacondaQuantizedModelEmbeddings)

Both LangChain classes accept the SDK’s server parameter classes, with the exception that AnacondaQuantizedModelEmbeddings does not accept infer_params.
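
The embeddings class imported above works the same way. Here is a minimal sketch, assuming it follows LangChain’s standard Embeddings interface (embed_query) and accepts the same model_name keyword as the chat class:

# Placeholder model reference; use a sentence-similarity model available to you
embeddings = AnacondaQuantizedModelEmbeddings(model_name='<MODEL>/<QUANT>')
vector = embeddings.embed_query('What is pi?')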

For more information on using the langchain-openai package, see the official documentation.

LlamaIndex

Install the llama-index-llms-openai package:

pip install llama-index-llms-openai

Only pip install packages in your conda environment once all other packages and their dependencies have been installed. For more information on installing pip packages in your conda environment, see Installing pip packages.

Here is a minimal setup example for using LlamaIndex with Anaconda AI Navigator:

from anaconda_ai.integrations.llama_index import AnacondaModel

llm = AnacondaModel(
    model='OpenHermes-2.5-Mistral-7B_q4_k_m'
)

The AnacondaModel class supports the following arguments:

Parameter | Type | Description | Default
model | str | Model name | Required
system_prompt | str | System prompt | None
temperature | float | Sampling temperature | 0.1
max_tokens | int | Max tokens to predict | Model default
api_params | dict or APIParams | Server configuration | None
load_params | dict or LoadParams | Model loading | None
infer_params | dict or InferParams | Inference config | None
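
To sanity-check the setup, you can then call the model as usual; this sketch assumes AnacondaModel implements LlamaIndex’s standard LLM interface (for example, the complete method):

response = llm.complete('What is pi?')
print(response)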

For more information on using the llama-index-llms-openai package, see the official documentation.

LiteLLM

This integration provides a CustomLLM provider for use with litellm. Because litellm does not currently support entrypoints for registering providers, you must import the integration module before use:

import litellm
import anaconda_ai.integrations.litellm

response = litellm.completion(
    'anaconda/openhermes-2.5-mistral-7b/q4_k_m',
    messages=[{'role': 'user', 'content': 'what is pi?'}]
)

This integration supports litellm.completion() for both standard and streamed completions (stream=True). Most OpenAI-compatible inference parameters are available and behave as expected, with the exception of the n parameter (for multiple completions), which is not currently supported.

You can also configure server behavior using the optional_params argument. This accepts a dictionary with api_params, load_params, and infer_params keys, matching the parameter schemas defined in the SDK:

optional_params = {
    "api_params": {"main_gpu": 1, "port": 9999},
    "load_params": {"ctx_size": 512, "n_gpu_layers": 10},
    "infer_params": {"temp": 0.1},
}

For more information on using the litellm package, see the official documentation.

DSPy

The SDK integrates with DSPy using litellm, allowing you to use quantized local models with any DSPy module that relies on the LM interface.

Install the dspy package:

pip install -U dspy

Only pip install packages in your conda environment once all other packages and their dependencies have been installed. For more information on installing pip packages in your conda environment, see Installing pip packages.

Here is an example of how to use DSPy with Anaconda’s models:

import dspy
import anaconda_ai.integrations.litellm

lm = dspy.LM('anaconda/openhermes-2.5-mistral-7b/q4_k_m')
dspy.configure(lm=lm)

chain = dspy.ChainOfThought("question -> answer")
chain(question="Who are you?")

For more information on using the dspy package, see the official documentation.

Panel

Use Panel’s ChatInterface callback to build a chatbot that uses one of Anaconda’s models by serving it through the SDK.

The ChatInterface callback requires panel, httpx, and numpy to be installed in your environment:

conda install panel httpx numpy

Here’s an example Panel chatbot application:

import panel as pn
from anaconda_ai.integrations.panel import AnacondaModelHandler

pn.extension('echarts', 'tabulator', 'terminal')

llm = AnacondaModelHandler('TinyLlama/TinyLlama-1.1B-Chat-v1.0_Q4_K_M.gguf', display_throughput=True)

chat = pn.chat.ChatInterface(
    callback=llm.callback,
    show_button_name=False)

chat.send(
    "I am your assistant. How can I help you?",
    user=llm.model_id, avatar=llm.avatar, respond=False
)
chat.servable()

AnacondaModelHandler supports the following keyword arguments:

Parameter | Description
display_throughput | Show a speed dial next to the response. Default is False
system_message | Default system message applied to all responses
client_options | Optional dict passed as keyword arguments to chat.completions.create
api_params | Optional dict or APIParams object
load_params | Optional dict or LoadParams object
infer_params | Optional dict or InferParams object
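
For example, a handler configured with a default system message and extra completion options might look like the sketch below; the specific values are illustrative only:

llm = AnacondaModelHandler(
    'TinyLlama/TinyLlama-1.1B-Chat-v1.0_Q4_K_M.gguf',
    system_message='You are a helpful assistant.',
    client_options={'temperature': 0.1},  # forwarded to chat.completions.create
)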

For more information on using Panel, see the official documentation.