This module provides a CustomLLM provider for use with litellm. Since litellm does not currently support entry points for registering providers, you must import the module before use:
```python
import litellm
import anaconda_ai.integrations.litellm

response = litellm.completion(
    'anaconda/openhermes-2.5-mistral-7b/q4_k_m',
    messages=[{'role': 'user', 'content': 'what is pi?'}]
)
```
This integration supports litellm.completion() for both standard and streamed completions (stream=True). Most OpenAI-compatible inference parameters behave as expected; the exception is the n parameter (multiple completions), which is not currently supported. You can also configure server behavior through the optional_params argument by passing server configuration options under the "server" key:
```python
response = litellm.completion(
    'anaconda/openhermes-2.5-mistral-7b/q4_k_m',
    messages=[{'role': 'user', 'content': 'what is pi?'}],
    optional_params={
        "server": {
            "main_gpu": 1,
            "port": 9999,
            "ctx_size": 4096,
            "n_gpu_layers": 20,
            "temp": 0.7
        }
    }
)
```
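Streamed completions mentioned above work through the same entry point. The sketch below is a minimal example of collecting a streamed response into a single string; the `stream_text` helper is hypothetical, and the chunk shape assumed here (`chunk.choices[0].delta.content`) follows litellm's OpenAI-compatible streaming interface:

```python
def stream_text(model: str, prompt: str) -> str:
    """Collect a streamed completion into a single string."""
    # Imports are done here so the provider is registered before the call;
    # the anaconda_ai import is required even though it is not referenced.
    import litellm
    import anaconda_ai.integrations.litellm  # noqa: F401

    response = litellm.completion(
        model,
        messages=[{'role': 'user', 'content': prompt}],
        stream=True,
    )
    parts = []
    for chunk in response:
        # Each streamed chunk carries an incremental text delta,
        # which may be None for the final (finish-reason) chunk.
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return ''.join(parts)
```

For example, `stream_text('anaconda/openhermes-2.5-mistral-7b/q4_k_m', 'what is pi?')` would return the full generated answer once the stream completes.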
For more information on using the litellm package, see the official documentation.