Usage
Description
Launches an OpenAI-compatible inference server for a quantized model. If the model has not been downloaded yet, it is downloaded automatically before the server starts. By default, the server runs in the foreground and is stopped and removed when you press Ctrl+C. Use --detach to leave the server running in the background after the command exits.
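Because the server speaks the OpenAI API, any OpenAI-compatible client can talk to it once it is running. A minimal standard-library sketch is shown below; the host, port, and model name are hypothetical placeholders, since the actual values depend on how you launched the server:

```python
import json
import urllib.request

def chat_request(base_url, model, prompt):
    """Build an OpenAI-style chat-completions request for the local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Hypothetical address; use whatever host/port the launch command reports.
req = chat_request("http://localhost:8080", "my-quantized-model", "Hello!")
print(req.full_url)  # → http://localhost:8080/v1/chat/completions
```

To actually send the request, pass `req` to `urllib.request.urlopen` while the server is running and read the `choices[0].message.content` field from the JSON response.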
Arguments
Options
Server options
Additional server options can be appended as --key=value pairs or --key boolean flags. These options are passed directly to the backend server.
Server options are backend-specific. The options below apply to the ai-navigator and anaconda-desktop backends and map to llama-server parameters. The ai-catalyst backend may support different options.
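The pass-through behavior described above can be pictured as a thin translation layer: each `--key=value` pair is split into a key/value argument pair, and each bare `--key` token is forwarded unchanged as a boolean flag. A rough sketch of that mapping, using `--ctx-size` and `--flash-attn` purely as illustrative llama-server option names:

```python
def to_backend_args(extra):
    """Translate pass-through CLI options into backend server argv tokens.

    ``--key=value`` pairs become ``--key value``; bare ``--key`` tokens
    are forwarded unchanged as boolean flags.
    """
    argv = []
    for token in extra:
        if token.startswith("--") and "=" in token:
            key, _, value = token.partition("=")
            argv += [key, value]
        else:
            argv.append(token)
    return argv

print(to_backend_args(["--ctx-size=4096", "--flash-attn"]))
# → ['--ctx-size', '4096', '--flash-attn']
```

This is only a sketch of the idea; the real command forwards the options internally, and which keys are valid depends on the backend you selected.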