LLM plugin for accessing models hosted behind a LiteLLM proxy server.
LiteLLM is a self-hosted proxy server that provides a unified interface to 100+ LLMs from providers including OpenAI, Anthropic, Cohere, Replicate, Google (PaLM), and more.
First, install the LLM command-line utility.
Now install this plugin in the same environment as LLM:
llm install llm-litellm
To update the plugin to the latest version:
llm install --upgrade llm-litellm
Or using pip directly:
pip install --upgrade llm-litellm
For development or to install from source:
pip install -e .
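To confirm the plugin was picked up, you can list LLM's installed plugins (this uses the standard llm plugins command; llm-litellm should appear in the output):
# Verify the plugin is registered with LLM
llm plugins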
First, you need a LiteLLM server running. You can set one up with:
pip install 'litellm[proxy]'
litellm --model gpt-3.5-turbo
# This starts the server on http://localhost:4000
Or use Docker:
docker run -p 4000:4000 -e OPENAI_API_KEY=your-key ghcr.io/berriai/litellm:main-latest --model gpt-3.5-turbo
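If you want the proxy to serve several models at once, LiteLLM can also be started from a config file. The snippet below is only a rough sketch; the config schema, flags, and model identifiers shown here (including claude-3-sonnet-20240229) are assumptions, so check the LiteLLM documentation for the exact format your version expects:
# Write a minimal LiteLLM proxy config (schema assumed; see the LiteLLM docs)
cat > litellm_config.yaml <<'EOF'
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-3-sonnet
    litellm_params:
      model: anthropic/claude-3-sonnet-20240229
      api_key: os.environ/ANTHROPIC_API_KEY
EOF
# Start the proxy on port 4000 using that config
litellm --config litellm_config.yaml --port 4000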
Set the LITELLM_URL environment variable to point to your LiteLLM server:
export LITELLM_URL=http://localhost:4000
If your LiteLLM server requires authentication, set the API key:
llm keys set litellm
# Enter your LiteLLM API key when prompted
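Before involving the plugin, you can verify the URL and key directly against the proxy's OpenAI-compatible models endpoint. A quick check (the Authorization header is only needed if your server enforces a key, and YOUR-LITELLM-KEY is a placeholder):
# Should return JSON describing the models the proxy exposes
curl http://localhost:4000/v1/models \
  -H "Authorization: Bearer YOUR-LITELLM-KEY"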
Once configured, you can use any model supported by your LiteLLM server.
To see all available models:
llm models list
You should see models listed with a litellm: prefix:
litellm: gpt-3.5-turbo
litellm: gpt-4
litellm: claude-3-sonnet
...
You can also use the plugin-specific command:
llm litellm models
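Since llm models list shows every model from every installed plugin, it can help to filter for just the LiteLLM ones with plain grep:
# Show only the models provided by this plugin
llm models list | grep -i litellm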
# Use a specific model
llm -m litellm/gpt-3.5-turbo "Hello, world!"
# Use with different models
llm -m litellm/claude-3-sonnet "Explain quantum computing"
llm -m litellm/gpt-4 "Write a short story"
# Set temperature and other parameters
llm -m litellm/gpt-3.5-turbo -o temperature 0.9 -o max_tokens 500 "Be creative!"
# Responses stream by default; pass --no-stream to wait for the full reply
llm -m litellm/gpt-4 "Write a long explanation"
# Conversation mode
llm -m litellm/claude-3-sonnet -c "Let's discuss Python programming"
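You can also pipe input into a prompt and steer the response with a system prompt, using LLM's standard flags:
# Pipe a file in and set a system prompt with -s
cat script.py | llm -m litellm/gpt-4 -s "Explain what this code does"
# Follow up in the same conversation with -c
llm -c "Now suggest improvements"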
You can set shorter aliases for frequently used models:
llm aliases set gpt4 litellm/gpt-4
llm aliases set claude litellm/claude-3-sonnet
Now you can use:
llm -m gpt4 "Hello!"
llm -m claude "Explain this code" < script.py
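To review the aliases you have defined, LLM's standard alias listing works as usual:
# Show all configured aliases and the models they point to
llm aliases list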
To check whether your LiteLLM server is running and accessible:
llm litellm status
You can list the models available through your server in two formats:
# Human-readable format
llm litellm models
# JSON format
llm litellm models --json
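The JSON output is convenient for piping into other tools; for example, pretty-printing it with jq (assuming jq is installed; this does not depend on the exact JSON structure):
# Pretty-print the JSON model listing
llm litellm models --json | jq .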
The plugin supports any model that your LiteLLM server is configured to handle. Common models include:
- OpenAI: gpt-4, gpt-3.5-turbo, gpt-4-turbo
- Anthropic: claude-3-opus, claude-3-sonnet, claude-3-haiku
- Google: gemini-pro, gemini-pro-vision
- Cohere: command-r, command-r-plus
- Meta: llama-2-70b, llama-2-13b
- Mistral: mistral-7b, mistral-medium
- And many more...
Check your LiteLLM server configuration for the exact models available.
The plugin supports all standard LLM options:
- temperature: Controls randomness (0.0-2.0)
- max_tokens: Maximum tokens to generate
- top_p: Top-p sampling parameter
- frequency_penalty: Frequency penalty (-2.0 to 2.0)
- presence_penalty: Presence penalty (-2.0 to 2.0)
Example:
llm -m litellm/gpt-3.5-turbo \
-o temperature 0.7 \
-o max_tokens 1000 \
-o top_p 0.9 \
"Generate creative content"
If something isn't working, these are the most common issues:
- "LITELLM_URL environment variable is required"
  - Make sure you've set the LITELLM_URL environment variable
  - Verify your LiteLLM server is running and accessible
- No models showing up
  - Check server status: llm litellm status
  - Verify the URL is correct (it should include the protocol: http:// or https://)
  - Test the server directly: curl http://localhost:4000/health
- Connection errors
  - Check that your LiteLLM server is running
  - Verify firewall settings allow connections to the server
  - Test with: curl http://localhost:4000/v1/models
- Authentication errors
  - If your LiteLLM server requires authentication, set the key: llm keys set litellm
  - Check your LiteLLM server configuration for authentication requirements
- Model not found
  - Verify the model is configured in your LiteLLM server
  - Check available models: llm litellm models
  - Ensure the model name matches exactly
For debugging, you can check what models are available:
# Check server status
llm litellm status
# List all models
llm litellm models --json
# Test a simple query
llm -m litellm/gpt-3.5-turbo "test" -v
A complete end-to-end setup with OpenAI models looks like this:
export OPENAI_API_KEY=your-key
litellm --model gpt-3.5-turbo --model gpt-4
export LITELLM_URL=http://localhost:4000
llm -m litellm/gpt-3.5-turbo "Hello!"
The same setup with Anthropic models:
export ANTHROPIC_API_KEY=your-key
litellm --model claude-3-sonnet --model claude-3-haiku
export LITELLM_URL=http://localhost:4000
llm -m litellm/claude-3-sonnet "Hello!"
Or with models from multiple providers at once:
export OPENAI_API_KEY=your-openai-key
export ANTHROPIC_API_KEY=your-anthropic-key
litellm --model gpt-4 --model claude-3-sonnet --model gemini-pro
export LITELLM_URL=http://localhost:4000
llm -m litellm/gpt-4 "Compare yourself to Claude"
To set up this plugin locally, first check out the code. Then create a new virtual environment:
cd llm-litellm
python3 -m venv venv
source venv/bin/activate
Install the plugin in development mode:
pip install -e '.[test]'
To run the tests:
pytest
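The usual pytest flags apply if you need more detail while debugging a failure:
# Verbose output, stop at the first failing test
python -m pytest -v -x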
Apache License 2.0