A version of the llm-claude-3 plugin that supports prompt caching. Because it is forked from an older version of the plugin, this branch does not support images or attachments.
Install this plugin in the same environment as LLM.
git clone -b prompt-caching https://github.com/irthomasthomas/llm-claude-3-caching.git
cd llm-claude-3-caching
llm install -e .
First, set an API key for Claude 3:
llm keys set claude
# Paste key here
Run `llm models` to list the available models, and `llm models --options` to include a list of their options.
Run prompts like this:
llm -m claude-3.5-sonnet-cache 'Fun facts about pelicans' -o cache_prompt 1
llm -m claude-3-opus-cache 'Fun facts about squirrels' -o cache_prompt 1
llm -m claude-3-sonnet-cache 'Fun facts about walruses' -o cache_prompt 1
llm -m claude-3-haiku-cache 'Fun facts about armadillos' -o cache_prompt 1
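If you use LLM's Python API rather than the CLI, the same option can be passed as a keyword argument. The following is a minimal sketch, assuming the plugin's `cache_prompt` option maps to a keyword argument (as LLM plugin options normally do) and that a key has already been stored with `llm keys set claude`:

```python
import llm

# Assumes the key was stored with `llm keys set claude`; otherwise set model.key explicitly.
model = llm.get_model("claude-3.5-sonnet-cache")

# Plugin options are passed as keyword arguments in LLM's Python API.
response = model.prompt("Fun facts about pelicans", cache_prompt=True)
print(response.text())
```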
This plugin now supports Anthropic's Prompt Caching feature, which can significantly improve performance and reduce costs for certain types of queries.
Prompt Caching allows you to store and reuse context within your prompt. This is especially useful for:
- Prompts with many examples
- Large amounts of context or background information
- Repetitive tasks with consistent instructions
- Long multi-turn conversations
The cache has a 5-minute lifetime, refreshed each time the cached content is used.
To enable Prompt Caching, use the following options:
- `-o cache_prompt 1`: Enables caching for the user prompt.
- `-o cache_system 1`: Enables caching for the system prompt.
Example:
llm -m claude-3-sonnet -o cache_prompt 1 'Analyze this text: [long text here]'
llm -m claude-3-sonnet -o cache_prompt 1 -o cache_system 1 'Analyze this text: [long text here]' --system '[long system prompt here]'
llm -c # continues from cached prompt, if available
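The `-c` flag continues the most recent conversation. A rough Python-API equivalent is a conversation object, sketched below under the same assumptions as the earlier snippet; cache reuse still depends on the follow-up arriving within the cache lifetime:

```python
import llm

model = llm.get_model("claude-3.5-sonnet-cache")
conversation = model.conversation()

# The first prompt writes the long prefix to the cache.
conversation.prompt("Analyze this text: [long text here]", cache_prompt=True).text()

# A follow-up inside the ~5-minute window can be served from the cached prefix.
print(conversation.prompt("Now summarize the key points.", cache_prompt=True).text())
```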
Based on comprehensive testing across all models:
| Model | Cost Reduction Range | Average Reduction |
|---|---|---|
| Claude 3 Haiku | 78.1% - 99.1% | 92.0% |
| Claude 3 Opus | 78.1% - 99.0% | 91.9% |
| Claude 3.5 Sonnet | 91.2% - 99.0% | 95.2% |
Example cost reductions (a rough cost model sketch follows this list):
- Short queries (e.g., "What is the capital of France?")
  - Haiku: $0.000016 → $0.000003 (78.1% reduction)
  - Opus: $0.000960 → $0.000210 (78.1% reduction)
  - Sonnet: $0.000477 → $0.000042 (91.2% reduction)
- Detailed queries (e.g., "Tell me about the Eiffel Tower")
  - Haiku: $0.000428 → $0.000004 (99.1% reduction)
  - Opus: $0.024840 → $0.000240 (99.0% reduction)
  - Sonnet: $0.004653 → $0.000048 (99.0% reduction)
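These figures are consistent with Anthropic's published cache pricing at the time of writing: cache writes are billed at roughly 1.25x the base input-token price, and cache reads at roughly 0.1x. The sketch below is an illustrative cost model only; the per-token price is a placeholder and output-token costs are ignored:

```python
# Illustrative cost model for prompt caching (input tokens only).
BASE_INPUT_PRICE = 3.00 / 1_000_000  # placeholder: USD per input token (e.g. a Sonnet-class model)
CACHE_WRITE_MULTIPLIER = 1.25        # cache writes cost ~25% more than normal input tokens
CACHE_READ_MULTIPLIER = 0.10         # cache hits cost ~10% of normal input tokens

def uncached_cost(prompt_tokens: int) -> float:
    """Cost of sending the full prompt with no caching."""
    return prompt_tokens * BASE_INPUT_PRICE

def cached_cost(cached_prefix_tokens: int, fresh_tokens: int, cache_hit: bool) -> float:
    """Cost with a cached prefix: the first call pays the write premium, later calls the read rate."""
    multiplier = CACHE_READ_MULTIPLIER if cache_hit else CACHE_WRITE_MULTIPLIER
    return cached_prefix_tokens * BASE_INPUT_PRICE * multiplier + fresh_tokens * BASE_INPUT_PRICE

full = uncached_cost(1500)
hit = cached_cost(1450, 50, cache_hit=True)
print(f"uncached: ${full:.6f}  cache hit: ${hit:.6f}  reduction: {100 * (1 - hit / full):.1f}%")
```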
Additional benefits:
- Reduced latency: Improved response times by over 2x
- Improved consistency: Maintained response quality across cached queries
- Zero output token costs for cached responses
How it works (sketched in code below):
- The system checks whether the prompt prefix is already cached from a recent query.
- If it is, the cached version is used, reducing processing time and costs.
- Otherwise, the full prompt is processed and the prefix is cached for future use.
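Under the hood this relies on Anthropic's prompt-caching API, where a `cache_control` breakpoint marks the reusable prefix. The sketch below shows the raw `anthropic` SDK call rather than this plugin's actual code, with a hypothetical `reference.txt` standing in for the large reusable context:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

long_context = open("reference.txt").read()  # hypothetical large, reusable context

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=512,
    system=[
        {"type": "text", "text": "You answer questions about the reference document."},
        {
            "type": "text",
            "text": long_context,
            # Everything up to and including this block becomes the cacheable prefix.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Summarize section 2."}],
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
)
print(response.content[0].text)
```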
Prompt Caching is currently supported on:
- Claude 3.5 Sonnet
- Claude 3.5 Haiku
- Claude 3 Haiku
- Claude 3 Opus
You can monitor cache performance using these fields in the API response (see the helper sketch below):
- `cache_creation_input_tokens`: Number of tokens written to the cache when creating a new entry.
- `cache_read_input_tokens`: Number of tokens retrieved from the cache for this request.
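For example, a hypothetical helper like the one below could pull these fields off an `anthropic` SDK response object, where they are exposed on `response.usage`:

```python
def report_cache_usage(response) -> None:
    """Print the prompt-caching metrics from an Anthropic Messages API response."""
    usage = response.usage
    print("cache_creation_input_tokens:", getattr(usage, "cache_creation_input_tokens", None))
    print("cache_read_input_tokens:", getattr(usage, "cache_read_input_tokens", None))
```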
To set up this plugin locally, first check out the code. Then create a new virtual environment:
cd llm-claude-3-caching
python3 -m venv venv
source venv/bin/activate
Now install the dependencies and test dependencies:
llm install -e '.[test]'
To run the tests:
pytest