python · intermediate
Prompt Caching with OpenAI API
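Reduce input-token costs by up to 50% with OpenAI's automatic prompt caching, which reuses an identical prompt prefix across requests. Caching only applies to prompts of 1,024 tokens or more, so it pays off when a large, unchanging context is placed at the start of every request.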
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Large, identical prefix shared by every request (caching needs 1,024+ prompt tokens)
LARGE_SYSTEM_PROMPT = 'You are an expert Python developer. ' * 500


def query_with_cache(question: str) -> dict:
    resp = client.chat.completions.create(
        model='gpt-4o-mini',
        messages=[
            # Static content goes first so the prefix matches across calls
            {'role': 'system', 'content': LARGE_SYSTEM_PROMPT},
            # Per-request content goes last
            {'role': 'user', 'content': question},
        ],
    )
    usage = resp.usage
    return {
        'answer': resp.choices[0].message.content,
        # Number of prompt tokens served from cache (0 on a miss)
        'cached_tokens': getattr(usage.prompt_tokens_details, 'cached_tokens', 0),
        'total_tokens': usage.total_tokens,
    }


# First call: no cache hit expected
r1 = query_with_cache('What is a list comprehension?')
print(f'Q1 | cached={r1["cached_tokens"]} | total={r1["total_tokens"]}')

# Second call: the shared system prompt should now be served from cache
r2 = query_with_cache('How do I use decorators?')
print(f'Q2 | cached={r2["cached_tokens"]} | total={r2["total_tokens"]}')

Use Cases
- cost reduction
- repeated context
- prompt optimization
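For the cost-reduction use case, it can help to turn the `cached_tokens` figure into an estimated saving. The following is a minimal sketch, not part of the original snippet: the helper name `estimate_cache_savings`, the example token counts, and the price of $0.15 per million input tokens are all illustrative assumptions, and the 50% discount on cached tokens is taken from the description above. Check current pricing before relying on the numbers.

```python
def estimate_cache_savings(
    prompt_tokens: int,
    cached_tokens: int,
    price_per_million_input: float,
    cached_discount: float = 0.5,  # assumed discount on cached input tokens
) -> dict:
    """Estimate input-token cost with and without prompt caching (hypothetical helper)."""
    if prompt_tokens < 0 or not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError('cached_tokens must be between 0 and prompt_tokens')

    price_per_token = price_per_million_input / 1_000_000
    uncached_tokens = prompt_tokens - cached_tokens

    # Cost if every prompt token were billed at the full rate
    full_cost = prompt_tokens * price_per_token
    # Cached tokens are billed at a reduced rate, the rest at the full rate
    actual_cost = (uncached_tokens + cached_tokens * (1 - cached_discount)) * price_per_token

    hit_ratio = cached_tokens / prompt_tokens if prompt_tokens else 0.0
    return {
        'hit_ratio': hit_ratio,
        'full_cost': full_cost,
        'actual_cost': actual_cost,
        'savings_fraction': hit_ratio * cached_discount,
    }


# Illustrative numbers only: 4,000 prompt tokens, 3,584 of them served from cache
est = estimate_cache_savings(4000, 3584, price_per_million_input=0.15)
print(f"hit ratio={est['hit_ratio']:.1%} | saved={est['savings_fraction']:.1%}")
```

In practice the first two arguments would come from `usage.prompt_tokens` and the `cached_tokens` value returned by `query_with_cache`. The saving only reaches the full discount when nearly the whole prompt is a cached prefix, which is why the large static context is kept at the start of the message list.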