typescript · advanced
AI Model Router for Cost Optimization
Route each prompt to a cheaper or more capable LLM based on its estimated complexity, trading cost against response quality.
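import OpenAI from 'openai';

// The client reads OPENAI_API_KEY from the environment by default.
const openai = new OpenAI();

type ModelTier = 'fast' | 'balanced' | 'powerful';

// Note: 'balanced' and 'powerful' map to the same model here, so those two
// tiers differ in name only until 'powerful' points at a stronger model.
const MODEL_MAP: Record<ModelTier, string> = {
  fast: 'gpt-4o-mini',
  balanced: 'gpt-4o',
  powerful: 'gpt-4o',
};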
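
// Heuristic classifier: short prompts with no code or analysis keywords go to
// the fast tier; any prompt with code or analysis keywords goes to the powerful tier.
function classifyComplexity(prompt: string): ModelTier {
  // trim + filter so leading or trailing whitespace does not inflate the count
  const wordCount = prompt.trim().split(/\s+/).filter(Boolean).length;
  const hasCode = /```|function |class |import /.test(prompt);
  const hasAnalysis = /analyze|compare|evaluate|design/i.test(prompt);

  if (wordCount < 50 && !hasCode && !hasAnalysis) return 'fast';
  if (hasCode || hasAnalysis) return 'powerful';
  return 'balanced';
}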

// Classify the prompt, call the matching model, and report which tier was used.
export async function routedCompletion(
  prompt: string,
  systemPrompt?: string,
): Promise<{ response: string; model: string; tier: ModelTier }> {
  const tier = classifyComplexity(prompt);
  const model = MODEL_MAP[tier];

  const res = await openai.chat.completions.create({
    model,
    messages: [
      // Include the system message only when one was provided.
      ...(systemPrompt ? [{ role: 'system' as const, content: systemPrompt }] : []),
      { role: 'user' as const, content: prompt },
    ],
  });

  return {
    // content can be null, and choices can be empty, so fall back to ''.
    response: res.choices[0]?.message.content ?? '',
    model,
    tier,
  };
}

// Example usage (top-level await requires an ES module context).
const result = await routedCompletion('What is 2+2?');
console.log(`Used ${result.tier} tier (${result.model}): ${result.response}`);

Use Cases
- Reducing AI API costs for production apps
- Balancing quality and speed for different queries
- Multi-model architecture for AI applications
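
For the multi-model use case, it can help to separate the routing decision from the API call so that routing is testable offline and tier usage can be tallied. The following is a minimal sketch under stated assumptions, not part of the original snippet: `CompleteFn`, `RouterOptions`, `createRouter`, and `getUsage` are hypothetical names, and the stub `complete` function stands in for a real client call. The classifier applies the same rules as above; its regex writes the triple backtick as `{3}, which matches the same text.

```typescript
type ModelTier = 'fast' | 'balanced' | 'powerful';

// Hypothetical signature for whatever actually calls the LLM.
type CompleteFn = (model: string, prompt: string) => Promise<string>;

interface RouterOptions {
  models: Record<ModelTier, string>;
  complete: CompleteFn;
}

// Same heuristic as the snippet above, restated so this block is self-contained.
function classifyComplexity(prompt: string): ModelTier {
  const wordCount = prompt.trim().split(/\s+/).filter(Boolean).length;
  const hasCode = /`{3}|function |class |import /.test(prompt);
  const hasAnalysis = /analyze|compare|evaluate|design/i.test(prompt);
  if (hasCode || hasAnalysis) return 'powerful';
  return wordCount < 50 ? 'fast' : 'balanced';
}

// Builds a router around an injected completion function and counts
// how many prompts were sent to each tier.
function createRouter({ models, complete }: RouterOptions) {
  const usage: Record<ModelTier, number> = { fast: 0, balanced: 0, powerful: 0 };
  return {
    async route(prompt: string) {
      const tier = classifyComplexity(prompt);
      usage[tier] += 1; // counted before the (possibly slow) model call
      const model = models[tier];
      return { tier, model, response: await complete(model, prompt) };
    },
    // Returns a copy so callers cannot mutate the internal counters.
    getUsage: () => ({ ...usage }),
  };
}

// Usage with a stub in place of a real API call (assumption: no network needed).
const router = createRouter({
  models: { fast: 'gpt-4o-mini', balanced: 'gpt-4o', powerful: 'gpt-4o' },
  complete: async (model, prompt) => `[${model}] echo: ${prompt}`,
});
router.route('What is 2+2?').then((r) => console.log(r.tier, r.model, r.response));
```

Because the keyword checks are substring matches, a prompt containing "redesign" or "world-class service" is also routed to the powerful tier. Tallying usage per tier is one way to notice when such false positives erode the cost savings.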
Related Snippets
Similar patterns you can reuse in the same workflow.
python · beginner
Token Counting and Cost Estimation
Count tokens accurately and estimate API costs before making LLM calls.
Best for: Budget management
#ai #tokens
typescript · advanced
Semantic Caching Layer for LLM Calls
Cache LLM responses by semantic similarity of prompts to reduce API costs and improve latency.
Best for: Reducing LLM API costs for repeated queries
#caching #embeddings
typescript · intermediate
Batch Embeddings Processing
Generate embeddings for large document sets in batches with rate limiting and progress tracking.
Best for: Indexing large document collections for search
#embeddings #batch-processing
python · beginner
Token Counting with tiktoken
Count tokens, split text by token limits, and estimate API costs using the tiktoken library.
Best for: Cost estimation
#tiktoken #tokens