python · intermediate
LLM Prompt Testing Framework
Write automated tests for LLM prompts using Python assertions to detect regressions.
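import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def run_prompt(prompt: str, model: str = 'gpt-4o-mini') -> str:
    """Send one user prompt and return the model's raw JSON string."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{'role': 'user', 'content': prompt}],
        temperature=0,  # reduces run-to-run variation; output is still not guaranteed identical
        response_format={'type': 'json_object'},  # JSON mode; the prompt itself must mention JSON
    )
    return resp.choices[0].message.content or ''


# Each test pairs a prompt with a predicate over the raw response string.
tests = [
    {
        'name': 'Sentiment positive',
        'prompt': 'Classify sentiment of "I love this product!" Return JSON: {"sentiment": "positive"|"negative"|"neutral"}',
        'assert': lambda r: json.loads(r)['sentiment'] == 'positive',
    },
    {
        'name': 'Sentiment negative',
        'prompt': 'Classify sentiment of "This is terrible." Return JSON: {"sentiment": "positive"|"negative"|"neutral"}',
        'assert': lambda r: json.loads(r)['sentiment'] == 'negative',
    },
]

for t in tests:
    result = run_prompt(t['prompt'])
    try:
        passed = t['assert'](result)
    except (json.JSONDecodeError, KeyError, TypeError):
        # Malformed JSON or a missing key counts as a failed test, not a crash.
        passed = False
    print(f"{'PASS' if passed else 'FAIL'}: {t['name']} -> {result}")
Use Cases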
- prompt regression testing
- LLM CI
- quality assurance
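For the CI use case, a job usually needs a non-zero exit status when any prompt test fails. The following is a minimal sketch of one way to do that, not part of the original snippet: `run_suite`, `expect_sentiment`, `fake_model`, and `main` are hypothetical names, and `fake_model` is an offline stand-in for `run_prompt` so the harness can be tried without an API key.

```python
import json
from typing import Callable, List, Tuple

# Hypothetical test-case shape: (name, prompt, check), mirroring the dicts in the snippet.
Case = Tuple[str, str, Callable[[str], bool]]


def expect_sentiment(label: str) -> Callable[[str], bool]:
    """Build a check that parses the reply as JSON and compares the sentiment label."""
    def check(raw: str) -> bool:
        try:
            return json.loads(raw).get("sentiment") == label
        except (json.JSONDecodeError, AttributeError, TypeError):
            # Malformed or non-object JSON counts as a failed check, not a crash.
            return False
    return check


def run_suite(cases: List[Case], call_model: Callable[[str], str]) -> List[str]:
    """Run every case and return the names of the cases that failed."""
    failures = []
    for name, prompt, check in cases:
        try:
            ok = check(call_model(prompt))
        except Exception:
            # An API or network error is recorded as a failure for that case.
            ok = False
        if not ok:
            failures.append(name)
    return failures


def fake_model(prompt: str) -> str:
    """Assumed offline stub: valid JSON for one prompt, garbage for the other."""
    return '{"sentiment": "positive"}' if "love" in prompt else "not json"


cases: List[Case] = [
    ("Sentiment positive", 'Classify sentiment of "I love this product!"', expect_sentiment("positive")),
    ("Sentiment negative", 'Classify sentiment of "This is terrible."', expect_sentiment("negative")),
]


def main(call_model: Callable[[str], str] = fake_model) -> int:
    """Return a process exit code: 0 if every case passed, 1 otherwise."""
    failed = run_suite(cases, call_model)
    print(f"{len(cases) - len(failed)} passed, {len(failed)} failed: {failed}")
    return 1 if failed else 0

# In a CI script, one option is: raise SystemExit(main(run_prompt))
```

Passing the model call in as an argument keeps the pass/fail logic testable on its own; swapping `fake_model` for the real `run_prompt` turns the same harness into a live regression check.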
Related Snippets
Similar patterns you can reuse in the same workflow.
python · intermediate
LLM Testing with DeepEval
Write unit tests for LLM outputs using the DeepEval framework for correctness and hallucination detection.
Best for: LLM testing
#deepeval #testing
typescript · beginner
Few-Shot Prompt Template
Build structured few-shot prompts with examples, system instructions, and output format constraints.
Best for: Consistent AI outputs
#prompts #few-shot
typescript · advanced
LLM Output Evaluation and Scoring
Evaluate LLM outputs programmatically with scoring rubrics for quality, relevance, and safety.
Best for: prompt testing
#ai #evaluation
python · beginner
Jinja2 Prompt Templates for AI
Manage complex AI prompt templates with Jinja2 for reusable, parameterised prompt generation.
Best for: prompt management
#jinja2 #prompts