python · advanced
Prepare a Fine-Tuning Dataset for OpenAI
Build, validate, and upload a JSONL training dataset, then start an OpenAI GPT fine-tuning job.
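```python
import json
from pathlib import Path

from openai import OpenAI

# Create chat-format training examples: one {"messages": [...]} object per JSONL line.
# OpenAI requires at least 10 examples per training file, so extend this list before running.
examples = [
    {'messages': [{'role': 'system', 'content': 'You are a SQL expert.'}, {'role': 'user', 'content': 'How do I select distinct values?'}, {'role': 'assistant', 'content': 'Use SELECT DISTINCT column FROM table;'}]},
    {'messages': [{'role': 'system', 'content': 'You are a SQL expert.'}, {'role': 'user', 'content': 'How do I count rows?'}, {'role': 'assistant', 'content': 'Use SELECT COUNT(*) FROM table;'}]},
]

# Validate before uploading: enough examples, each ending with an assistant reply.
if len(examples) < 10:
    raise ValueError(f'Need at least 10 examples, got {len(examples)}')
for i, ex in enumerate(examples):
    if ex['messages'][-1]['role'] != 'assistant':
        raise ValueError(f'Example {i} must end with an assistant message')

# Write JSONL: one JSON object per line, newline-terminated, UTF-8 encoded.
output = Path('finetune_train.jsonl')
output.write_text(''.join(json.dumps(ex) + '\n' for ex in examples), encoding='utf-8')

# Upload the file and start the fine-tuning job (reads OPENAI_API_KEY from the environment).
client = OpenAI()
with open(output, 'rb') as f:
    file_obj = client.files.create(file=f, purpose='fine-tune')
job = client.fine_tuning.jobs.create(training_file=file_obj.id, model='gpt-4o-mini-2024-07-18')
print(f'Fine-tune job: {job.id}, status: {job.status}')
```
Use Cases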
- model customization
- domain adaptation
- task-specific tuning
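To validate the file on disk rather than the in-memory list, a standalone checker can re-read `finetune_train.jsonl` and report problems line by line before upload. This is a minimal sketch: `validate_chat_jsonl` is a hypothetical helper, not part of the OpenAI SDK, and it assumes every message uses the `system`, `user`, or `assistant` role with non-empty string content. Datasets that use tool calls or extra message fields need a looser check. The 10-example minimum matches OpenAI's documented requirement.

```python
import json
from pathlib import Path

# Assumption: only these roles are accepted; tool-calling datasets need a looser check.
VALID_ROLES = {'system', 'user', 'assistant'}
MIN_EXAMPLES = 10  # OpenAI's documented minimum number of training examples


def validate_chat_jsonl(path):
    """Return a list of problems found in a chat-format JSONL file (empty if none)."""
    errors = []
    count = 0
    lines = Path(path).read_text(encoding='utf-8').splitlines()
    for lineno, line in enumerate(lines, start=1):
        # JSONL expects one JSON value per line, so interior blank lines are reported.
        if not line.strip():
            errors.append(f'line {lineno}: blank line')
            continue
        count += 1
        try:
            example = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f'line {lineno}: invalid JSON ({exc.msg})')
            continue
        messages = example.get('messages') if isinstance(example, dict) else None
        if not isinstance(messages, list) or not messages:
            errors.append(f'line {lineno}: missing or empty "messages" list')
            continue
        # Check each message's role and content.
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict) or msg.get('role') not in VALID_ROLES:
                errors.append(f'line {lineno}: message {i} has a missing or unknown role')
            elif not isinstance(msg.get('content'), str) or not msg['content'].strip():
                errors.append(f'line {lineno}: message {i} has empty or non-string content')
        # The model learns from the assistant reply, so each example should end with one.
        last = messages[-1]
        if not (isinstance(last, dict) and last.get('role') == 'assistant'):
            errors.append(f'line {lineno}: last message must come from the assistant')
    if count < MIN_EXAMPLES:
        errors.append(f'only {count} examples; at least {MIN_EXAMPLES} are required')
    return errors
```

Calling `validate_chat_jsonl('finetune_train.jsonl')` returns an empty list when the file passes; any non-empty result lists what to fix before calling `client.files.create`.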
Related Snippets
Similar patterns you can reuse in the same workflow.
python · advanced
Prepare a Fine-Tuning Dataset for OpenAI
Format, validate, and upload training data for OpenAI model fine-tuning.
Best for: Model customization
#ai #fine-tuning
typescript · intermediate
OpenAI Chat Completion with Streaming
Stream GPT responses token-by-token using the OpenAI SDK with async iteration.
Best for: chatbot UI
#openai #streaming
typescript · beginner
Generate Text Embeddings with OpenAI
Create vector embeddings for semantic search and similarity matching using text-embedding-3-small.
Best for: semantic search
#openai #embeddings
typescript · advanced
RAG Pipeline (Retrieve + Augment + Generate)
Minimal RAG implementation: embed a query, retrieve top-k chunks, inject into prompt.
Best for: document Q&A
#rag #embeddings