typescript · intermediate
OpenAI Vision API Image Analysis
Analyze images using GPT-4o vision capabilities with base64 and URL inputs.
import OpenAI from 'openai';
import * as fs from 'fs';

// The client reads the API key from the OPENAI_API_KEY environment variable by default.
const openai = new OpenAI();
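
// Analyze an image that is reachable at a URL.
async function analyzeImageUrl(imageUrl: string, prompt: string) {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{
      role: 'user',
      content: [
        { type: 'text', text: prompt },
        // 'high' detail requests finer-grained analysis at a higher token cost.
        { type: 'image_url', image_url: { url: imageUrl, detail: 'high' } },
      ],
    }],
    max_tokens: 500,
  });
  // content is typed string | null, so callers should handle a null result.
  return response.choices[0].message.content;
}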

// Analyze an image on disk by sending it inline as a base64 data URL.
async function analyzeLocalImage(imagePath: string, prompt: string) {
  // Read asynchronously so the event loop is not blocked inside an async function.
  const imageBuffer = await fs.promises.readFile(imagePath);
  const base64Image = imageBuffer.toString('base64');
  // Case-insensitive extension check; anything other than .png is assumed to be JPEG.
  const mimeType = imagePath.toLowerCase().endsWith('.png') ? 'image/png' : 'image/jpeg';
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{
      role: 'user',
      content: [
        { type: 'text', text: prompt },
        { type: 'image_url', image_url: { url: `data:${mimeType};base64,${base64Image}` } },
      ],
    }],
    max_tokens: 500,
  });
  return response.choices[0].message.content;
}

// Send several images in one request so the model can compare them.
async function compareImages(imageUrls: string[], prompt: string) {
  const imageContent = imageUrls.map((url) => ({
    type: 'image_url' as const,
    image_url: { url },
  }));
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{
      role: 'user',
      content: [{ type: 'text', text: prompt }, ...imageContent],
    }],
    max_tokens: 1000,
  });
  return response.choices[0].message.content;
}

Use Cases
- image captioning
- visual QA
- document OCR
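For use cases such as document OCR, local files may arrive as WebP or GIF as well as PNG and JPEG, and the extension check in analyzeLocalImage labels everything that is not a .png as JPEG. The sketch below shows one way to build the data URL with a fuller lookup. The names `MIME_BY_EXTENSION`, `mimeTypeForImage`, and `toDataUrl` are hypothetical helpers, not part of the OpenAI SDK, and the extension list is an assumption that should be checked against the image formats your model accepts.

```typescript
import * as path from 'path';

// Hypothetical lookup table (an assumption, not an SDK constant).
// Extend or trim it to match the formats your model accepts.
const MIME_BY_EXTENSION: Record<string, string> = {
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.webp': 'image/webp',
  '.gif': 'image/gif',
};

// Returns the MIME type for an image path, or throws if the extension is not in the table.
function mimeTypeForImage(imagePath: string): string {
  const ext = path.extname(imagePath).toLowerCase();
  const mimeType = MIME_BY_EXTENSION[ext];
  if (!mimeType) {
    throw new Error(`Unsupported image extension: "${ext}"`);
  }
  return mimeType;
}

// Builds a data URL for the image_url.url field from raw image bytes.
function toDataUrl(imageBytes: Buffer, imagePath: string): string {
  return `data:${mimeTypeForImage(imagePath)};base64,${imageBytes.toString('base64')}`;
}
```

Inside analyzeLocalImage, the result of `toDataUrl(imageBuffer, imagePath)` could be passed as the `url` value. Throwing on an unknown extension is a design choice: it surfaces a bad input before the request is sent instead of sending bytes with a mislabeled type.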
Related Snippets
Similar patterns you can reuse in the same workflow.
python · intermediate
Analyze Images with GPT Vision API
Send images to GPT-4o for description, analysis, and visual Q&A.
Best for: Image analysis
#ai #vision
python · beginner
OpenAI GPT-4 Vision Image Analysis
Analyse images from URLs or base64 with GPT-4 Vision for structured visual understanding.
Best for: visual Q&A
#openai #vision
python · advanced
Multimodal RAG with Images and Text
Build a multimodal RAG pipeline that retrieves and answers questions about image+text documents.
Best for: visual document Q&A
#multimodal #rag
typescript · beginner
DALL·E 3 Image Generation
Generate images from a text prompt using the OpenAI DALL·E 3 API and return a URL.
Best for: AI art generation
#openai #dall-e