python · intermediate
CLIP Image-Text Similarity Search
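Match images against text descriptions using OpenAI's CLIP model for zero-shot visual semantic search. The example scores every image against every candidate text and reports the best-matching description for each image.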
python
from transformers import CLIPProcessor, CLIPModel
from PIL import Image
import torch
# Not used with the dummy images below; useful when fetching real images over HTTP
import requests
from io import BytesIO

# Load the pretrained CLIP model and its matching preprocessor
model = CLIPModel.from_pretrained('openai/clip-vit-base-patch32')
processor = CLIPProcessor.from_pretrained('openai/clip-vit-base-patch32')

# Dummy images (replace with real images)
images = [Image.new('RGB', (224, 224), color=c) for c in [(255, 0, 0), (0, 255, 0), (0, 0, 255)]]
image_labels = ['red image', 'green image', 'blue image']
texts = ['a red coloured picture', 'a green coloured picture', 'something blue']

# Tokenize the texts and preprocess the images in a single batch
inputs = processor(text=texts, images=images, return_tensors='pt', padding=True)

with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits_per_image  # shape: [n_images, n_texts]
similarities = logits.softmax(dim=1)  # per image: probabilities over the candidate texts

for i, label in enumerate(image_labels):
    best_idx = similarities[i].argmax().item()  # plain int index of the best text
    best_score = similarities[i, best_idx].item()
    print(f'{label} -> best match: {texts[best_idx]!r} ({best_score:.2%})')

Use Cases
- visual search
- image-text matching
- zero-shot classification
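For the visual search use case, the direction is usually reversed: one text query is ranked against many images. The following is a minimal, library-free sketch of that ranking step, assuming the image and text embeddings have already been extracted elsewhere (for example with `CLIPModel`'s `get_image_features` and `get_text_features`). The helper names `cosine_similarity` and `rank_images` and the toy 3-dimensional vectors are hypothetical and only illustrate the idea.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(text_embedding, image_embeddings, top_k=3):
    """Return (image_index, score) pairs, best match first."""
    scored = [
        (index, cosine_similarity(text_embedding, embedding))
        for index, embedding in enumerate(image_embeddings)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real CLIP embeddings (hypothetical values).
image_embeddings = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.1, 0.9],
]
query_embedding = [0.8, 0.2, 0.1]

results = rank_images(query_embedding, image_embeddings, top_k=2)
for index, score in results:
    print(f"image {index}: cosine similarity {score:.3f}")
```

Ranking by raw cosine similarity, rather than by a softmax over a fixed set of candidates, keeps the scores comparable when the image collection grows or changes, which tends to matter for search over a large index.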