python | beginner
Google Gemini Vision API in Python
Analyse images and PDFs using Google Gemini's multimodal vision API with the Python SDK.
import google.generativeai as genai
from pathlib import Path
# Replace the placeholder with your own key; avoid committing real keys to source control
genai.configure(api_key='YOUR_GEMINI_API_KEY')
model = genai.GenerativeModel('gemini-1.5-flash')
# Analyse an image
image_bytes = Path('chart.png').read_bytes()
image_part = {'mime_type': 'image/png', 'data': image_bytes}
response = model.generate_content([
    image_part,
    'Describe this chart in detail. What trends do you see?',
])
print(response.text)
# Count tokens
tokens = model.count_tokens([image_part, 'Describe this chart'])
print(f'Token count: {tokens.total_tokens}')

Use Cases
- image analysis
- chart understanding
- document extraction
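The description mentions PDFs, but the snippet above only sends a PNG. A minimal sketch of how the same inline-part pattern might be extended to documents, assuming the model accepts `application/pdf` as an inline MIME type and that the file is small enough to send inline. The helper names `file_to_part` and `extract_from_file`, the `max_tokens` parameter, and the file name `invoice.pdf` are hypothetical and not part of the SDK; only `generate_content`, `count_tokens`, `.text`, and `.total_tokens` come from the snippet.

```python
import mimetypes
from pathlib import Path


def file_to_part(path):
    """Read a local file and wrap it as an inline part dict ({'mime_type', 'data'})."""
    path = Path(path)
    # Guess the MIME type from the extension, e.g. .png -> image/png, .pdf -> application/pdf
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None:
        raise ValueError(f'Cannot guess MIME type for {path.name}')
    return {'mime_type': mime_type, 'data': path.read_bytes()}


def extract_from_file(model, path, prompt, max_tokens=None):
    """Send one file plus a prompt to the model and return the response text.

    `model` is assumed to behave like genai.GenerativeModel. If max_tokens is
    set (a hypothetical budget, not an SDK option), the request is counted
    first and rejected when it exceeds that budget.
    """
    contents = [file_to_part(path), prompt]
    if max_tokens is not None:
        total = model.count_tokens(contents).total_tokens
        if total > max_tokens:
            raise ValueError(f'Request needs {total} tokens, budget is {max_tokens}')
    return model.generate_content(contents).text


# Usage (hypothetical file name; needs a configured model as in the snippet above):
# text = extract_from_file(model, 'invoice.pdf', 'List every line item.')
# print(text)
```

Guessing the MIME type from the file extension keeps one code path for images and documents. The optional token check reuses `count_tokens` to fail early on oversized inputs.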
Related Snippets
Similar patterns you can reuse in the same workflow.
python | intermediate
Analyze Images with GPT Vision API
Send images to GPT-4o for description, analysis, and visual Q&A.
Best for: Image analysis
#ai #vision
typescript | intermediate
OpenAI Vision API Image Analysis
Analyze images using GPT-4o vision capabilities with base64 and URL inputs.
Best for: image captioning
#ai #vision
python | advanced
Multimodal RAG with Images and Text
Build a multimodal RAG pipeline that retrieves and answers questions about image+text documents.
Best for: visual document Q&A
#multimodal #rag
typescript | intermediate
Google Gemini API Integration
Call the Google Gemini API for text generation with streaming, safety settings, and system prompts.
Best for: Google AI-powered code generation
#gemini #google-ai