
Python · Beginner

Google Gemini Vision API in Python

Analyse images and PDFs with Google Gemini's multimodal models using the google-generativeai Python SDK.


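from pathlib import Path

import google.generativeai as genai

# Replace the placeholder with your own key; in real projects, load it from
# an environment variable or secret store rather than committing it to source
genai.configure(api_key='YOUR_GEMINI_API_KEY')
model = genai.GenerativeModel('gemini-1.5-flash')

# Analyse an image: pass the raw bytes inline together with their MIME type
image_bytes = Path('chart.png').read_bytes()
image_part = {'mime_type': 'image/png', 'data': image_bytes}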

response = model.generate_content([
    image_part,
    'Describe this chart in detail. What trends do you see?',
])
print(response.text)

# Count the tokens a request would use, without generating a response
tokens = model.count_tokens([image_part, 'Describe this chart'])
print(f'Token count: {tokens.total_tokens}')
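
The snippet above only sends a PNG, while the description also mentions PDFs. The sketch below shows one way the same inline-data pattern might be extended to PDFs. The helper names `build_file_part` and `summarise_file`, the default prompt, and the rule of accepting only images and PDFs are assumptions made for illustration; they are not part of the SDK.

```python
import mimetypes
from pathlib import Path


def build_file_part(path):
    """Return an inline-data part dict for an image or PDF file.

    The dict has the same shape as `image_part` in the snippet above.
    """
    path = Path(path)
    # Guess the MIME type from the file extension (e.g. .pdf -> application/pdf)
    mime_type, _ = mimetypes.guess_type(path.name)
    # Assumption: only images and PDFs are accepted; anything else is rejected
    # here before any bytes are read or sent
    is_supported = mime_type is not None and (
        mime_type.startswith('image/') or mime_type == 'application/pdf'
    )
    if not is_supported:
        raise ValueError(f'Unsupported file type: {path.name}')
    return {'mime_type': mime_type, 'data': path.read_bytes()}


def summarise_file(model, path, prompt='Summarise the key points of this document.'):
    """Send the file plus a text prompt to the model and return the reply text.

    `model` is expected to be a GenerativeModel like the one created above.
    """
    response = model.generate_content([build_file_part(path), prompt])
    return response.text
```

With the `model` from the snippet above, `summarise_file(model, 'report.pdf')` would send a hypothetical `report.pdf` the same way `chart.png` is sent. Inline bytes suit small files; for large documents, the SDK's File API (`genai.upload_file`) is likely the better route.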

Use Cases

image analysis
chart understanding
document extraction

Tags

#gemini #vision #multimodal #google
