Google Gemini Vision API in Python

Analyse images and PDFs with Google Gemini's multimodal models using the google-generativeai Python SDK.

```python
import os
from pathlib import Path

import google.generativeai as genai

# Read the key from an environment variable (GEMINI_API_KEY is our choice of name)
genai.configure(api_key=os.environ['GEMINI_API_KEY'])
model = genai.GenerativeModel('gemini-1.5-flash')

# Analyse an image: send the raw bytes with their MIME type as an inline part
image_bytes = Path('chart.png').read_bytes()
image_part = {'mime_type': 'image/png', 'data': image_bytes}
prompt = 'Describe this chart in detail. What trends do you see?'

# Count tokens for the exact request before sending it
tokens = model.count_tokens([image_part, prompt])
print(f'Token count: {tokens.total_tokens}')

response = model.generate_content([image_part, prompt])
print(response.text)
```
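The snippet hard-codes `image/png`, but the same inline-part dictionary also carries PDFs when `mime_type` is `application/pdf`. Below is a minimal sketch of a hypothetical helper, `make_inline_part`, that guesses the MIME type from the file extension with the standard `mimetypes` module and refuses oversized files. The 20 MB ceiling and the short list of accepted types are assumptions for illustration; check the current Gemini documentation for the real limits and supported formats.

```python
import mimetypes
from pathlib import Path

# Assumed ceiling for inline data; verify against the current Gemini docs
MAX_INLINE_BYTES = 20 * 1024 * 1024

# Illustrative subset of accepted types, not the API's full list
SUPPORTED_TYPES = {'image/png', 'image/jpeg', 'application/pdf'}


def make_inline_part(path):
    """Hypothetical helper: read a file and wrap it as an inline-data part."""
    path = Path(path)
    # Guess the MIME type from the extension (e.g. .pdf -> application/pdf)
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type not in SUPPORTED_TYPES:
        raise ValueError(f'Unsupported file type for {path.name}: {mime_type}')
    data = path.read_bytes()
    # Reject files too large to send inline under the assumed limit
    if len(data) > MAX_INLINE_BYTES:
        raise ValueError(f'{path.name} is {len(data)} bytes; too large to send inline')
    return {'mime_type': mime_type, 'data': data}
```

The returned dictionary is passed to `model.generate_content` exactly like `image_part`, for example `model.generate_content([make_inline_part('report.pdf'), 'Summarise this document.'])`, where `report.pdf` is a placeholder file name.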

Use Cases

  • image analysis
  • chart understanding
  • document extraction
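For document extraction, a common pattern is to ask the model to answer in JSON and then parse `response.text`. Replies sometimes arrive wrapped in a markdown code fence, so the sketch below (a hypothetical `parse_json_reply` helper, not part of the SDK) strips an optional fence before calling `json.loads`. Setting `response_mime_type='application/json'` in the generation config, where the chosen model supports it, may make the fence handling unnecessary.

```python
import json
import re

# Matches a reply wrapped in a markdown fence: three backticks, optional "json" tag
FENCE_RE = re.compile(r'^`{3}(?:json)?\s*(.*?)\s*`{3}$', re.DOTALL)


def parse_json_reply(text):
    """Hypothetical helper: parse a model reply that should contain JSON."""
    text = text.strip()
    match = FENCE_RE.match(text)
    if match:
        # Keep only the content between the opening and closing fences
        text = match.group(1)
    # Raises json.JSONDecodeError if the model did not return valid JSON
    return json.loads(text)
```

A typical call is `parse_json_reply(response.text)` after prompting with something like "Return the invoice number and total as JSON."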
