python · beginner

Local LLM with Ollama Python Client

Run open-source models locally with Ollama using the Python client: send a blocking chat request, stream a response chunk by chunk, and generate embeddings.

python
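# Requires `pip install ollama`, a running Ollama server, and the models
# pulled beforehand (`ollama pull llama3.2`, `ollama pull nomic-embed-text`).
from ollama import Client
import ollama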

# Synchronous call
client = Client(host='http://localhost:11434')
response = client.chat(model='llama3.2', messages=[{'role':'user','content':'What is machine learning?'}])
print(response['message']['content'])

# Streaming via the module-level API (default client); chunks arrive as they are generated
print('\nStreaming response:')
for chunk in ollama.chat(model='llama3.2', messages=[{'role':'user','content':'Write a haiku about Python.'}], stream=True):
    print(chunk['message']['content'], end='', flush=True)
print()

# Generate embeddings locally
embeddings = ollama.embeddings(model='nomic-embed-text', prompt='Python is great for data science.')
print(f'Embedding dim: {len(embeddings["embedding"])}')
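
The embedding returned above is a plain list of floats, and a common next step is to compare two embeddings with cosine similarity. The following is a minimal sketch of that comparison, not part of the Ollama client: `cosine_similarity` is a hypothetical helper, and the short vectors are made-up stand-ins for real model output so the example runs without a server.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors standing in for real embeddings
# (assumption: real ones would come from embeddings["embedding"]).
vec_a = [1.0, 0.0, 1.0]
vec_b = [1.0, 0.0, 1.0]
vec_c = [0.0, 1.0, 0.0]

print(cosine_similarity(vec_a, vec_b))  # same direction: very close to 1.0
print(cosine_similarity(vec_a, vec_c))  # orthogonal: 0.0
```

With real embeddings, the two arguments would be the `embedding` lists from two separate calls using the same model, since vectors from different models are not comparable.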

Use Cases

  • local AI
  • private inference
  • offline LLM apps
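
For app-style use, a streamed reply usually needs to be shown as it arrives and also kept as one string, for example to append it to the chat history. The sketch below illustrates that pattern under stated assumptions: `collect_stream` is a hypothetical helper, and `fake_stream` is a stand-in generator that yields dicts shaped like the streamed chunks in the snippet, so the example runs without an Ollama server.

```python
def collect_stream(chunks, echo=True):
    """Optionally print each chunk as it arrives; return the full reply text."""
    parts = []
    for chunk in chunks:
        text = chunk['message']['content']
        if echo:
            print(text, end='', flush=True)
        parts.append(text)
    if echo:
        print()  # finish the line after the last chunk
    return ''.join(parts)


def fake_stream():
    """Stand-in for ollama.chat(..., stream=True); yields chunk-shaped dicts."""
    for piece in ['Indented ', 'blocks ', 'align.']:
        yield {'message': {'content': piece}}


reply = collect_stream(fake_stream())

# Keep the full reply so a follow-up request can include the conversation so far.
history = [
    {'role': 'user', 'content': 'Write a haiku about Python.'},
    {'role': 'assistant', 'content': reply},
]
```

In real use, the iterator returned by `ollama.chat(..., stream=True)` would be passed in place of `fake_stream()`, and `history` would be sent as the `messages` argument of the next call.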
