Python · Intermediate
Hugging Face Text Generation with Streaming
Run local text generation with Hugging Face models and stream the output to the console as it is generated.
from threading import Thread

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_name = 'Qwen/Qwen2.5-0.5B-Instruct'
tokenizer = AutoTokenizer.from_pretrained(model_name)
# float16 halves memory use relative to float32; on CPU-only machines
# torch.float32 may be the safer and faster choice.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

prompt = 'Explain what a Python decorator is in simple terms:'
inputs = tokenizer(prompt, return_tensors='pt')

# The streamer is an iterator that yields decoded text while generate() runs.
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
gen_kwargs = {**inputs, 'streamer': streamer, 'max_new_tokens': 200, 'do_sample': True, 'temperature': 0.7}

# generate() blocks until it finishes, so run it in a background thread
# and consume the streamer from the main thread.
t = Thread(target=model.generate, kwargs=gen_kwargs)
t.start()

print(prompt, end='')
for token in streamer:
    print(token, end='', flush=True)
print()
t.join()  # wait for the generation thread to finish

Use Cases
- local LLM
- streaming generation
- open-source models
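
The streaming pattern in the snippet above rests on a producer thread that pushes text chunks and a consumer that iterates over them as they arrive. The sketch below illustrates that idea using only the standard library, so it runs without downloading a model. It is a simplified stand-in, not the library's implementation: `QueueStreamer`, `fake_generate`, and `stream_chunks` are hypothetical names introduced here, and `fake_generate` merely imitates `model.generate` by emitting pre-made chunks with a short delay.

```python
import time
from queue import Queue
from threading import Thread

_STOP = object()  # sentinel marking the end of the stream (hypothetical name)


class QueueStreamer:
    """Toy streamer: the producer calls put()/end(), the consumer iterates."""

    def __init__(self):
        self._queue = Queue()

    def put(self, text):
        self._queue.put(text)

    def end(self):
        self._queue.put(_STOP)

    def __iter__(self):
        return self

    def __next__(self):
        item = self._queue.get()  # blocks until the producer supplies a chunk
        if item is _STOP:
            raise StopIteration
        return item


def fake_generate(streamer, chunks, delay=0.01):
    """Hypothetical stand-in for model.generate: emits chunks with a delay."""
    try:
        for chunk in chunks:
            time.sleep(delay)
            streamer.put(chunk)
    finally:
        streamer.end()  # always signal completion so the consumer never hangs


def stream_chunks(chunks):
    """Print chunks as they arrive and return the full concatenated text."""
    streamer = QueueStreamer()
    worker = Thread(target=fake_generate, args=(streamer, chunks))
    worker.start()
    received = []
    for text in streamer:
        print(text, end='', flush=True)
        received.append(text)
    print()
    worker.join()
    return ''.join(received)


if __name__ == '__main__':
    stream_chunks(['A decorator ', 'wraps ', 'a function.'])
```

Collecting the chunks while printing them, as `stream_chunks` does, is one way to keep the complete response available after streaming ends. The same loop shape should carry over to the real streamer, under the assumption that it is consumed from a different thread than the one running generation.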