Hugging Face Inference API
Run ML models via the Hugging Face Inference API for text generation, classification, and embeddings.
import os

import httpx
from typing import Any

HF_TOKEN = os.environ["HF_TOKEN"]  # hf_... token, read from the environment
API_URL = "https://api-inference.huggingface.co/models"

async def hf_inference(
    model: str,
    inputs: str | list[str],
    parameters: dict[str, Any] | None = None,
) -> Any:
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"{API_URL}/{model}",
            headers={"Authorization": f"Bearer {HF_TOKEN}"},
            json={
                "inputs": inputs,
                **({"parameters": parameters} if parameters else {}),
            },
            timeout=30.0,
        )
        response.raise_for_status()
        return response.json()
# Text generation
# result = await hf_inference(
#     "mistralai/Mistral-7B-Instruct-v0.2",
#     "Explain quantum computing in one sentence.",
#     {"max_new_tokens": 100, "temperature": 0.7},
# )
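For text-generation models the API typically returns a list of objects with a `generated_text` field. A small helper (a sketch, assuming that response shape) pulls out the first completion instead of indexing inline:

```python
from typing import Any

def first_generated_text(result: Any) -> str:
    # Text-generation responses look like [{"generated_text": "..."}].
    if isinstance(result, list) and result and "generated_text" in result[0]:
        return result[0]["generated_text"]
    raise ValueError(f"unexpected response shape: {result!r}")

# Example against a canned payload:
sample = [{"generated_text": "Quantum computing uses qubits."}]
print(first_generated_text(sample))  # Quantum computing uses qubits.
```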
# Sentiment analysis
# result = await hf_inference(
#     "distilbert-base-uncased-finetuned-sst-2-english",
#     "I love this product!",
# )
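The same endpoint also serves feature-extraction models for the document-embeddings use case below. The model name here is one common sentence-transformers checkpoint (an assumption, not prescribed by this snippet), and the cosine-similarity helper is a plain stdlib sketch for comparing the returned vectors:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Document embeddings (feature-extraction task; model name is an assumption):
# vectors = await hf_inference(
#     "sentence-transformers/all-MiniLM-L6-v2",
#     ["first document", "second document"],
# )
# score = cosine_similarity(vectors[0], vectors[1])

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```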
Use Cases
- Text classification
- Sentiment analysis
- Document embeddings