Python · Intermediate

Stream Large SQL Query in Chunks

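Read millions of rows from SQL in fixed-size chunks using pandas read_sql with chunksize, so only one chunk of the raw result is held in memory at a time. With PostgreSQL, chunksize alone is not enough: the psycopg2 driver downloads the entire result set on execute unless the connection uses a server-side cursor (SQLAlchemy's stream_results=True execution option).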

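```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://user:pass@localhost/db')

# Select only the needed columns; SELECT * would transfer every column.
query = """
    SELECT event_id, user_id, created_at
    FROM events
    WHERE created_at >= '2024-01-01'
"""

results = []
with engine.connect() as conn:
    # Server-side cursor: rows are fetched as pandas requests each chunk
    # instead of the driver loading the whole result set up front.
    stream_conn = conn.execution_options(stream_results=True)
    for chunk in pd.read_sql(query, con=stream_conn, chunksize=50_000,
                             parse_dates=['created_at']):
        # Derive the hour, then keep only the reduced columns.
        chunk['hour'] = chunk['created_at'].dt.hour
        results.append(chunk[['event_id', 'user_id', 'hour']])

# The combined frame still holds every reduced row; pd.concat fails on an empty list.
df = (pd.concat(results, ignore_index=True) if results
      else pd.DataFrame(columns=['event_id', 'user_id', 'hour']))
print(df.shape)
```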
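If all you need is an aggregate, there is no reason to keep the chunks at all: fold each chunk into a running total and let it be discarded, so memory stays bounded by the chunk size rather than by the size of the result. A minimal sketch, assuming a hypothetical helper events_per_hour and using an in-memory SQLite table as a made-up stand-in for the PostgreSQL events table (pandas read_sql also accepts a plain sqlite3 connection):

```python
import sqlite3
from collections import Counter

import pandas as pd


def events_per_hour(con, chunksize=100):
    """Hypothetical helper: count events per hour of day, keeping only running totals."""
    totals = Counter()
    for chunk in pd.read_sql(
        "SELECT created_at FROM events WHERE created_at >= '2024-01-01'",
        con=con,
        chunksize=chunksize,
        parse_dates=["created_at"],
    ):
        # Fold this chunk's per-hour counts into the running totals;
        # the chunk itself is dropped on the next iteration.
        totals.update(chunk["created_at"].dt.hour.value_counts().to_dict())
    return pd.Series(totals, dtype="int64").sort_index()


# Made-up stand-in for the events table: 1,000 rows in in-memory SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id INTEGER, user_id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(i, i % 7, f"2024-01-01 {i % 24:02d}:15:00") for i in range(1_000)],
)

hourly = events_per_hour(conn)
print(hourly)
```

Against PostgreSQL you would pass the stream_results connection instead of the SQLite one; the loop body does not change.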

Use Cases

  • large table extraction
  • memory-safe ETL
  • incremental processing
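For memory-safe ETL and incremental processing, a common alternative to collecting chunks is to write each transformed chunk out as soon as it is ready, using DataFrame.to_sql with if_exists="append". A minimal sketch, assuming a hypothetical helper copy_in_chunks and a hypothetical event_hours target table; two in-memory SQLite databases stand in for the real source and destination (with PostgreSQL you would pass SQLAlchemy connections instead):

```python
import sqlite3

import pandas as pd


def copy_in_chunks(src, dst, chunksize=250):
    """Hypothetical ETL step: read events in chunks, add an hour column,
    and append each transformed chunk to event_hours in the destination."""
    rows_written = 0
    for chunk in pd.read_sql(
        "SELECT event_id, user_id, created_at FROM events",
        con=src,
        chunksize=chunksize,
        parse_dates=["created_at"],
    ):
        chunk["hour"] = chunk["created_at"].dt.hour
        # Write the chunk and move on; nothing accumulates between iterations.
        chunk[["event_id", "user_id", "hour"]].to_sql(
            "event_hours", dst, if_exists="append", index=False
        )
        rows_written += len(chunk)
    return rows_written


# Made-up source data in one in-memory database, empty target in another.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE events (event_id INTEGER, user_id INTEGER, created_at TEXT)")
src.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(i, i % 7, f"2024-01-01 {i % 24:02d}:15:00") for i in range(1_000)],
)
dst = sqlite3.connect(":memory:")

written = copy_in_chunks(src, dst)
print(written)
```

Because the target table is created on the first append, a rerun appends duplicates; truncate or use a staging table if the job must be repeatable.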
