python · beginner

DuckDB In-Memory Analytics

Use DuckDB to run analytical SQL directly on pandas DataFrames or Parquet files, in-process and without a database server.

python
import duckdb
import pandas as pd
import numpy as np

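# 365 days of synthetic daily sales starting 2024-01-01.
# Values are random, so the totals differ on every run.
df = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=365, freq='D'),
    'revenue': np.random.randint(1000, 5000, 365),
    'region': np.random.choice(['North', 'South', 'East'], 365),
})

# DuckDB finds the DataFrame by its Python variable name (`df`) in the query.
result = duckdb.query("""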
    SELECT
        region,
        date_trunc('month', date) AS month,
        SUM(revenue)              AS total_revenue,
        AVG(revenue)              AS avg_revenue
    FROM df
    GROUP BY 1, 2
    ORDER BY 1, 2
""").df()
print(result.head(12))

Use Cases

  • serverless analytics
  • Parquet querying
  • in-process SQL
