python, advanced

dbt Python Model with pandas

Write a dbt Python model for Snowpark or Databricks that loads an upstream model into a pandas DataFrame and aggregates monthly revenue and order counts by region.

python
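import pandas as pd


def model(dbt, session):
    dbt.config(
        materialized='table',
        packages=['pandas'],
    )

    # dbt.ref() returns the platform's DataFrame: Snowpark exposes to_pandas(),
    # while PySpark on Databricks exposes toPandas().
    orders = dbt.ref('stg_orders')
    df: pd.DataFrame = orders.to_pandas() if hasattr(orders, 'to_pandas') else orders.toPandas()

    # Assumes lower-case column names. Snowpark typically returns upper-case
    # names for unquoted identifiers, so adjust the keys below if needed.
    # Coerce created_at to datetime so the .dt accessor is available.
    df['created_at'] = pd.to_datetime(df['created_at'])
    df['order_month'] = df['created_at'].dt.to_period('M').astype(str)
    df['revenue'] = df['price'] * df['qty']

    # One row per month and region: summed revenue and distinct order count.
    summary = (
        df.groupby(['order_month', 'region'])
        .agg(total_revenue=('revenue', 'sum'), order_count=('order_id', 'nunique'))
        .reset_index()
    )
    return summary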
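
The aggregation in the model can be checked locally, without a warehouse, by running the same pandas steps on a small DataFrame. The sketch below is one possible way to do that: the function name `summarize_orders` and the sample rows are hypothetical and only stand in for `stg_orders`; the column names are taken from the model above.

```python
import pandas as pd


def summarize_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate revenue and distinct order count per month and region.

    Hypothetical helper mirroring the pandas steps of the dbt model.
    """
    out = df.copy()
    # Coerce to datetime so .dt works even if the column arrives as strings.
    out["created_at"] = pd.to_datetime(out["created_at"])
    out["order_month"] = out["created_at"].dt.to_period("M").astype(str)
    out["revenue"] = out["price"] * out["qty"]
    return (
        out.groupby(["order_month", "region"])
        .agg(total_revenue=("revenue", "sum"), order_count=("order_id", "nunique"))
        .reset_index()
    )


# Made-up sample rows standing in for stg_orders (order 1 has two line items).
sample = pd.DataFrame(
    {
        "order_id": [1, 1, 2, 3],
        "created_at": ["2024-01-05", "2024-01-05", "2024-01-20", "2024-02-02"],
        "region": ["EU", "EU", "EU", "US"],
        "price": [10.0, 5.0, 20.0, 7.5],
        "qty": [2, 1, 1, 4],
    }
)

summary = summarize_orders(sample)
print(summary)
```

With a helper like this, `model()` could simply return `summarize_orders(df)` after the DataFrame conversion, which keeps the warehouse-specific part of the model to a few lines.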

Use Cases

  • dbt Python models
  • warehouse transformations
  • Snowpark/Databricks ETL
