dbt Python Model with pandas
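Write a dbt Python model that runs on Databricks or on Snowflake (via Snowpark), converts an upstream model to a pandas DataFrame, and aggregates monthly revenue and order counts by region. The conversion loads the entire upstream table into the memory of a single Python process, so this pattern suits inputs that fit in memory.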
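import pandas as pd


def model(dbt, session):
    dbt.config(
        materialized='table',
        packages=['pandas'],
    )

    rel = dbt.ref('stg_orders')
    # Snowpark DataFrames provide .to_pandas(); PySpark DataFrames on Databricks provide .toPandas().
    # Either call loads the whole upstream table into memory.
    df: pd.DataFrame = rel.to_pandas() if hasattr(rel, 'to_pandas') else rel.toPandas()
    # Snowflake returns unquoted identifiers in upper case (CREATED_AT), so normalize to lower case.
    # Note: on Snowflake, lower-case names written back from pandas may become quoted, case-sensitive columns.
    df.columns = df.columns.str.lower()

    # pd.to_datetime is a no-op for datetime columns and parses string timestamps.
    df['order_month'] = pd.to_datetime(df['created_at']).dt.to_period('M').astype(str)
    df['revenue'] = df['price'] * df['qty']

    summary = (
        df.groupby(['order_month', 'region'])
        .agg(total_revenue=('revenue', 'sum'), order_count=('order_id', 'nunique'))
        .reset_index()
    )
    # dbt converts the returned pandas DataFrame to a warehouse DataFrame and writes it as a table.
    return summary
Use Cases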
- dbt Python models
- warehouse transformations
- Snowpark/Databricks ETL
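Because the aggregation is plain pandas, one way to check it before running `dbt run` is to copy the logic into a pure function and call it on a small hand-built DataFrame, with no dbt project or warehouse connection. This is a minimal sketch, assuming pandas is installed locally; `summarize_orders` and the sample rows are hypothetical and only mirror the column names the model expects.

```python
import pandas as pd


def summarize_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pure-pandas copy of the model's transformation, for local testing."""
    df = df.copy()
    # Normalize column names, as the model does after to_pandas()/toPandas()
    df.columns = df.columns.str.lower()
    df["created_at"] = pd.to_datetime(df["created_at"])
    df["order_month"] = df["created_at"].dt.to_period("M").astype(str)
    df["revenue"] = df["price"] * df["qty"]
    return (
        df.groupby(["order_month", "region"])
        .agg(total_revenue=("revenue", "sum"), order_count=("order_id", "nunique"))
        .reset_index()
    )


# Hypothetical sample rows, with upper-case names as Snowflake would return them.
# Order 2 has two line items, so it should be counted once per month/region.
sample = pd.DataFrame(
    {
        "ORDER_ID": [1, 2, 2, 3],
        "CREATED_AT": ["2024-01-05", "2024-01-20", "2024-01-20", "2024-02-01"],
        "REGION": ["EU", "EU", "EU", "US"],
        "PRICE": [10.0, 5.0, 2.5, 8.0],
        "QTY": [2, 1, 4, 3],
    }
)

summary = summarize_orders(sample)
print(summary)
```

Keeping the logic in a pure function like this makes it easy to run under a test runner such as pytest, while the dbt `model()` function stays a thin wrapper that only handles `dbt.config`, `dbt.ref`, and the DataFrame conversion.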
Related Snippets
Similar patterns you can reuse in the same workflow.
Python ETL Pipeline Example
Complete extract-transform-load pipeline with error handling, logging, and incremental processing.
Best for: Automating data ingestion from CSV to warehouse
Python Batch Processing Script
Process large files in configurable batches with progress tracking, error handling, and resume support.
Best for: Processing large CSV files that don't fit in memory
Database Sync Script in Python
Sync data between two databases with upsert logic, batch processing, and change detection.
Best for: Replicating data between databases
SQL Incremental Load Pattern
Incremental data load using watermark tracking to process only new and updated records efficiently.
Best for: Efficient warehouse loading without full reloads