Snowflake MERGE with Slowly Changing Dim
Implement SCD Type 2 in Snowflake using MERGE to track historical changes in dimension tables.
-- SCD Type 2: track full history of changes
MERGE INTO dim_customers AS target
USING (
SELECT
customer_id,
name,
email,
segment,
region,
MD5(CONCAT(name, email, segment, region)) AS row_hash,
CURRENT_TIMESTAMP() AS loaded_at
FROM stg_customers
) AS source
ON target.customer_id = source.customer_id
AND target.is_current = TRUE
-- Update: expire the old record if attributes changed
WHEN MATCHED AND target.row_hash != source.row_hash THEN
UPDATE SET
target.is_current = FALSE,
target.valid_to = source.loaded_at,
target.updated_at = source.loaded_at
-- Insert: new customer or new version
WHEN NOT MATCHED THEN
INSERT (
customer_id, name, email, segment, region,
row_hash, is_current, valid_from, valid_to,
created_at, updated_at
)
VALUES (
source.customer_id, source.name, source.email,
source.segment, source.region, source.row_hash,
TRUE, source.loaded_at, '9999-12-31',
source.loaded_at, source.loaded_at
);
-- After merge, insert new versions for changed records
INSERT INTO dim_customers
SELECT
s.customer_id, s.name, s.email, s.segment, s.region,
s.row_hash, TRUE, CURRENT_TIMESTAMP(), '9999-12-31',
CURRENT_TIMESTAMP(), CURRENT_TIMESTAMP()
FROM stg_customers s
JOIN dim_customers d
ON s.customer_id = d.customer_id
AND d.is_current = FALSE
AND d.updated_at = (
SELECT MAX(updated_at) FROM dim_customers
WHERE customer_id = s.customer_id
)
WHERE NOT EXISTS (
SELECT 1 FROM dim_customers
WHERE customer_id = s.customer_id AND is_current = TRUE
);Use Cases
- Tracking full history of dimension changes
- Data warehouse merge patterns for Snowflake
- SCD Type 2 implementation for analytics
Tags
Related Snippets
Similar patterns you can reuse in the same workflow.
dbt Incremental Model Pattern
Build efficient dbt incremental models that process only new or changed data instead of full refreshes.
Best for: Efficient data warehouse builds processing only deltas
Python ETL Pipeline Example
Complete extract-transform-load pipeline with error handling, logging, and incremental processing.
Best for: Automating data ingestion from CSV to warehouse
Pandas Merge and Join Examples
Combine DataFrames using merge, join, and concat with different join types and key handling.
Best for: Combining data from multiple sources
Slowly Changing Dimension Type 2 in SQL
Implement SCD Type 2 to track historical changes in dimension tables with effective date ranges.
Best for: Tracking customer attribute changes over time