dbt Source Freshness and Testing
Configure dbt source freshness checks and schema tests to validate upstream data pipelines.
-- models/staging/sources.yml (YAML config shown as SQL comments)
-- sources:
--   - name: raw
--     database: analytics
--     schema: raw_data
--     loaded_at_field: _loaded_at
--     freshness:
--       warn_after: {count: 12, period: hour}
--       error_after: {count: 24, period: hour}
--     tables:
--       - name: orders
--         columns:
--           - name: order_id
--             tests: [unique, not_null]
--           - name: amount
--             tests:
--               - not_null
--               # accepted_values only matches literal values, so it cannot
--               # enforce amount > 0; use a singular or custom generic test.
--           - name: status
--             tests:
--               - accepted_values:
--                   values: ['pending', 'completed', 'cancelled']
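-- Custom data test: tests/assert_positive_revenue.sql
-- (singular tests live in the project's tests/ directory by default)
-- No trailing semicolon: dbt wraps the test query in a subquery
SELECT order_id, amount
FROM {{ ref('fct_orders') }}
WHERE amount < 0
-- Test PASSES if this returns 0 rows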
-- Custom test: referential integrity
SELECT o.customer_id
FROM {{ ref('fct_orders') }} o
LEFT JOIN {{ ref('dim_customers') }} c
    ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL
-- Test PASSES if every order matches a customer (0 rows returned)
-- Run freshness check: dbt source freshness
-- Run tests: dbt test
-- Run specific test: dbt test --select fct_orders
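The built-in accepted_values test only matches literal values, so a range rule such as "amount must be positive" needs either a singular test like the one above or a custom generic test. A minimal sketch of the latter, assuming a hypothetical test named positive_value saved under tests/generic/ (both the name and the file path are illustrative and not part of the original snippet):

```sql
-- tests/generic/positive_value.sql (hypothetical file and test name)
{% test positive_value(model, column_name) %}

-- Return every row that violates the rule; the test passes when no rows come back.
SELECT {{ column_name }}
FROM {{ model }}
WHERE {{ column_name }} <= 0

{% endtest %}
```

With that file in place, the test could be attached to the amount column by listing positive_value under its tests, next to not_null.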
Use Cases
- Ensuring upstream data sources are fresh
- Automated data quality testing in dbt
- Schema validation for staging models
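For the first use case, dbt source freshness compares the newest value of loaded_at_field with the current time and then applies the warn_after and error_after thresholds. Assuming the raw.orders source resolves to analytics.raw_data.orders, the check amounts to roughly the following query (a simplified sketch, not the exact SQL dbt generates, which varies by adapter):

```sql
-- Simplified equivalent of the freshness query for the raw.orders source
SELECT
    MAX(_loaded_at)   AS max_loaded_at,   -- newest load timestamp in the table
    CURRENT_TIMESTAMP AS snapshotted_at   -- time at which the check runs
FROM analytics.raw_data.orders
-- dbt then compares (snapshotted_at - max_loaded_at) with the thresholds:
--   older than 12 hours -> warn, older than 24 hours -> error
```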