SQL Data Quality Checks and Assertions
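Reusable SQL queries (PostgreSQL-flavored) for data quality: null checks, uniqueness, referential integrity, value ranges, freshness, and row-count anomalies. Each query returns a failure count or a status label that a pipeline can assert on.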
-- 1. Null check: critical columns must not be null
SELECT 'null_check' AS test,
       COUNT(*) AS failures
FROM orders
WHERE order_id IS NULL
   OR customer_id IS NULL
   OR amount IS NULL;
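To see exactly what this check counts, you can run it against a throwaway table. The sketch below uses Python's built-in sqlite3 with made-up rows (SQLite rather than PostgreSQL, but this query is portable). It also shows a per-column variant: COUNT(col) skips NULLs, so COUNT(*) - COUNT(col) is the number of NULLs in that column, which tells you which column failed.

```python
import sqlite3

# Hypothetical fixture; table and column names mirror the snippet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, 25.0), (2, None, 40.0), (None, 11, 15.0), (4, None, None)],
)

# Check 1 as written: counts rows with at least one NULL in a critical column.
failures = conn.execute(
    """SELECT COUNT(*) FROM orders
       WHERE order_id IS NULL OR customer_id IS NULL OR amount IS NULL"""
).fetchone()[0]

# Per-column variant: COUNT(col) skips NULLs, so COUNT(*) - COUNT(col) = NULLs in col.
per_column = conn.execute(
    """SELECT COUNT(*) - COUNT(order_id),
              COUNT(*) - COUNT(customer_id),
              COUNT(*) - COUNT(amount)
       FROM orders"""
).fetchone()
print(failures, per_column)  # 3 (1, 2, 1)
```

The totals differ because the last row has two NULLs: it is one failing row but two NULL cells.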
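-- 2. Uniqueness check: COUNT(order_id) skips NULLs just like COUNT(DISTINCT),
--    so NULL ids are left to the null check instead of counting as duplicates
SELECT 'unique_order_id' AS test,
       COUNT(order_id) - COUNT(DISTINCT order_id) AS duplicates
FROM orders;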
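Because COUNT(DISTINCT order_id) ignores NULLs while COUNT(*) does not, subtracting from COUNT(*) reports every NULL id as a duplicate. The sketch below (Python sqlite3, hypothetical rows) compares both forms and also lists which ids are duplicated, which is usually the next question once the count is non-zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?)",
    [(1,), (2,), (2,), (3,), (3,), (3,), (None,), (None,)],
)

naive, fixed = conn.execute(
    """SELECT COUNT(*)        - COUNT(DISTINCT order_id),
              COUNT(order_id) - COUNT(DISTINCT order_id)
       FROM orders"""
).fetchone()
print(naive, fixed)  # 5 3  (the two NULL rows inflate the naive count)

# Which ids are duplicated, and how often.
dupes = conn.execute(
    """SELECT order_id, COUNT(*) AS n
       FROM orders
       WHERE order_id IS NOT NULL   -- GROUP BY would otherwise put all NULLs in one group
       GROUP BY order_id
       HAVING COUNT(*) > 1
       ORDER BY order_id"""
).fetchall()
print(dupes)  # [(2, 2), (3, 3)]
```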
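-- 3. Referential integrity (orphan records): orders whose customer_id has no
--    matching customer. NULL customer_id is excluded; the null check reports it.
SELECT 'orphan_orders' AS test,
       COUNT(*) AS orphans
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.id
WHERE o.customer_id IS NOT NULL
  AND c.id IS NULL;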
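The LEFT JOIN ... IS NULL pattern is an anti-join, and a NULL customer_id never matches anything, so on its own it counts orders with no customer_id as orphans as well. The sketch below (Python sqlite3, hypothetical rows) shows the difference, plus the equivalent NOT EXISTS formulation, which some teams find easier to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?)", [(10,), (11,)])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10), (2, 11), (3, 99), (4, None)],  # 3 is a true orphan; 4 has no customer_id
)

def count(sql):
    return conn.execute(sql).fetchone()[0]

# Anti-join without a NULL filter: order 4 is counted too.
naive = count("""SELECT COUNT(*) FROM orders o
                 LEFT JOIN customers c ON o.customer_id = c.id
                 WHERE c.id IS NULL""")

# Same anti-join, restricted to rows that actually reference a customer.
fixed = count("""SELECT COUNT(*) FROM orders o
                 LEFT JOIN customers c ON o.customer_id = c.id
                 WHERE o.customer_id IS NOT NULL AND c.id IS NULL""")

# NOT EXISTS formulation of the same anti-join.
not_exists = count("""SELECT COUNT(*) FROM orders o
                      WHERE o.customer_id IS NOT NULL
                        AND NOT EXISTS (SELECT 1 FROM customers c
                                        WHERE c.id = o.customer_id)""")
print(naive, fixed, not_exists)  # 2 1 1
```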
-- 4. Range/domain validation: amount must be > 0 and <= 1,000,000
SELECT 'invalid_amounts' AS test,
       COUNT(*) AS failures
FROM orders
WHERE amount <= 0 OR amount > 1000000;
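The boundaries are easy to get wrong, so it helps to probe them directly. In this Python sqlite3 sketch the limits are passed as query parameters (the names MIN_EXCLUSIVE and MAX_INCLUSIVE are hypothetical) so they could live in pipeline config. Note that a NULL amount is never counted here: any comparison with NULL is unknown, so NULLs are left to the null check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?)",
    [(-5,), (0,), (0.01,), (1_000_000,), (1_000_000.01,), (None,)],
)

# Hypothetical bounds: amount must be > MIN_EXCLUSIVE and <= MAX_INCLUSIVE.
MIN_EXCLUSIVE, MAX_INCLUSIVE = 0, 1_000_000
failures = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE amount <= ? OR amount > ?",
    (MIN_EXCLUSIVE, MAX_INCLUSIVE),
).fetchone()[0]
print(failures)  # 3: -5, 0 and 1000000.01; the NULL row is not counted
```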
-- 5. Freshness check: the newest row should be less than 24 hours old.
--    An empty table has no MAX(created_at), so it is reported as STALE.
SELECT 'freshness' AS test,
       EXTRACT(EPOCH FROM NOW() - MAX(created_at)) / 3600 AS hours_since_last,
       CASE WHEN MAX(created_at) IS NULL
              OR MAX(created_at) < NOW() - INTERVAL '24 hours'
            THEN 'STALE' ELSE 'FRESH' END AS status
FROM orders;
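SQLite has no NOW() or INTERVAL, so this sketch ports the freshness check using julianday() and datetime() modifiers, with the current time passed in as a parameter so the result is reproducible. It assumes created_at is stored as 'YYYY-MM-DD HH:MM:SS' text, which keeps string comparison in date order. On an empty table MAX(created_at) is NULL and any comparison against it is unknown, so the query tests for NULL explicitly and reports STALE.

```python
import sqlite3

def freshness(conn, now):
    """Return (hours_since_last, status) for the orders table."""
    return conn.execute(
        """SELECT (julianday(?1) - julianday(MAX(created_at))) * 24 AS hours_since_last,
                  CASE WHEN MAX(created_at) IS NULL
                         OR MAX(created_at) < datetime(?1, '-24 hours')
                       THEN 'STALE' ELSE 'FRESH' END AS status
           FROM orders""",
        (now,),
    ).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (created_at TEXT)")
NOW = "2024-05-02 12:00:00"  # pinned "current time" (hypothetical)

empty = freshness(conn, NOW)  # (None, 'STALE'): no rows at all
conn.execute("INSERT INTO orders VALUES ('2024-05-01 06:00:00')")
stale = freshness(conn, NOW)  # newest row is 30 hours old -> STALE
conn.execute("INSERT INTO orders VALUES ('2024-05-02 09:00:00')")
fresh = freshness(conn, NOW)  # newest row is 3 hours old -> FRESH
```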
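-- 6. Row count anomaly detection: compare each of the last 30 complete days
--    with the 30-day average. generate_series keeps days with no rows (cnt = 0),
--    and the current, still-partial day is excluded.
WITH days AS (
    SELECT generate_series(CURRENT_DATE - 30, CURRENT_DATE - 1, INTERVAL '1 day')::date AS dt
),
daily_counts AS (
    SELECT d.dt, COUNT(o.created_at) AS cnt
    FROM days d
    LEFT JOIN orders o
           ON o.created_at >= d.dt
          AND o.created_at <  d.dt + 1
    GROUP BY d.dt
)
SELECT 'volume_anomaly' AS test,
       dt, cnt,
       CASE WHEN cnt < AVG(cnt) OVER () * 0.5 THEN 'LOW'
            WHEN cnt > AVG(cnt) OVER () * 2.0 THEN 'HIGH'
            ELSE 'NORMAL' END AS status
FROM daily_counts
ORDER BY dt DESC;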
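A plain GROUP BY produces no row for a day with zero orders, so a full outage day can never be flagged LOW. The SQLite sketch below builds a calendar of the last N complete days with a recursive CTE, LEFT JOINs the orders onto it, and applies the same 0.5x/2.0x thresholds against the window average. The reference date, window length, and rows are made-up; window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (created_at TEXT)")
# Hypothetical fixture: 2024-05-03 has no orders at all.
rows = (["2024-05-01 10:00:00"] * 10
        + ["2024-05-02 10:00:00"] * 12
        + ["2024-05-04 10:00:00"] * 30)
conn.executemany("INSERT INTO orders VALUES (?)", [(r,) for r in rows])

query = """
WITH RECURSIVE days(dt) AS (
    SELECT date(?1, '-' || ?2 || ' days')            -- first day of the window
    UNION ALL
    SELECT date(dt, '+1 day') FROM days
    WHERE dt < date(?1, '-1 day')                    -- stop at yesterday
),
daily_counts AS (
    SELECT d.dt, COUNT(o.created_at) AS cnt          -- 0 for days with no rows
    FROM days d
    LEFT JOIN orders o ON date(o.created_at) = d.dt
    GROUP BY d.dt
)
SELECT dt, cnt,
       CASE WHEN cnt < AVG(cnt) OVER () * 0.5 THEN 'LOW'
            WHEN cnt > AVG(cnt) OVER () * 2.0 THEN 'HIGH'
            ELSE 'NORMAL' END AS status
FROM daily_counts
ORDER BY dt DESC
"""
# ?1 = "today" (excluded as a partial day), ?2 = number of complete days to check.
result = conn.execute(query, ("2024-05-05", 4)).fetchall()
for row in result:
    print(row)
```

With an average of 13 orders per day, the empty day falls below 6.5 and is flagged LOW, and the 30-order day exceeds 26 and is flagged HIGH.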
Use Cases
- Automated data quality gates in ETL pipelines
- Detecting stale data before it reaches dashboards
- Monitoring referential integrity in data warehouses
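For the first use case, a pipeline step can run the count-returning checks (1 to 4) and stop the load if any of them is non-zero. This is a minimal Python sqlite3 sketch; `run_quality_gate`, `DataQualityError`, and the `CHECKS` mapping are hypothetical names, not part of any library. The freshness and volume checks are left out because they return status labels rather than failure counts.

```python
import sqlite3

class DataQualityError(Exception):
    """Hypothetical exception type; carries {check_name: failure_count}."""
    def __init__(self, failed):
        super().__init__(f"data quality checks failed: {failed}")
        self.failed = failed

# Each query returns a single failure count; 0 means the check passed.
CHECKS = {
    "null_check": """SELECT COUNT(*) FROM orders
                     WHERE order_id IS NULL OR customer_id IS NULL OR amount IS NULL""",
    "unique_order_id": "SELECT COUNT(order_id) - COUNT(DISTINCT order_id) FROM orders",
    "orphan_orders": """SELECT COUNT(*) FROM orders o
                        LEFT JOIN customers c ON o.customer_id = c.id
                        WHERE o.customer_id IS NOT NULL AND c.id IS NULL""",
    "invalid_amounts": "SELECT COUNT(*) FROM orders WHERE amount <= 0 OR amount > 1000000",
}

def run_quality_gate(conn, checks=CHECKS):
    """Run every check; raise if any reports failures, else return all counts."""
    results = {name: conn.execute(sql).fetchone()[0] for name, sql in checks.items()}
    failed = {name: n for name, n in results.items() if n > 0}
    if failed:
        raise DataQualityError(failed)
    return results

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO customers VALUES (?)", [(10,), (11,)])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, 10, 25.0), (2, 11, 40.0)])

passed = run_quality_gate(conn)  # every count is 0

conn.execute("INSERT INTO orders VALUES (2, 99, 40.0)")  # duplicate id, unknown customer
failed = None
try:
    run_quality_gate(conn)
except DataQualityError as exc:
    failed = exc.failed
print(failed)  # {'unique_order_id': 1, 'orphan_orders': 1}
```

Raising an exception rather than returning a flag means the orchestrator (cron, Airflow, a CI job) sees a failed step without extra wiring.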
Related Snippets
Similar patterns you can reuse in the same workflow.
dbt Source Freshness and Testing
Configure dbt source freshness checks and schema tests to validate upstream data pipelines.
Best for: Ensuring upstream data sources are fresh
SQL Data Deduplication Techniques
Remove duplicate records using ROW_NUMBER, DISTINCT ON, and self-join deduplication strategies.
Best for: Cleaning duplicate records in production databases
Data Quality Testing with Expectations
Define and run data quality expectations for automated validation in data pipelines.
Best for: Automated data quality gates in pipelines
dbt Run and Test — CI/CD Pipeline Script
Bash script for running dbt build with testing, documentation generation, and failure notifications.
Best for: Automating dbt builds in CI/CD pipelines