SQL · Intermediate

SQL Data Quality Checks and Assertions

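Reusable SQL queries for data quality: null checks, uniqueness, referential integrity, range validation, freshness, and row-count anomalies. The syntax is PostgreSQL-flavored (NOW(), INTERVAL, EXTRACT(EPOCH ...)), so adjust the date functions for other databases.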

sql
-- 1. Null check — critical columns must not be null
SELECT 'null_check' AS test,
    COUNT(*) AS failures
FROM orders
WHERE order_id IS NULL
   OR customer_id IS NULL
   OR amount IS NULL;

-- 2. Uniqueness check (NULL order_ids are ignored here; check 1 reports them)
SELECT 'unique_order_id' AS test,
    COUNT(order_id) - COUNT(DISTINCT order_id) AS duplicates
FROM orders;

-- 3. Referential integrity (orphan records)
SELECT 'orphan_orders' AS test,
    COUNT(*) AS orphans
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.id
WHERE c.id IS NULL;

-- 4. Range/domain validation
SELECT 'invalid_amounts' AS test,
    COUNT(*) AS failures
FROM orders
WHERE amount <= 0 OR amount > 1000000;

-- 5. Freshness check (newest row should be < 24h old; an empty table counts as stale)
SELECT 'freshness' AS test,
    EXTRACT(EPOCH FROM NOW() - MAX(created_at)) / 3600 AS hours_since_last,
    CASE WHEN MAX(created_at) >= NOW() - INTERVAL '24 hours'
         THEN 'FRESH' ELSE 'STALE' END AS status
FROM orders;

-- 6. Row count anomaly detection (last 30 complete days; today's partial day is excluded)
WITH daily_counts AS (
    SELECT DATE(created_at) AS dt, COUNT(*) AS cnt
    FROM orders
    WHERE created_at >= CURRENT_DATE - 30
      AND created_at < CURRENT_DATE
    GROUP BY DATE(created_at)
)
SELECT 'volume_anomaly' AS test,
    dt, cnt,
    CASE WHEN cnt < AVG(cnt) OVER () * 0.5 THEN 'LOW'
         WHEN cnt > AVG(cnt) OVER () * 2.0 THEN 'HIGH'
         ELSE 'NORMAL' END AS status
FROM daily_counts
ORDER BY dt DESC;
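
The checks above each return a differently shaped result, which can be awkward to consume from a scheduler or test runner. One possible way to normalize them is sketched below, assuming PostgreSQL and the same `orders` and `customers` tables: every check yields the same columns and the rows are combined with UNION ALL. The `passed` column and the selection of checks are illustrative additions, not part of the original snippet.

```sql
-- Sketch: uniform (test, failures, passed) rows for several checks.
-- Assumes PostgreSQL and the orders/customers tables used above.
WITH results AS (
    SELECT 'null_check' AS test, COUNT(*) AS failures
    FROM orders
    WHERE order_id IS NULL OR customer_id IS NULL OR amount IS NULL

    UNION ALL

    -- COUNT(order_id) skips NULLs, so they are not reported as duplicates.
    SELECT 'unique_order_id', COUNT(order_id) - COUNT(DISTINCT order_id)
    FROM orders

    UNION ALL

    SELECT 'orphan_orders', COUNT(*)
    FROM orders o
    LEFT JOIN customers c ON o.customer_id = c.id
    WHERE c.id IS NULL
)
SELECT test, failures, failures = 0 AS passed
FROM results
ORDER BY test;
```

A caller can then treat any row with `passed = false` as a failed assertion without knowing the details of each check.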

Use Cases

  • Automated data quality gates in ETL pipelines
  • Detecting data freshness issues before they reach dashboards
  • Monitoring referential integrity in data warehouses
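
As a rough illustration of the first use case, a pipeline step can abort when a check finds problems. The sketch below assumes PostgreSQL (it relies on an anonymous PL/pgSQL DO block) and the `orders` table from the snippet; the variable names and error messages are made up for the example.

```sql
-- Sketch of a quality gate: abort the pipeline step when a check fails.
-- Assumes PostgreSQL; the DO block runs anonymous PL/pgSQL.
DO $$
DECLARE
    null_failures bigint;
    stale boolean;
BEGIN
    SELECT COUNT(*) INTO null_failures
    FROM orders
    WHERE order_id IS NULL OR customer_id IS NULL OR amount IS NULL;

    -- An empty table has no MAX(created_at); treat that as stale too.
    SELECT COALESCE(MAX(created_at) < NOW() - INTERVAL '24 hours', TRUE)
    INTO stale
    FROM orders;

    IF null_failures > 0 THEN
        RAISE EXCEPTION 'null_check failed: % rows', null_failures;
    END IF;

    IF stale THEN
        RAISE EXCEPTION 'freshness check failed: no rows in the last 24 hours';
    END IF;
END
$$;
```

If the script is saved to a file (say, a hypothetical `gate.sql`) and run with `psql -v ON_ERROR_STOP=1 -f gate.sql`, a raised exception should produce a non-zero exit code that the orchestrator can use to stop downstream steps.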