SQL | Intermediate
BigQuery — Partitioned and Clustered Tables
Create BigQuery tables with time-unit, integer-range, or ingestion-time partitioning, plus clustering, so that queries scan fewer bytes and cost less.
-- Create partitioned + clustered table
CREATE TABLE `project.dataset.events`
PARTITION BY DATE(event_timestamp)
CLUSTER BY user_id, event_type
AS
SELECT
  event_id,
  user_id,
  event_type,
  event_timestamp,
  properties
FROM `project.dataset.raw_events`;
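-- Query with partition pruning (scans only the partitions for the relevant dates).
-- A half-open range keeps all of 2024-03-31; BETWEEN ... AND '2024-03-31' would stop at midnight.
SELECT
  event_type,
  COUNT(*) AS event_count,
  COUNT(DISTINCT user_id) AS unique_users
FROM `project.dataset.events`
WHERE event_timestamp >= TIMESTAMP '2024-01-01'
  AND event_timestamp < TIMESTAMP '2024-04-01'
  AND event_type IN ('page_view', 'purchase')
GROUP BY event_type;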
-- Integer range partitioning
-- (100 partitions of 10,000 ids each, covering user_id 0 through 999,999)
CREATE TABLE `project.dataset.users`
PARTITION BY RANGE_BUCKET(user_id, GENERATE_ARRAY(0, 1000000, 10000))
CLUSTER BY country, signup_date
AS
SELECT *
FROM `project.dataset.raw_users`;
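-- Sketch: a filter on the partitioning column should let BigQuery prune
-- integer-range partitions as well. The id range below is illustrative; with
-- 10,000-wide buckets it falls entirely inside the [20000, 30000) partition.
SELECT
  country,
  COUNT(*) AS user_count
FROM `project.dataset.users`
WHERE user_id BETWEEN 20000 AND 29999
GROUP BY country;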
-- Ingestion-time partitioning
CREATE TABLE `project.dataset.logs` (
  message STRING,
  severity STRING,
  timestamp TIMESTAMP
)
PARTITION BY _PARTITIONDATE;
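-- Sketch: querying the ingestion-time partitioned table. _PARTITIONDATE is the
-- pseudocolumn holding each row's ingestion date; filtering on it limits the
-- scan to those daily partitions. The date range is illustrative.
SELECT
  severity,
  COUNT(*) AS log_count
FROM `project.dataset.logs`
WHERE _PARTITIONDATE BETWEEN DATE '2024-01-01' AND DATE '2024-01-07'
GROUP BY severity;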
-- Check partition metadata
SELECT
  table_name,
  partition_id,
  total_rows,
  total_logical_bytes / (1024 * 1024) AS size_mb
FROM `project.dataset.INFORMATION_SCHEMA.PARTITIONS`
WHERE table_name = 'events'
ORDER BY partition_id DESC
LIMIT 10;
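-- Sketch: optionally require a partition filter on every query, so that an
-- accidental full-table scan is rejected instead of being billed.
-- Whether to enforce this is an assumption about your workload.
ALTER TABLE `project.dataset.events`
SET OPTIONS (require_partition_filter = TRUE);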
-- Expire old partitions automatically
ALTER TABLE `project.dataset.events`
SET OPTIONS (partition_expiration_days = 365);
Use Cases
- Optimizing BigQuery costs with partition pruning
- High-performance analytics on time-series data
- Data warehouse table design best practices