python | intermediate
Polars Lazy Scan of Parquet Files
Use Polars scan_parquet with predicate and projection pushdown for fast Parquet analytics.
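import polars as pl

# Build a lazy query; nothing is read until collect() is called.
lf = (
    pl.scan_parquet('s3://bucket/events/*.parquet')
    # Predicate pushdown: the filter can be applied while scanning.
    .filter(
        (pl.col('year') == 2024) &
        (pl.col('action').is_in(['purchase', 'refund']))
    )
    # Projection pushdown: only the columns the query needs are read.
    .select(['user_id', 'action', 'amount', 'ts'])
    .group_by(['user_id', 'action'])
    .agg([
        pl.col('amount').sum().alias('total'),
        pl.col('ts').max().alias('last_event'),
    ])
)

# Run the query with the streaming engine. Newer Polars releases
# spell this option collect(engine="streaming").
result = lf.collect(streaming=True)
print(result.head())
Use Cases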
- lakehouse queries
- predicate pushdown
- efficient Parquet reads
Related Snippets
Similar patterns you can reuse in the same workflow.
python | beginner
Parquet File Read and Write in Python
Read and write Parquet files with pandas and PyArrow including partitioning and schema control.
Best for: Efficient columnar storage for analytics data
#parquet #pyarrow
python | intermediate
Polars Lazy Query — Fast DataFrame Processing
Use Polars lazy evaluation for high-performance data transformations that outperform pandas.
Best for: High-performance data processing replacing pandas
#polars #dataframe
python | beginner
DuckDB — Query Parquet Files with Python
Use DuckDB to query Parquet files and CSVs directly from Python without loading into memory first.
Best for: Ad-hoc analytics on Parquet files without Spark
#duckdb #parquet
python | beginner
Polars DataFrame
Data science technique: polars-dataframe
Best for: machine learning
#data #machine-learning