Python · Advanced
PySpark Window Functions
Use PySpark window functions for running totals, ranking, lag/lead comparisons, and percentile computations.
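from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName('window-demo').getOrCreate()
df = spark.read.parquet('sales.parquet')

# One window per department, ordered by date. With orderBy and no explicit
# frame, Spark uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
w_dept = Window.partitionBy('dept').orderBy('date')

df = (
    df
    # Cumulative revenue per department; rows sharing a date get the same total.
    .withColumn('running_total', F.sum('revenue').over(w_dept))
    # Rank by date within the department (ties share a rank).
    .withColumn('rank', F.rank().over(w_dept))
    # Revenue from the previous row; null for the first row of each department.
    .withColumn('prev_revenue', F.lag('revenue', 1).over(w_dept))
)
df.show(10)

Use Cases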
- sales analytics
- ranking pipelines
- time-series ETL