Python (advanced)

PySpark Window Functions

Use PySpark window functions to compute running totals, ranks, lag/lead comparisons, and percentiles per partition, without collapsing rows the way groupBy does.

python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName('window-demo').getOrCreate()
df = spark.read.parquet('sales.parquet')

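# One window per department, ordered by date; rank() and lag() use this ordering.
w_dept = Window.partitionBy('dept').orderBy('date')

# Explicit row frame for the running total. An ordered window without a frame
# defaults to a RANGE frame, so rows sharing a date would get the same total.
w_running = w_dept.rowsBetween(Window.unboundedPreceding, Window.currentRow)

df = (
    df
    .withColumn('running_total', F.sum('revenue').over(w_running))
    # Position by date within the department; order by revenue to rank by sales.
    .withColumn('rank',          F.rank().over(w_dept))
    # Previous row's revenue; null on the first row of each department.
    .withColumn('prev_revenue',  F.lag('revenue', 1).over(w_dept))
)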
df.show(10)

Use Cases

  • sales analytics
  • ranking pipelines
  • time-series ETL
