python · intermediate

Pandas Apply with Chunked Progress

Apply a function to a large DataFrame in fixed-size batches so that each temporary copy stays small and progress can be reported per batch.

python
import pandas as pd
import numpy as np

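def process_batch(df_chunk: pd.DataFrame) -> pd.DataFrame:
    # Copy the slice so the caller's DataFrame is not modified
    # (this also avoids SettingWithCopyWarning on the assignment below).
    df_chunk = df_chunk.copy()
    df_chunk['result'] = df_chunk['value'] ** 2 + df_chunk['value']
    return df_chunk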

df = pd.DataFrame({'value': np.random.rand(100_000)})
chunk_size = 10_000
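n_chunks = -(-len(df) // chunk_size)  # ceiling division

results = []
for n, start in enumerate(range(0, len(df), chunk_size), start=1):
    # Process one slice at a time and report progress after each batch
    results.append(process_batch(df.iloc[start:start + chunk_size]))
    print(f'Chunk {n}/{n_chunks} done')

processed = pd.concat(results, ignore_index=True)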
print(f'Processed {len(processed):,} rows')
print(processed.describe())
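
The same pattern can be wrapped in a reusable helper. The following is a minimal sketch, not part of the original snippet: the names `iter_chunks`, `apply_in_chunks`, and `on_progress` are hypothetical, and the callback signature `(rows_done, rows_total)` is an assumption chosen for illustration.

```python
from typing import Callable, Iterator, Optional

import numpy as np
import pandas as pd


def iter_chunks(df: pd.DataFrame, chunk_size: int) -> Iterator[pd.DataFrame]:
    """Yield consecutive row slices of at most chunk_size rows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(df), chunk_size):
        yield df.iloc[start:start + chunk_size]


def apply_in_chunks(
    df: pd.DataFrame,
    func: Callable[[pd.DataFrame], pd.DataFrame],
    chunk_size: int,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> pd.DataFrame:
    """Apply func to each chunk and concatenate the results.

    on_progress (hypothetical hook) receives (rows_done, rows_total)
    after every chunk.
    """
    total = len(df)
    done = 0
    results = []
    for chunk in iter_chunks(df, chunk_size):
        results.append(func(chunk))
        done += len(chunk)
        if on_progress is not None:
            on_progress(done, total)
    if not results:
        # pd.concat raises on an empty list, so handle an empty frame directly
        return func(df)
    return pd.concat(results, ignore_index=True)


def add_result(chunk: pd.DataFrame) -> pd.DataFrame:
    out = chunk.copy()
    out["result"] = out["value"] ** 2 + out["value"]
    return out


demo = pd.DataFrame({"value": np.arange(25, dtype=float)})
seen = []  # collects (rows_done, rows_total) pairs reported by the callback
out = apply_in_chunks(
    demo, add_result, chunk_size=10,
    on_progress=lambda d, t: seen.append((d, t)),
)
print(f"Processed {len(out):,} rows in {len(seen)} chunks")
```

Passing progress through a callback keeps the helper independent of any particular reporting style; a plain `print`, a logger, or a progress-bar update can all be plugged in.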

Use Cases

  • memory-safe transforms
  • chunked processing
  • large DataFrame ops
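
When the data lives in a file rather than in memory, the memory-safe use case is often better served by reading in chunks in the first place. A minimal sketch using the `chunksize` parameter of `pd.read_csv`, with a small in-memory CSV standing in for a large file (the data and variable names are illustrative assumptions):

```python
import io

import pandas as pd

# Stand-in for a large CSV on disk (hypothetical data: values 1..25).
csv_text = "value\n" + "\n".join(str(v) for v in range(1, 26))

total_rows = 0
running_sum = 0.0

# With chunksize set, read_csv returns an iterator of DataFrames,
# so only one chunk is held in memory at a time.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=10)
for n, chunk in enumerate(reader, start=1):
    result = chunk["value"] ** 2 + chunk["value"]
    total_rows += len(chunk)
    running_sum += float(result.sum())
    print(f"Chunk {n}: {total_rows} rows so far")

print(f"Sum of results: {running_sum}")
```

Here only an aggregate is kept across chunks; if the per-row results are needed, each chunk could be written out incrementally instead of being collected in a list.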
