python, beginner

Pandas Category Dtype Optimization

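Convert low-cardinality string columns to the categorical dtype to reduce memory use and speed up groupby. A categorical column stores each distinct value once and keeps a small integer code per row, so the savings are largest when a column has few unique values relative to its length.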

python
import numpy as np
import pandas as pd

n = 1_000_000
# Two low-cardinality string columns plus one numeric column.
df = pd.DataFrame({
    'city': np.random.choice(['New York', 'London', 'Tokyo', 'Sydney'], n),
    'status': np.random.choice(['active', 'inactive', 'pending'], n),
    'value': np.random.rand(n),
})

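# Measure total memory (in MB) before and after the conversion.
before = df.memory_usage(deep=True).sum() / 1e6
df['city'] = df['city'].astype('category')
df['status'] = df['status'].astype('category')
after = df.memory_usage(deep=True).sum() / 1e6

print(f'Memory: {before:.1f} MB -> {after:.1f} MB')
# observed=True limits the result to category combinations present in the data.
print(df.groupby(['city', 'status'], observed=True)['value'].mean())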
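
To see where the memory saving in the snippet above comes from, it can help to look inside a categorical column. The following is a minimal sketch on a small, made-up Series (the values are illustrative and not taken from the snippet's data): the distinct labels are stored once, and each row holds only an integer code.

```python
import pandas as pd

# Small illustrative Series; the values are invented for this demo.
s = pd.Series(['London', 'Tokyo', 'London', 'Sydney', 'Tokyo', 'London'])
c = s.astype('category')

# Each distinct label is stored once (sorted for string data).
categories = list(c.cat.categories)
print(categories)          # ['London', 'Sydney', 'Tokyo']

# Each row stores a small integer that indexes into the categories.
codes = c.cat.codes.tolist()
print(codes)               # [0, 2, 0, 1, 2, 0]

# With this few categories the codes fit in a single byte per row.
print(c.cat.codes.dtype)   # int8
```

Because every row costs one small integer instead of a full Python string, the saving grows with the number of rows and shrinks as the number of distinct values approaches the row count.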

Use Cases

  • memory optimization
  • categorical groupby
  • large dataset handling
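
For the large-dataset case, one possible approach is to convert only the string columns whose distinct values are few relative to the row count, since a nearly unique column (such as an ID) gains little from the categorical dtype. The sketch below assumes a hypothetical helper named categorize_low_cardinality and an arbitrary max_ratio threshold of 0.5; neither comes from the snippet above, and the threshold would need tuning for real data.

```python
import numpy as np
import pandas as pd


def categorize_low_cardinality(df, max_ratio=0.5):
    """Return a copy of df with low-cardinality string columns as category.

    Hypothetical helper for illustration; max_ratio is an arbitrary cutoff.
    """
    out = df.copy()
    if len(out) == 0:
        return out
    for col in out.columns:
        # Skip anything that is not a string column (numbers, existing categories).
        if not pd.api.types.is_string_dtype(out[col]):
            continue
        # Convert only when distinct values are few relative to the row count.
        if out[col].nunique() / len(out) <= max_ratio:
            out[col] = out[col].astype('category')
    return out


rng = np.random.default_rng(0)
n = 1_000
demo = pd.DataFrame({
    'city': rng.choice(['New York', 'London', 'Tokyo', 'Sydney'], n),
    'id': [f'id{i}' for i in range(n)],  # unique per row, so left as strings
    'value': rng.random(n),
})
converted = categorize_low_cardinality(demo)
print(converted.dtypes)
```

Here city is converted, while id and value keep their original dtypes. The helper returns a copy, so the input frame is left unchanged.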
