python · beginner
Pandas Category Dtype Optimization
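Convert low-cardinality string columns to the categorical dtype to cut memory use and speed up groupby. Each distinct string is stored once, and the rows hold small integer codes instead of full Python strings.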
import numpy as np
import pandas as pd
n = 1_000_000
# Low-cardinality string columns: a handful of distinct values repeated n times
df = pd.DataFrame({
    'city': np.random.choice(['New York', 'London', 'Tokyo', 'Sydney'], n),
    'status': np.random.choice(['active', 'inactive', 'pending'], n),
    'value': np.random.rand(n),
})
before = df.memory_usage(deep=True).sum() / 1e6  # deep=True counts the string payloads
df['city'] = df['city'].astype('category')
df['status'] = df['status'].astype('category')
after = df.memory_usage(deep=True).sum() / 1e6
print(f'Memory: {before:.1f} MB -> {after:.1f} MB')
# observed=True limits the result to category combinations present in the data
print(df.groupby(['city', 'status'], observed=True)['value'].mean())
Use Cases
- memory optimization
- categorical groupby
- large dataset handling
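For the use cases above, converting columns one at a time gets tedious on wide tables. The following is a minimal sketch, not part of the original snippet, of a helper that converts only the string columns likely to benefit. The function name `categorize_low_cardinality` and the `max_ratio` threshold are hypothetical choices for illustration; a suitable threshold depends on the data.

```python
import numpy as np
import pandas as pd


def categorize_low_cardinality(df, max_ratio=0.5):
    """Return a copy with low-cardinality string columns cast to category.

    max_ratio is an assumed threshold: a column is converted only when
    (number of unique values / number of rows) is below it.
    """
    out = df.copy()
    for col in out.columns:
        s = out[col]
        # Only consider text columns; numeric and already-categorical ones are skipped
        is_text = pd.api.types.is_object_dtype(s) or pd.api.types.is_string_dtype(s)
        if is_text and len(s) > 0 and s.nunique() / len(s) < max_ratio:
            out[col] = s.astype("category")
    return out


# Small demo frame: 'city' repeats a few values, 'id' is unique per row
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "city": rng.choice(["New York", "London", "Tokyo"], 1000),
    "id": [f"user-{i}" for i in range(1000)],
    "value": rng.random(1000),
})
slim = categorize_low_cardinality(demo)
print(slim.dtypes)
print(demo.memory_usage(deep=True).sum(), "->", slim.memory_usage(deep=True).sum())
```

Columns whose values are mostly unique, such as `id` here, are left alone, since a categorical has to store every distinct value plus the integer codes and would save little or nothing.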
Related Snippets
Similar patterns you can reuse in the same workflow.
python · intermediate
Pandas Memory Reduction via Dtypes
Reduce DataFrame memory by 60-80% by downcasting numeric types and using categorical columns.
Best for: large dataset loading
#pandas #memory
python · intermediate
Read Large CSV in Chunks with Pandas
Process CSV files larger than RAM by reading in chunks — memory-efficient ETL pattern for data pipelines.
Best for: Processing multi-GB CSV files without running out of memory
#pandas #csv
python · intermediate
Stream Large SQL Query in Chunks
Read millions of rows from SQL in memory-safe chunks using pandas read_sql with chunksize.
Best for: large table extraction
#pandas #sql
python · intermediate
Optimize Memory with __slots__
Reduce memory usage for classes with many instances using __slots__.
Best for: Memory-constrained applications
#python #slots