SnippetsLab: Build faster with production-ready snippets

python, beginner

Pandas Category Dtype Optimization

Convert low-cardinality string columns to the categorical dtype to reduce memory usage and speed up groupby operations.


import numpy as np
import pandas as pd

n = 1_000_000
# Low-cardinality string columns: only a handful of distinct values each
df = pd.DataFrame({
    'city': np.random.choice(['New York', 'London', 'Tokyo', 'Sydney'], n),
    'status': np.random.choice(['active', 'inactive', 'pending'], n),
    'value': np.random.rand(n),
})

# deep=True includes the memory used by the string values themselves
before = df.memory_usage(deep=True).sum() / 1e6
df['city'] = df['city'].astype('category')
df['status'] = df['status'].astype('category')
after = df.memory_usage(deep=True).sum() / 1e6

print(f'Memory: {before:.1f} MB -> {after:.1f} MB')
# observed=True groups only by category combinations present in the data
print(df.groupby(['city', 'status'], observed=True)['value'].mean())
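
To see why the conversion saves memory, it can help to look at how a categorical column is stored: each distinct string is kept once in `.cat.categories`, and each row holds only a small integer code. A minimal sketch, using a tiny made-up Series that is not part of the snippet above:

```python
import pandas as pd

# Hypothetical toy data, only for illustration
s = pd.Series(['London', 'Tokyo', 'London', 'Sydney', 'London'])
cat = s.astype('category')

# Each distinct value is stored once (sorted by default for strings)
print(list(cat.cat.categories))   # ['London', 'Sydney', 'Tokyo']
# Each row stores an integer code pointing into the categories
print(cat.cat.codes.tolist())     # [0, 2, 0, 1, 0]
# With few categories, the codes fit in a single byte per row
print(cat.cat.codes.dtype)        # int8
```

With a million rows and only four cities, the column is therefore roughly one byte per row plus four stored strings, instead of one string per row.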

Use Cases

memory optimization
categorical groupby
large dataset handling
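
For large datasets, the categorical dtype mainly pays off when a column has few distinct values relative to its length. One possible way to apply it across a whole DataFrame is a small helper that converts only such columns. The function name `categorize_low_cardinality` and the 0.5 threshold are assumptions chosen for illustration, not part of pandas:

```python
import pandas as pd
from pandas.api.types import is_string_dtype

def categorize_low_cardinality(df, max_unique_ratio=0.5):
    """Return a copy where string columns with few distinct values are category.

    Hypothetical helper; max_unique_ratio is an illustrative threshold.
    """
    out = df.copy()
    for col in out.columns:
        # Skip numeric and already-categorical columns
        if not is_string_dtype(out[col]):
            continue
        ratio = out[col].nunique() / max(len(out), 1)
        if ratio <= max_unique_ratio:
            out[col] = out[col].astype('category')
    return out

# Made-up example data
df = pd.DataFrame({
    'status': ['active', 'inactive', 'active', 'pending'] * 25,  # 3 distinct values
    'user_id': [f'u{i}' for i in range(100)],                    # all distinct
    'value': range(100),
})
result = categorize_low_cardinality(df)
print(result.dtypes)
```

Here `status` is converted, while `user_id` is left alone because every value is unique and a categorical would store all of them plus the codes.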

Tags

#pandas #category #memory #optimization

Related Snippets

Similar patterns you can reuse in the same workflow.

python, intermediate

Pandas Memory Reduction via Dtypes

Reduce DataFrame memory by 60-80% by downcasting numeric types and using categorical columns.

Best for: large dataset loading

python, intermediate

Read Large CSV in Chunks with Pandas

Process CSV files larger than RAM by reading them in chunks, a memory-efficient ETL pattern for data pipelines.

Best for: Processing multi-GB CSV files without running out of memory

python, intermediate

Stream Large SQL Query in Chunks

Read millions of rows from SQL in memory-safe chunks using pandas read_sql with chunksize.

Best for: large table extraction

python, intermediate

Optimize Memory with __slots__

Reduce memory usage for classes with many instances using __slots__.

Best for: Memory-constrained applications