Python · Beginner

Pandas read_csv with Explicit Dtypes

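Pass explicit column dtypes to pd.read_csv so pandas skips per-column type inference, loads numeric columns at a narrower width, and raises an error on a value that does not fit instead of silently falling back to a wider or object dtype.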

python
import pandas as pd

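# Map each column to the narrowest dtype that fits its values.
# Plain integer and bool dtypes cannot hold missing values:
# read_csv raises if one of these columns contains an empty cell.
dtypes = {
    'id':           'int32',
    'user_id':      'int32',
    'amount':       'float32',
    'category':     'category',   # repeated labels stored once, rows hold small integer codes
    'status':       'category',
    'is_fraud':     'bool',
}

df = pd.read_csv(
    'transactions.csv',
    dtype=dtypes,
    parse_dates=['created_at'],              # dates are parsed separately from the dtype map
    usecols=list(dtypes) + ['created_at'],   # skip every column that is not listed
    low_memory=False,                        # parse in one pass instead of internal chunks
)
print(df.dtypes)
# deep=True includes the contents of object/category columns, not just the pointers.
print(df.memory_usage(deep=True).sum() / 1e6, 'MB')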
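
To see the memory effect without a real file, the sketch below builds a small in-memory CSV and reads it twice, once with inferred dtypes and once with the explicit map. The sample rows and the names csv_text, inferred, and explicit are made up for illustration; the actual savings depend on your data.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for transactions.csv.
rows = ["id,user_id,amount,category,status,is_fraud"]
for i in range(1000):
    category = "food" if i % 2 else "travel"
    fraud = "True" if i % 10 == 0 else "False"
    rows.append(f"{i},{i % 50},{i * 1.5},{category},ok,{fraud}")
csv_text = "\n".join(rows)

dtypes = {
    "id": "int32",
    "user_id": "int32",
    "amount": "float32",
    "category": "category",
    "status": "category",
    "is_fraud": "bool",
}

# Same data, read once with inference and once with the explicit map.
inferred = pd.read_csv(io.StringIO(csv_text))
explicit = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)

inferred_bytes = int(inferred.memory_usage(deep=True).sum())
explicit_bytes = int(explicit.memory_usage(deep=True).sum())

print("inferred:", inferred_bytes, "bytes")
print("explicit:", explicit_bytes, "bytes")
```

Most of the difference usually comes from the text columns: with few distinct labels, a category column stores small integer codes rather than one string per row.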

Use Cases

  • fast CSV loading
  • memory control
  • type safety
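
The type-safety point can be illustrated with a deliberately malformed input. This is a minimal sketch: bad_csv and the load helper are hypothetical, and the exact error message varies between pandas versions.

```python
import io

import pandas as pd

# Hypothetical input: the second row has a malformed id.
bad_csv = "id,amount\n1,9.99\n12x,4.50\n3,7.25\n"


def load(text, dtype=None):
    """Read CSV text from memory, optionally with an explicit dtype map."""
    return pd.read_csv(io.StringIO(text), dtype=dtype)


# Without dtypes, pandas accepts the file and the id column is no longer an integer column.
loose = load(bad_csv)
print("inferred id dtype:", loose["id"].dtype)

# With dtypes, the same file is rejected at load time.
try:
    load(bad_csv, dtype={"id": "int32", "amount": "float32"})
    strict_failed = False
except ValueError as exc:
    strict_failed = True
    print("explicit dtypes rejected the file:", exc)
```

Failing at load time keeps the bad row from surfacing later as a confusing error in a join or aggregation.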
