python, beginner
Pandas Categorical Encoding for ML
One-hot encode, label encode, and ordinal encode categorical columns using pandas and scikit-learn.
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

df = pd.DataFrame({
    'color': ['red', 'blue', 'green', 'red'],
    'size': ['S', 'L', 'M', 'XL'],
    'target': [1, 0, 1, 0],
})

# One-hot encode: one indicator column per distinct color value
df_ohe = pd.get_dummies(df, columns=['color'], prefix='color')

# Label encode: integer codes assigned in sorted order (blue=0, green=1, red=2)
le = LabelEncoder()
df['color_label'] = le.fit_transform(df['color'])

# Ordinal encode: codes follow the explicit order S < M < L < XL
oe = OrdinalEncoder(categories=[['S', 'M', 'L', 'XL']])
df['size_ord'] = oe.fit_transform(df[['size']]).ravel()  # flatten (n, 1) output to 1-D

print(df)
Use Cases
- ML preprocessing
- feature engineering
- categorical handling
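As a complement to the scikit-learn encoders in the snippet above, the same label and ordinal codes can be produced with pandas alone through its categorical dtype. The following is a minimal sketch, not a replacement for the original approach; the `size_order` variable is introduced here for illustration, and the data mirrors the example DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'color': ['red', 'blue', 'green', 'red'],
    'size': ['S', 'L', 'M', 'XL'],
})

# Ordinal codes: an ordered Categorical assigns codes in the listed order.
# size_order is an illustrative name, not part of the original snippet.
size_order = ['S', 'M', 'L', 'XL']
df['size_ord'] = pd.Categorical(df['size'], categories=size_order, ordered=True).codes

# Label-style codes: astype('category') sorts string categories alphabetically,
# so the codes here match what LabelEncoder produces for this column.
df['color_label'] = df['color'].astype('category').cat.codes

print(df)
```

One difference worth knowing: a value that is missing from the `categories` list receives the code -1 from pandas, whereas `OrdinalEncoder` raises an error on unknown values by default.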