python · intermediate
Scikit-learn Feature Pipeline
Build a reproducible ML feature pipeline with ColumnTransformer, StandardScaler, and OneHotEncoder.
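import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data: two numeric columns and one categorical column
X = pd.DataFrame({'age': [25, 32], 'salary': [50000, 80000], 'dept': ['eng', 'hr']})
y = [1, 0]

# Scale the numeric columns; one-hot encode the categorical column
# (handle_unknown='ignore' keeps predict() from failing on unseen categories)
preprocessor = ColumnTransformer([
    ('num', StandardScaler(), ['age', 'salary']),
    ('cat', OneHotEncoder(handle_unknown='ignore'), ['dept']),
])

# Chain preprocessing and model; a fixed random_state makes the fit repeatable
clf = Pipeline([('pre', preprocessor), ('clf', RandomForestClassifier(random_state=0))])
clf.fit(X, y)
print(clf.predict(X))
Use Cases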
- ML preprocessing
- feature pipelines
- reproducible workflows
Related Snippets
Similar patterns you can reuse in the same workflow.
python · beginner
Pandas Categorical Encoding for ML
One-hot encode, label encode, and ordinal encode categorical columns using pandas and scikit-learn.
Best for: ML preprocessing
#pandas #encoding
python · intermediate
Pandas IntervalIndex for Binning
Use IntervalIndex and pd.cut to bin continuous variables into labelled categories.
Best for: grading systems
#pandas #binning
python · intermediate
Pandas GroupBy Transform Patterns
Use groupby().transform() to compute group-level statistics and broadcast them back to row level.
Best for: feature engineering
#pandas #groupby
python · beginner
Pandas Datetime Component Extraction
Extract year, month, day, hour, day-of-week and other components from a datetime column.
Best for: time-based features
#pandas #datetime