Python · Intermediate

Scikit-learn Feature Pipeline

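Build a reproducible ML feature pipeline. A ColumnTransformer scales the numeric columns with StandardScaler and one-hot encodes the categorical column with OneHotEncoder. A single Pipeline then feeds the result to a RandomForestClassifier, so the same preprocessing is applied at fit time and at predict time.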

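```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy training data: two numeric columns and one categorical column.
X = pd.DataFrame({"age": [25, 32], "salary": [50000, 80000], "dept": ["eng", "hr"]})
y = [1, 0]

# Scale the numeric columns and one-hot encode the categorical column.
# handle_unknown="ignore" encodes categories unseen during fit as all zeros
# instead of raising an error at predict time.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["age", "salary"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["dept"]),
])

# Chain preprocessing and model so both are fit together and applied identically
# at predict time. A fixed random_state makes the forest reproducible across runs.
clf = Pipeline([("pre", preprocessor), ("clf", RandomForestClassifier(random_state=0))])
clf.fit(X, y)
print(clf.predict(X))
```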

Use Cases

  • ML preprocessing
  • feature pipelines
  • reproducible workflows
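
For reproducible evaluation, the whole Pipeline can be passed to cross-validation as a single estimator. The scaler and encoder are then refit on each training fold and never see the held-out rows. The sketch below uses a synthetic 60-row dataset with a made-up labeling rule (not from the original snippet) and a hypothetical `build_model` helper. Fixing `random_state` on both the fold splitter and the forest makes repeated runs return identical scores.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic illustrative data (assumption): label is 1 when salary > 80,000.
rng = np.random.default_rng(0)
n = 60
X = pd.DataFrame({
    "age": rng.integers(22, 60, size=n),
    "salary": rng.integers(40_000, 120_000, size=n),
    "dept": rng.choice(["eng", "hr", "sales"], size=n),
})
y = (X["salary"] > 80_000).astype(int)

def build_model(seed: int) -> Pipeline:
    """Hypothetical helper: same preprocessing as the snippet, seeded forest."""
    preprocessor = ColumnTransformer([
        ("num", StandardScaler(), ["age", "salary"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["dept"]),
    ])
    return Pipeline([("pre", preprocessor), ("clf", RandomForestClassifier(random_state=seed))])

# Seeded, shuffled folds: every call to split() yields the same partitions.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores_a = cross_val_score(build_model(0), X, y, cv=cv)
scores_b = cross_val_score(build_model(0), X, y, cv=cv)
print(scores_a, scores_a.mean())
```

Scoring the same configuration twice and comparing the arrays is a cheap check that nothing unseeded has slipped into the workflow.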
