python · intermediate

Scikit-learn Feature Pipeline

Build a reproducible ML feature pipeline with ColumnTransformer, StandardScaler, and OneHotEncoder.

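from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.ensemble import RandomForestClassifier
import pandas as pd

# Toy dataset: two numeric columns and one categorical column
X = pd.DataFrame({'age': [25, 32], 'salary': [50000, 80000], 'dept': ['eng', 'hr']})
y = [1, 0]

# Scale the numeric columns and one-hot encode the categorical column.
# handle_unknown='ignore' keeps predict() from failing on unseen categories.
preprocessor = ColumnTransformer([
    ('num', StandardScaler(), ['age', 'salary']),
    ('cat', OneHotEncoder(handle_unknown='ignore'), ['dept']),
])

# Chain preprocessing and the model so both are fitted in one call.
# A fixed random_state makes repeated runs build the same forest.
clf = Pipeline([
    ('pre', preprocessor),
    ('clf', RandomForestClassifier(random_state=0)),
])
clf.fit(X, y)
print(clf.predict(X))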

Use Cases

ML preprocessing
feature pipelines
reproducible workflows
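
For the reproducible-workflows use case, one common approach is to save the fitted pipeline so the same preprocessing and model can be reloaded later. This is a rough sketch that assumes `joblib` is available (it is installed alongside scikit-learn); the file name `feature_pipeline.joblib` and the temporary directory are hypothetical choices for illustration.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

X = pd.DataFrame({'age': [25, 32], 'salary': [50000, 80000], 'dept': ['eng', 'hr']})
y = [1, 0]

clf = Pipeline([
    ('pre', ColumnTransformer([
        ('num', StandardScaler(), ['age', 'salary']),
        ('cat', OneHotEncoder(handle_unknown='ignore'), ['dept']),
    ])),
    # Fixed seed so the fitted forest is the same on every run.
    ('clf', RandomForestClassifier(n_estimators=10, random_state=0)),
])
clf.fit(X, y)

# Save the whole pipeline (preprocessing + model) and load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'feature_pipeline.joblib')  # hypothetical file name
    joblib.dump(clf, path)
    restored = joblib.load(path)

# The restored pipeline should give the same predictions as the original.
same = bool((restored.predict(X) == clf.predict(X)).all())
print(same)
```

Persisting the pipeline as a single object keeps the scaler statistics and encoder categories tied to the model that was trained on them. Files saved this way are generally only safe to load with the same scikit-learn version and from a trusted source.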

Tags

#scikit-learn #ml-pipeline #feature-engineering #preprocessing

Related Snippets

Similar patterns you can reuse in the same workflow.

Pandas Categorical Encoding for ML

One-hot encode, label encode, and ordinal encode categorical columns using pandas and scikit-learn.

Best for: ML preprocessing

#pandas #encoding

python · intermediate

Pandas IntervalIndex for Binning

Use IntervalIndex and pd.cut to bin continuous variables into labelled categories.

Best for: grading systems

#pandas #binning

python · intermediate

Pandas GroupBy Transform Patterns

Use groupby().transform() to compute group-level statistics and broadcast them back to row level.

Best for: feature engineering

#pandas #groupby

Pandas Datetime Component Extraction

Extract year, month, day, hour, day-of-week and other components from a datetime column.

Best for: time-based features

#pandas #datetime