Python, intermediate
Stratified Sampling with pandas
Draw a stratified random sample from a DataFrame, preserving class proportions for ML splits.
import pandas as pd

df = pd.DataFrame({'class': ['A']*500 + ['B']*300 + ['C']*200, 'value': range(1000)})

def stratified_sample(df, col, n, seed=42):
    # Sample the same fraction (n / total rows) from every group so class
    # proportions are preserved. Per-group rounding means the result can be
    # a row or two off n when group sizes do not divide evenly.
    return (
        df.groupby(col)
        .sample(frac=n / len(df), random_state=seed)
        .reset_index(drop=True)
    )

sample = stratified_sample(df, 'class', 100)
print(sample['class'].value_counts())
Use Cases
- ML dataset splitting
- balanced sampling
- survey sampling
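As a rough sketch of the first two use cases, the same group-wise sampling idea can produce a stratified train/test split or a balanced sample. The helper names `stratified_split` and `balanced_sample` and the `test_frac` and `n_per_class` parameters are hypothetical, introduced only for illustration. The split assumes the DataFrame has a unique index.

```python
import pandas as pd

# Toy data with the same class mix as the snippet: 500 A, 300 B, 200 C.
df = pd.DataFrame({'class': ['A']*500 + ['B']*300 + ['C']*200, 'value': range(1000)})

def stratified_split(df, col, test_frac=0.2, seed=42):
    """Hypothetical helper: hold out test_frac of every class as a test set."""
    # groupby(...).sample keeps the original index labels of the sampled rows.
    test = df.groupby(col).sample(frac=test_frac, random_state=seed)
    # Everything not drawn for the test set becomes training data.
    # Assumption: the index is unique, so drop() removes exactly the test rows.
    train = df.drop(test.index)
    return train, test

def balanced_sample(df, col, n_per_class, seed=42):
    """Hypothetical helper: draw the same number of rows from every class."""
    # Raises ValueError if a class has fewer than n_per_class rows.
    return df.groupby(col).sample(n=n_per_class, random_state=seed)

train, test = stratified_split(df, 'class')
balanced = balanced_sample(df, 'class', 50)

# The test set keeps the 5:3:2 class mix (100 / 60 / 40 rows for A / B / C).
print(test['class'].value_counts())
# The balanced sample has 50 rows of each class.
print(balanced['class'].value_counts())
```

For survey-style sampling, the proportional version fits when estimates should reflect the population mix, while the balanced version oversamples small strata on purpose.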