Python, intermediate

Stratified Sampling with pandas

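Draw a stratified random sample from a DataFrame so that each class keeps its original share of rows, for example when building ML train/test splits. Every class is sampled at the same rate, n / len(df).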

python
import pandas as pd

df = pd.DataFrame({'class':['A']*500 + ['B']*300 + ['C']*200, 'value': range(1000)})

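def stratified_sample(df, col, n, seed=42):
    # Sample the same fraction from every group so class proportions are preserved.
    # Per-group counts are rounded, so the total can differ slightly from n.
    # GroupBy.sample (pandas 1.1+) keeps the grouping column and avoids groupby.apply.
    frac = n / len(df)
    return (
        df.groupby(col)
          .sample(frac=frac, random_state=seed)
          .reset_index(drop=True)
    )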

sample = stratified_sample(df, 'class', 100)
print(sample['class'].value_counts())
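
For the ML-split case, the same idea can produce both halves of a train/test split. The sketch below is only an illustration: `stratified_split` and `test_frac` are hypothetical names, not part of the snippet above, and it assumes the DataFrame index is unique so that the sampled rows can be dropped by index label.

```python
import pandas as pd

df = pd.DataFrame({'class': ['A']*500 + ['B']*300 + ['C']*200, 'value': range(1000)})

def stratified_split(df, col, test_frac=0.2, seed=42):
    # Hypothetical helper: sample test_frac of every class for the test set.
    test = df.groupby(col).sample(frac=test_frac, random_state=seed)
    # Everything not drawn for the test set becomes the training set.
    # This relies on a unique index; the index is deliberately not reset before the drop.
    train = df.drop(test.index)
    return train, test

train, test = stratified_split(df, 'class')
print(len(train), len(test))
print(test['class'].value_counts(normalize=True))
```

Because the index is left untouched until after the drop, the two parts never share a row. Call `reset_index(drop=True)` on each part afterwards if a clean index is needed.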

Use Cases

  • ML dataset splitting
  • balanced sampling
  • survey sampling
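
Of the use cases above, balanced sampling needs a different call from the proportional sampler, which keeps the original class ratios and does not equalize them. A minimal sketch, assuming a fixed number of rows per class is wanted; `balanced_sample` and `n_per_class` are hypothetical names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'class': ['A']*500 + ['B']*300 + ['C']*200, 'value': range(1000)})

def balanced_sample(df, col, n_per_class, seed=42):
    # Hypothetical helper: draw the same number of rows from every class.
    # Sampling is without replacement, so pandas raises an error if any
    # class has fewer than n_per_class rows.
    return (
        df.groupby(col)
          .sample(n=n_per_class, random_state=seed)
          .reset_index(drop=True)
    )

balanced = balanced_sample(df, 'class', 50)
print(balanced['class'].value_counts())
```

Passing `n=` instead of `frac=` is the only change: each class contributes exactly `n_per_class` rows regardless of its size in the original data.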
