Dataclasses as Pipeline Data Models

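Use Python dataclasses with frozen=True to define typed data models passed between pipeline stages. Freezing blocks attribute reassignment on an instance, so each stage builds a new object instead of mutating its input. The field annotations are not enforced at runtime; they document the expected types for readers and static type checkers.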

python
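from dataclasses import dataclass
from datetime import datetime, timezone

# Input model: one event as it arrives, with a Unix timestamp in seconds.
@dataclass(frozen=True)
class RawEvent:
    event_id: str
    user_id:  int
    action:   str
    ts:       float

# Output model: the same event plus fields derived from the timestamp.
@dataclass(frozen=True)
class EnrichedEvent:
    event_id:    str
    user_id:     int
    action:      str
    created_at:  datetime
    hour:        int
    day_of_week: str

def enrich(raw: RawEvent) -> EnrichedEvent:
    # datetime.utcfromtimestamp() is deprecated since Python 3.12 and returns
    # a naive datetime; build a timezone-aware UTC datetime instead.
    dt = datetime.fromtimestamp(raw.ts, tz=timezone.utc)
    return EnrichedEvent(
        event_id=raw.event_id,
        user_id=raw.user_id,
        action=raw.action,
        created_at=dt,
        hour=dt.hour,
        day_of_week=dt.strftime('%A'),
    )

print(enrich(RawEvent('e1', 1, 'click', 1_700_000_000.0)))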
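
What frozen=True buys you in practice can be sketched as follows. This is a minimal illustration, not part of the original snippet: it redefines RawEvent so it runs on its own, and the names mutation_blocked, corrected, and seen are made up for the example. Assigning to a field raises dataclasses.FrozenInstanceError, and dataclasses.replace() returns a modified copy while leaving the original untouched.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class RawEvent:
    event_id: str
    user_id: int
    action: str
    ts: float


raw = RawEvent("e1", 1, "click", 1_700_000_000.0)

# Reassigning a field on a frozen instance raises FrozenInstanceError.
try:
    raw.action = "view"
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False

# replace() builds a new instance with the given fields changed.
corrected = replace(raw, action="view")

# Frozen dataclasses with the default eq=True are hashable, so they can be
# used as set members or dict keys (for example, to deduplicate events).
seen = {raw, RawEvent("e1", 1, "click", 1_700_000_000.0)}
```

Note that the immutability is shallow: a frozen dataclass holding a list or dict field still lets callers mutate that container, so tuples or other immutable field types are the safer choice for pipeline models.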

Use Cases

  • typed pipeline stages
  • data modeling
  • functional ETL
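
As a rough sketch of the functional ETL use case, the stages below are chained as generators, each consuming one model type and yielding the next. The stage names extract, transform, and load, and the dictionary row format, are assumptions made for illustration; they are not part of the original snippet.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class RawEvent:
    event_id: str
    user_id: int
    action: str
    ts: float


@dataclass(frozen=True)
class EnrichedEvent:
    event_id: str
    user_id: int
    action: str
    created_at: datetime
    hour: int
    day_of_week: str


def extract(rows: Iterable[dict]) -> Iterator[RawEvent]:
    # Hypothetical input: dict rows with string values, as a CSV reader yields.
    for row in rows:
        yield RawEvent(
            event_id=str(row["event_id"]),
            user_id=int(row["user_id"]),
            action=str(row["action"]),
            ts=float(row["ts"]),
        )


def transform(events: Iterable[RawEvent]) -> Iterator[EnrichedEvent]:
    # Derive the time fields from a timezone-aware UTC datetime.
    for raw in events:
        dt = datetime.fromtimestamp(raw.ts, tz=timezone.utc)
        yield EnrichedEvent(
            event_id=raw.event_id,
            user_id=raw.user_id,
            action=raw.action,
            created_at=dt,
            hour=dt.hour,
            day_of_week=dt.strftime("%A"),
        )


def load(events: Iterable[EnrichedEvent]) -> List[EnrichedEvent]:
    # Stand-in for a real sink: collect the results in memory.
    return list(events)


rows = [
    {"event_id": "e1", "user_id": "1", "action": "click", "ts": "1700000000"},
    {"event_id": "e2", "user_id": "2", "action": "view", "ts": "1700003600"},
]
result = load(transform(extract(rows)))
```

Because every stage takes and returns frozen instances, no stage can alter data that an earlier stage produced, which keeps each function easy to test in isolation.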
