python | intermediate

PyArrow Schema Enforcement

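Define a strict schema in PyArrow and apply it to columnar data before writing to Parquet, so that values of the wrong type fail when the table is built rather than after the file is written.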

python
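import pyarrow as pa
import pyarrow.parquet as pq

# Declare the expected column names, types, and nullability up front.
schema = pa.schema([
    pa.field('id',         pa.int64(),  nullable=False),
    pa.field('name',       pa.string(), nullable=False),
    pa.field('score',      pa.float32()),
    pa.field('created_at', pa.timestamp('ms')),
])

# Passing schema= converts each column to its declared type;
# values that cannot be converted raise an error here.
table = pa.table(
    {
        'id': [1, 2],
        'name': ['Alice', 'Bob'],
        'score': [0.9, 0.8],
        # Integers are interpreted as milliseconds since the Unix epoch.
        'created_at': [1_700_000_000_000, 1_700_000_001_000],
    },
    schema=schema,
)

# Write the table with Zstandard compression.
pq.write_table(table, 'output.parquet', compression='zstd')
print('Written', table.num_rows, 'rows')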

Use Cases

  • data lake storage
  • schema enforcement
  • Parquet writing
