python, advanced

Great Expectations Data Quality Suite

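Define and run a small set of Great Expectations checks against a pandas DataFrame to catch data quality issues early. The snippet relies on the legacy ge.from_pandas interface from pre-1.0 releases of Great Expectations, which newer releases no longer provide.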

python
import great_expectations as ge
import pandas as pd

# Load the orders data and wrap it so the expect_* methods are available.
df = pd.read_csv('orders.csv')
ge_df = ge.from_pandas(df)

# Each expectation runs immediately and returns a validation result.
results = [
    ge_df.expect_column_to_exist('order_id'),
    ge_df.expect_column_values_to_not_be_null('customer_id'),
    ge_df.expect_column_values_to_be_between('amount', min_value=0, max_value=100_000),
]

# Report every expectation that did not pass.
for r in results:
    if not r['success']:
        print('FAIL:', r['expectation_config']['expectation_type'])
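
The checks in the snippet above could also be declared as data and applied in a loop, which may be easier to review and share between jobs. The following is only a sketch: ORDERS_CONTRACT and apply_contract are hypothetical names, and the dataset is assumed to be any object that exposes the same expect_* methods and keyword arguments used above.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical contract: (expectation method name, keyword arguments).
# The method names and arguments mirror the calls in the snippet above.
ORDERS_CONTRACT: List[Tuple[str, Dict[str, Any]]] = [
    ("expect_column_to_exist", {"column": "order_id"}),
    ("expect_column_values_to_not_be_null", {"column": "customer_id"}),
    (
        "expect_column_values_to_be_between",
        {"column": "amount", "min_value": 0, "max_value": 100_000},
    ),
]


def apply_contract(dataset: Any, contract: List[Tuple[str, Dict[str, Any]]]) -> list:
    """Call each named expectation method on `dataset` and collect the results."""
    results = []
    for method_name, kwargs in contract:
        # getattr raises AttributeError if the dataset lacks the method,
        # so a typo in the contract fails loudly instead of being skipped.
        expectation = getattr(dataset, method_name)
        results.append(expectation(**kwargs))
    return results
```

With the wrapped DataFrame from the snippet, results = apply_contract(ge_df, ORDERS_CONTRACT) would be expected to produce the same list of results as the hand-written calls.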

Use Cases

  • CI data validation
  • pipeline quality gates
  • data contracts
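
For the CI and quality-gate use cases, printing failures is usually not enough: the job also has to exit with a non-zero status so the pipeline stops. A minimal sketch of such a gate follows. The helper names failed_expectations and quality_gate are hypothetical, and the code assumes each result supports the same r['success'] and r['expectation_config']['expectation_type'] lookups used in the snippet.

```python
import sys
from typing import Any, Iterable, List


def failed_expectations(results: Iterable[Any]) -> List[str]:
    """Return the expectation type of every result whose 'success' is falsy."""
    return [
        r["expectation_config"]["expectation_type"]
        for r in results
        if not r["success"]
    ]


def quality_gate(results: Iterable[Any]) -> int:
    """Print each failure to stderr and return an exit code (0 pass, 1 fail)."""
    failures = failed_expectations(results)
    for name in failures:
        print("FAIL:", name, file=sys.stderr)
    return 1 if failures else 0
```

Ending the validation script with sys.exit(quality_gate(results)) would make a CI step fail whenever any expectation fails, while a clean run exits with status 0.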
