python · advanced
Great Expectations Data Quality Suite
Define and run a Great Expectations validation suite to catch data quality issues early.
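import great_expectations as ge
import pandas as pd

# Load the data and wrap it so expectations can be called on it.
# Note: ge.from_pandas belongs to the legacy dataset API of older
# Great Expectations releases.
df = pd.read_csv('orders.csv')
ge_df = ge.from_pandas(df)

# Each expectation runs immediately and returns a validation result.
results = [
    ge_df.expect_column_to_exist('order_id'),
    ge_df.expect_column_values_to_not_be_null('customer_id'),
    ge_df.expect_column_values_to_be_between('amount', min_value=0, max_value=100_000),
]

# Report every expectation that did not pass.
for r in results:
    if not r['success']:
        print('FAIL:', r['expectation_config']['expectation_type'])
Use Cases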
- CI data validation
- pipeline quality gates
- data contracts
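For the CI and pipeline-gate use cases, the failure loop can be turned into an exit code that a job runner acts on. The following is a minimal sketch, not part of the original snippet: the helper names `failed_expectations` and `quality_gate` and the sample results are hypothetical, and plain dicts stand in for real validation results (assuming they expose `success` and `expectation_config` by key, as the snippet does) so the example runs without Great Expectations installed.

```python
from typing import Any, Iterable, List, Mapping


def failed_expectations(results: Iterable[Mapping[str, Any]]) -> List[str]:
    """Return the expectation type of every result whose 'success' is falsy."""
    return [
        r["expectation_config"]["expectation_type"]
        for r in results
        if not r["success"]
    ]


def quality_gate(results: Iterable[Mapping[str, Any]]) -> int:
    """Print each failure and return an exit code: 0 if all passed, 1 otherwise."""
    failures = failed_expectations(results)
    for name in failures:
        print("FAIL:", name)
    return 1 if failures else 0


# Hypothetical stand-in results; in a pipeline these would be the values
# returned by the expectation calls on the wrapped DataFrame.
sample_results = [
    {
        "success": True,
        "expectation_config": {"expectation_type": "expect_column_to_exist"},
    },
    {
        "success": False,
        "expectation_config": {
            "expectation_type": "expect_column_values_to_not_be_null"
        },
    },
]

exit_code = quality_gate(sample_results)
```

In a CI job, passing the returned code to `sys.exit` would make the step fail whenever any expectation fails, which stops downstream stages from consuming bad data.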
Related Snippets
Similar patterns you can reuse in the same workflow.
python · intermediate
Data Quality Testing with Expectations
Define and run data quality expectations for automated validation in data pipelines.
Best for: Automated data quality gates in pipelines
#data-quality #testing
python · intermediate
Data Validation with Pydantic
Validate and parse data records using Pydantic models with custom validators and error reporting.
Best for: Validating incoming data before warehouse loading
#validation #pydantic
sql · intermediate
SQL Data Quality Checks and Assertions
Reusable SQL queries for data quality: null checks, uniqueness, referential integrity, and freshness.
Best for: Automated data quality gates in ETL pipelines
#sql #data-quality
sql · beginner
dbt Source Freshness and Testing
Configure dbt source freshness checks and schema tests to validate upstream data pipelines.
Best for: Ensuring upstream data sources are fresh
#dbt #testing