Row Fingerprinting with hashlib
Generate deterministic hash fingerprints for each row to detect changes in incremental loads.
import hashlib, json
import pandas as pd
df = pd.DataFrame({'id':[1,2,3],'name':['Alice','Bob','Carol'],'score':[90,85,92]})
def row_hash(row):
payload = json.dumps(row.to_dict(), sort_keys=True, default=str)
return hashlib.sha256(payload.encode()).hexdigest()[:16]
df['_hash'] = df.apply(row_hash, axis=1)
print(df[['id','_hash']])Use Cases
- change data capture
- incremental load detection
- data versioning
Tags
Related Snippets
Similar patterns you can reuse in the same workflow.
Python ETL Pipeline Example
Complete extract-transform-load pipeline with error handling, logging, and incremental processing.
Best for: Automating data ingestion from CSV to warehouse
Python Batch Processing Script
Process large files in configurable batches with progress tracking, error handling, and resume support.
Best for: Processing large CSV files that don't fit in memory
Database Sync Script in Python
Sync data between two databases with upsert logic, batch processing, and change detection.
Best for: Replicating data between databases
SQL Incremental Load Pattern
Incremental data load using watermark tracking to process only new and updated records efficiently.
Best for: Efficient warehouse loading without full reloads