python · intermediate

Row Fingerprinting with hashlib

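Generate a deterministic SHA-256 fingerprint for each DataFrame row so that changed rows can be detected between incremental loads. Two rows with the same column names and values always produce the same fingerprint.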

python
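import hashlib
import json

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Carol'], 'score': [90, 85, 92]})

def row_hash(row):
    # Serialize with sorted keys so column order does not affect the hash;
    # default=str covers values json cannot encode natively (e.g. timestamps).
    payload = json.dumps(row.to_dict(), sort_keys=True, default=str)
    # Keep the first 16 hex characters (64 bits) of the SHA-256 digest.
    return hashlib.sha256(payload.encode()).hexdigest()[:16]

# Compute the fingerprint once, before adding other derived columns.
df['_hash'] = df.apply(row_hash, axis=1)
print(df[['id', '_hash']])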
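
Because row_hash serializes every column in the row, a volatile column such as a load timestamp would change the fingerprint on every run, and so would a previously stored _hash column. One way around this is to fingerprint only the columns that matter. The sketch below assumes a hypothetical helper, row_hash_subset, and an illustrative loaded_at column; neither appears in the original snippet.

```python
import hashlib
import json

import pandas as pd


def row_hash_subset(row, columns):
    # Hypothetical variant of row_hash: fingerprint only the named columns.
    payload = json.dumps(row[columns].to_dict(), sort_keys=True, default=str)
    return hashlib.sha256(payload.encode()).hexdigest()[:16]


df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Carol'],
    'score': [90, 85, 92],
    'loaded_at': ['2024-01-01', '2024-01-01', '2024-01-01'],  # illustrative volatile column
})

tracked = ['id', 'name', 'score']  # columns that define a "change"
df['_hash'] = df.apply(lambda row: row_hash_subset(row, tracked), axis=1)
print(df[['id', '_hash']])
```

With this variant, updating loaded_at or recomputing after _hash has been added leaves the fingerprint unchanged, while editing any tracked column changes it.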

Use Cases

  • change data capture
  • incremental load detection
  • data versioning
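
For the change-detection use cases listed above, the fingerprints from a previous load can be compared with those from the current load. The following is a minimal sketch under stated assumptions: detect_changes is a hypothetical helper, id is assumed to be a unique key, and both snapshots are assumed to share the same columns and dtypes (a column that switches from int to float would alter the hash even when the values look equal).

```python
import hashlib
import json

import pandas as pd


def row_hash(row):
    # Same fingerprint function as in the snippet above.
    payload = json.dumps(row.to_dict(), sort_keys=True, default=str)
    return hashlib.sha256(payload.encode()).hexdigest()[:16]


def detect_changes(previous, current, key='id'):
    # Hypothetical helper: classify keys as new, changed, or deleted.
    prev = previous[[key]].assign(_hash=previous.apply(row_hash, axis=1))
    curr = current[[key]].assign(_hash=current.apply(row_hash, axis=1))
    merged = prev.merge(curr, on=key, how='outer',
                        suffixes=('_prev', '_curr'), indicator=True)
    both = merged[merged['_merge'] == 'both']
    return {
        'new': merged.loc[merged['_merge'] == 'right_only', key].tolist(),
        'changed': both.loc[both['_hash_prev'] != both['_hash_curr'], key].tolist(),
        'deleted': merged.loc[merged['_merge'] == 'left_only', key].tolist(),
    }


# Illustrative data: id 2 has a new score, id 3 was removed, id 4 was added.
previous = pd.DataFrame({'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Carol'], 'score': [90, 85, 92]})
current = pd.DataFrame({'id': [1, 2, 4], 'name': ['Alice', 'Bob', 'Dave'], 'score': [90, 88, 75]})

print(detect_changes(previous, current))
```

In a real pipeline the previous fingerprints would more likely be read from a stored _hash column than recomputed, so only the key and hash need to be kept between loads.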
