Python · Intermediate

Compare Two DataFrames for Changes

Detect row-level additions, deletions, and modifications between two DataFrame snapshots.

python
import pandas as pd

old = pd.DataFrame({'id':[1,2,3,4],'value':['a','b','c','d'],'score':[10,20,30,40]})
new = pd.DataFrame({'id':[1,2,3,5],'value':['a','B','c','e'],'score':[10,25,30,50]})

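# Key sets for each snapshot (assumes 'id' is unique within each frame)
old_idx = set(old['id'])
new_idx = set(new['id'])

# Ids only in old were deleted; ids only in new were added
deleted  = old[old['id'].isin(old_idx - new_idx)]
added    = new[new['id'].isin(new_idx - old_idx)]

# Ids present in both snapshots, sorted so both frames share the same row order
# (comparing frames whose labels are ordered differently raises ValueError)
common = sorted(old_idx & new_idx)
common_old = old.set_index('id').loc[common]
common_new = new.set_index('id').loc[common]
# A row counts as modified if any column differs; NaN != NaN, so NaN cells are flagged
modified = common_new[(common_old != common_new).any(axis=1)]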

print('Deleted:', deleted['id'].tolist())
print('Added:',   added['id'].tolist())
print('Modified:',modified.index.tolist())
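
The snippet above reports which rows changed but not which columns. As a hedged alternative, the same diff can be sketched with an outer `merge` and `indicator=True`, which labels each row as `left_only`, `right_only`, or `both`. The names `merged`, `both`, and `changed_cols` and the `_old`/`_new` suffixes are illustrative choices, not part of the original snippet, and the sketch assumes `id` is unique in each frame.

```python
import pandas as pd

old = pd.DataFrame({'id': [1, 2, 3, 4], 'value': ['a', 'b', 'c', 'd'], 'score': [10, 20, 30, 40]})
new = pd.DataFrame({'id': [1, 2, 3, 5], 'value': ['a', 'B', 'c', 'e'], 'score': [10, 25, 30, 50]})

# Outer merge on the key; indicator=True adds a '_merge' column
# with values 'left_only', 'right_only', or 'both'.
merged = old.merge(new, on='id', how='outer',
                   suffixes=('_old', '_new'), indicator=True)

deleted_ids = merged.loc[merged['_merge'] == 'left_only', 'id'].tolist()
added_ids = merged.loc[merged['_merge'] == 'right_only', 'id'].tolist()

# For rows present in both snapshots, record which columns differ.
both = merged[merged['_merge'] == 'both'].set_index('id')
changed_cols = {}
for col in ['value', 'score']:
    differs = both[f'{col}_old'] != both[f'{col}_new']
    for row_id in both.index[differs]:
        changed_cols.setdefault(int(row_id), []).append(col)

print('Deleted:', deleted_ids)
print('Added:', added_ids)
print('Modified columns by id:', changed_cols)
```

Because the merge aligns rows by key, this version does not depend on the two snapshots listing their ids in the same order. As in the set-based approach, a NaN on both sides of a column compares as different.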

Use Cases

  • change data capture
  • audit logging
  • snapshot diffing
