python · beginner

Concat & Deduplicate DataFrames

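Concatenate multiple DataFrames and drop duplicate rows by key (a single column or a composite of several columns) to consolidate data from several sources. With keep='last', the row from the DataFrame listed later wins.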

python
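import pandas as pd

# Two sources with overlapping ids; the scalar 'source' value is broadcast to every row.
df1 = pd.DataFrame({'id':[1,2,3],'email':['a@x.com','b@x.com','c@x.com'],'source':'db1'})
df2 = pd.DataFrame({'id':[2,3,4],'email':['b@x.com','c_new@x.com','d@x.com'],'source':'db2'})

# Stack the rows and renumber the index from 0.
combined = pd.concat([df1, df2], ignore_index=True)
# For each id keep the last occurrence, so db2 rows override db1 rows.
deduped = combined.drop_duplicates(subset=['id'], keep='last').reset_index(drop=True)
print(deduped)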
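
The snippet above deduplicates on a single column. A minimal sketch of the composite-key variant, assuming that a row should count as a duplicate only when both id and email match (the name by_pair is illustrative, not from the original snippet):

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3], 'email': ['a@x.com', 'b@x.com', 'c@x.com'], 'source': 'db1'})
df2 = pd.DataFrame({'id': [2, 3, 4], 'email': ['b@x.com', 'c_new@x.com', 'd@x.com'], 'source': 'db2'})

combined = pd.concat([df1, df2], ignore_index=True)

# Composite key: rows are duplicates only if id AND email are both equal.
# keep='last' retains the copy from the later source (db2 here).
by_pair = combined.drop_duplicates(subset=['id', 'email'], keep='last').reset_index(drop=True)
print(by_pair)
```

With this key, id 3 survives twice (once per distinct email), whereas id 2 collapses to its db2 row. Whether that is the desired outcome depends on which columns actually identify a record in your data.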

Use Cases

  • data consolidation
  • deduplication
  • multi-source ETL
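
For the multi-source ETL case it can help to see which rows collide before they are dropped. The sketch below wraps the pattern in a hypothetical helper, consolidate (not part of pandas or of the original snippet), and assumes the frames are passed in priority order so that the last one wins:

```python
import pandas as pd

def consolidate(frames, key_cols):
    """Hypothetical helper: concatenate frames and dedupe on key_cols, last frame wins."""
    combined = pd.concat(frames, ignore_index=True)
    # keep=False marks every row whose key occurs more than once, for auditing.
    conflicts = combined[combined.duplicated(subset=key_cols, keep=False)]
    deduped = combined.drop_duplicates(subset=key_cols, keep='last').reset_index(drop=True)
    return deduped, conflicts

db1 = pd.DataFrame({'id': [1, 2, 3], 'email': ['a@x.com', 'b@x.com', 'c@x.com'], 'source': 'db1'})
db2 = pd.DataFrame({'id': [2, 3, 4], 'email': ['b@x.com', 'c_new@x.com', 'd@x.com'], 'source': 'db2'})

deduped, conflicts = consolidate([db1, db2], key_cols=['id'])
print(conflicts)  # rows for ids present in more than one source
print(deduped)
```

Returning the conflicting rows alongside the result lets a pipeline log or review overrides instead of discarding them silently.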
