python · beginner
Concat & Deduplicate DataFrames
Concatenate multiple DataFrames and drop duplicate rows by key (here, the id column), keeping the last occurrence of each key for clean data consolidation.
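import pandas as pd
# Two sources with overlapping ids; the scalar 'source' is broadcast to every row.
df1 = pd.DataFrame({'id': [1, 2, 3], 'email': ['a@x.com', 'b@x.com', 'c@x.com'], 'source': 'db1'})
df2 = pd.DataFrame({'id': [2, 3, 4], 'email': ['b@x.com', 'c_new@x.com', 'd@x.com'], 'source': 'db2'})
# Stack the rows of both frames and renumber the index from 0.
combined = pd.concat([df1, df2], ignore_index=True)
# Keep the last row seen for each id, so db2 wins wherever the sources overlap.
deduped = combined.drop_duplicates(subset=['id'], keep='last').reset_index(drop=True)
print(deduped)
Use Cases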
- data consolidation
- deduplication
- multi-source ETL
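In multi-source ETL, a single id column is not always enough to identify a record. The sketch below is a variation on the snippet above, not part of it: it assumes a row should count as a duplicate only when both id and email match, and passes both columns to subset as a composite key. The sample data is the same illustrative data used above.

```python
import pandas as pd

# Illustrative sample data: two sources with overlapping ids.
df1 = pd.DataFrame({'id': [1, 2, 3],
                    'email': ['a@x.com', 'b@x.com', 'c@x.com'],
                    'source': 'db1'})
df2 = pd.DataFrame({'id': [2, 3, 4],
                    'email': ['b@x.com', 'c_new@x.com', 'd@x.com'],
                    'source': 'db2'})

combined = pd.concat([df1, df2], ignore_index=True)

# Composite key (assumption): rows are duplicates only if id AND email both match.
composite = (combined
             .drop_duplicates(subset=['id', 'email'], keep='last')
             .reset_index(drop=True))
print(composite)
```

With this key, id 3 survives twice because its email differs between the two sources, whereas deduplicating on id alone keeps only the db2 row. Which behaviour is right depends on whether a changed email means an update to the same record or a distinct record.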
Related Snippets
Similar patterns you can reuse in the same workflow.
sql · intermediate
SQL Data Deduplication Techniques
Remove duplicate records using ROW_NUMBER, DISTINCT ON, and self-join deduplication strategies.
Best for: Cleaning duplicate records in production databases
#sql #deduplication
python · intermediate
Read Large CSV in Chunks with Pandas
Process CSV files larger than RAM by reading in chunks, a memory-efficient ETL pattern for data pipelines.
Best for: Processing multi-GB CSV files without running out of memory
#pandas #csv
python · intermediate
Pandas Vectorised Operations vs Apply
Compare apply vs vectorised pandas operations for performance-critical column transformations.
Best for: Feature engineering
#pandas #vectorization
python · beginner
SQLite + Pandas Local Data Pipeline
Run a lightweight local ETL with SQLite and pandas: load CSV, transform, persist to SQLite.
Best for: Local analytics
#sqlite #pandas