python · beginner
Detect & Remove Pandas Duplicates
Find, count, and remove duplicate rows, with a configurable keep strategy and support for composite keys.
import pandas as pd
df = pd.DataFrame({
    'email': ['a@x.com', 'b@x.com', 'a@x.com', 'c@x.com', 'b@x.com'],
    'name': ['Alice', 'Bob', 'Alice', 'Carol', 'Bobby'],
    'score': [90, 85, 90, 75, 88],
})
print('Duplicate rows:', df.duplicated().sum())
print('Duplicate emails:', df.duplicated(subset=['email']).sum())
# Keep highest score per email
deduped = df.sort_values('score', ascending=False).drop_duplicates(subset=['email'], keep='first')
print(deduped)
Use Cases
- data cleaning
- deduplication
- unique constraint enforcement
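The description mentions keep strategies and composite keys, which the snippet above only partly exercises. The following is a minimal sketch, assuming the same sample data and treating the pair `email` + `name` as a hypothetical composite key; the variable names are illustrative only.

```python
import pandas as pd

# Sample data reused from the snippet; the composite key below is an assumption.
df = pd.DataFrame({
    'email': ['a@x.com', 'b@x.com', 'a@x.com', 'c@x.com', 'b@x.com'],
    'name': ['Alice', 'Bob', 'Alice', 'Carol', 'Bobby'],
    'score': [90, 85, 90, 75, 88],
})

# Composite key: a row counts as a duplicate only if BOTH columns match.
key = ['email', 'name']

# keep=False flags every member of a duplicate group, handy for inspection.
all_dupes = df[df.duplicated(subset=key, keep=False)]

# keep='first' / keep='last' retain one row per key; keep=False drops the whole group.
first_kept = df.drop_duplicates(subset=key, keep='first')
last_kept = df.drop_duplicates(subset=key, keep='last')
none_kept = df.drop_duplicates(subset=key, keep=False)

print(all_dupes)
print(len(first_kept), len(last_kept), len(none_kept))
```

With this key, Bob and Bobby are treated as distinct rows even though they share an email, so only the two Alice rows are flagged. Whether that is the right key depends on what uniqueness means for the dataset.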
Related Snippets
Similar patterns you can reuse in the same workflow.
python · beginner
Pandas Null Handling Strategies
Comprehensive strategies for detecting, filling, and handling missing values in pandas DataFrames.
Best for: Cleaning datasets with missing values
#pandas #null
python · beginner
Pandas String Operations
Clean, extract, and transform string columns using pandas .str accessor methods.
Best for: data cleaning
#pandas #strings
python · beginner
Concat & Deduplicate DataFrames
Merge multiple DataFrames and remove duplicates by composite key for clean data consolidation.
Best for: data consolidation
#pandas #deduplication
python · beginner
Pandas DataFrame Transformations
Common pandas DataFrame transformations including column operations, type casting, and string methods.
Best for: Cleaning raw data files for analysis
#pandas #dataframe