python · beginner

Detect & Remove Pandas Duplicates

Find, count, and remove duplicate rows, with a configurable keep strategy and support for composite keys.

python
import pandas as pd

df = pd.DataFrame({
    'email': ['a@x.com', 'b@x.com', 'a@x.com', 'c@x.com', 'b@x.com'],
    'name': ['Alice', 'Bob', 'Alice', 'Carol', 'Bobby'],
    'score': [90, 85, 90, 75, 88],
})

print('Duplicate rows:', df.duplicated().sum())                    # rows identical in every column: 1
print('Duplicate emails:', df.duplicated(subset=['email']).sum())  # repeated email values: 2

# Keep highest score per email: sort descending, then keep the first row per email
deduped = (
    df.sort_values('score', ascending=False)
      .drop_duplicates(subset=['email'], keep='first')
)
print(deduped)
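
The code above uses only `keep='first'` on a single column. As a minimal sketch of the other options the description mentions, the example below marks every member of a duplicate group with `keep=False`, retains the last occurrence with `keep='last'`, and deduplicates on a composite key by passing two columns to `subset`. The variable names (`all_dupes`, `last_seen`, `by_pair`) are illustrative assumptions, not part of the original snippet.

```python
import pandas as pd

df = pd.DataFrame({
    'email': ['a@x.com', 'b@x.com', 'a@x.com', 'c@x.com', 'b@x.com'],
    'name': ['Alice', 'Bob', 'Alice', 'Carol', 'Bobby'],
    'score': [90, 85, 90, 75, 88],
})

# keep=False marks every row in a duplicate group, not just the repeats
all_dupes = df[df.duplicated(subset=['email'], keep=False)]

# keep='last' retains the final occurrence of each email
last_seen = df.drop_duplicates(subset=['email'], keep='last')

# Composite key: rows count as duplicates only if email AND name both match
by_pair = df.drop_duplicates(subset=['email', 'name'])

print(all_dupes)
print(last_seen)
print(by_pair)
```

With the composite key, 'Bob' and 'Bobby' are treated as distinct rows even though they share an email, which may or may not be what a given cleanup wants.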

Use Cases

  • data cleaning
  • deduplication
  • unique constraint enforcement
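
For the unique constraint case, one possible approach is to check a key before loading or merging and fail loudly if it repeats. The helper below is a hypothetical sketch (`assert_unique` is not a pandas function); it relies only on `DataFrame.duplicated` with `keep=False`.

```python
import pandas as pd

def assert_unique(frame: pd.DataFrame, key: list[str]) -> None:
    """Raise ValueError if any combination of the key columns repeats."""
    dupes = frame[frame.duplicated(subset=key, keep=False)]
    if not dupes.empty:
        raise ValueError(f"{len(dupes)} rows violate uniqueness on {key}")

df = pd.DataFrame({
    'email': ['a@x.com', 'b@x.com', 'a@x.com'],
    'score': [90, 85, 90],
})

# Deduplicate first, then verify the constraint holds
clean = df.drop_duplicates(subset=['email'])
assert_unique(clean, ['email'])  # passes silently
```

Calling `assert_unique(df, ['email'])` on the raw frame would raise, since 'a@x.com' appears twice. For a single column, the built-in `Series.is_unique` property gives the same yes/no answer.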
