Input:
+-------------+---------+---------------------+
| customer_id | name    | email               |
+-------------+---------+---------------------+
| 1           | Ella    | emily@example.com   |
| 2           | David   | michael@example.com |
| 3           | Zachary | sarah@example.com   |
| 4           | Alice   | john@example.com    |
| 5           | Finn    | john@example.com    |
| 6           | Violet  | alice@example.com   |
+-------------+---------+---------------------+

Output:
+-------------+---------+---------------------+
| customer_id | name    | email               |
+-------------+---------+---------------------+
| 1           | Ella    | emily@example.com   |
| 2           | David   | michael@example.com |
| 3           | Zachary | sarah@example.com   |
| 4           | Alice   | john@example.com    |
| 6           | Violet  | alice@example.com   |
+-------------+---------+---------------------+

Explanation:
Alice (customer_id = 4) and Finn (customer_id = 5) both use john@example.com, so only the first occurrence of this email (Alice's row) is retained.

## Solution
### Method 1 - pandas drop_duplicates
#### Intuition
To remove duplicate rows based on the `email` column and keep only the first occurrence, we can use the `drop_duplicates` method in pandas, specifying the `email` column and `keep='first'`.

#### Approach
1. Use `drop_duplicates` on the DataFrame, specifying `subset=['email']` and `keep='first'`.
2. Return the resulting DataFrame, sorted by `customer_id` if required.

#### Code
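A minimal sketch of the two steps above, run against the example data. It assumes the input arrives as a pandas DataFrame; the function name `remove_duplicate_emails` and the variable name `customers` are hypothetical choices for illustration, not names given in the problem statement.

```python
import pandas as pd


def remove_duplicate_emails(customers: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per email (hypothetical helper name, for illustration)."""
    # Step 1: keep the first row seen for each email; later rows sharing
    # that email are dropped.
    deduped = customers.drop_duplicates(subset=["email"], keep="first")
    # Step 2: sort by customer_id so the row order is deterministic.
    return deduped.sort_values("customer_id").reset_index(drop=True)


# Example data copied from the Input table.
customers = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5, 6],
        "name": ["Ella", "David", "Zachary", "Alice", "Finn", "Violet"],
        "email": [
            "emily@example.com",
            "michael@example.com",
            "sarah@example.com",
            "john@example.com",
            "john@example.com",
            "alice@example.com",
        ],
    }
)

result = remove_duplicate_emails(customers)
print(result)
```

Note that `keep='first'` refers to row order in the DataFrame, not to the smallest `customer_id`. If the input might not already be ordered by `customer_id` and the lowest id should win, sorting before calling `drop_duplicates` would be the safer choice.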