LeetCode 196 - Delete Duplicate Emails

Problem

Table: Person

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| email       | varchar |
+-------------+---------+

id is the primary key column for this table.
Each row of this table contains an email. The emails will not contain uppercase letters.

Write an SQL query to delete all the duplicate emails, keeping only one unique email with the smallest id. Note that you are supposed to write a DELETE statement and not a SELECT one.

After running your script, the answer shown is the Person table. The driver will first compile and run your piece of code and then show the Person table. The final order of the Person table does not matter.

Examples

Example 1:

Input: Person table:

+----+------------------+
| id | email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
| 3  | john@example.com |
+----+------------------+

Output:

+----+------------------+
| id | email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
+----+------------------+

Explanation: john@example.com is repeated two times. We keep the row with the smallest Id = 1.

Solution

Method 1 - Where Not in Min IDs

Code

DELETE FROM Person
WHERE id NOT IN (SELECT MIN(id) as id FROM Person GROUP BY email)
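
In MySQL, the query above fails with the error:

You can't specify target table 'Person' for update in FROM clause

MySQL does not allow a DELETE to select from its own target table in a subquery. Wrapping the subquery in a derived table (minIds below) makes MySQL evaluate it as a separate intermediate result, which avoids the error:
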
DELETE FROM Person 
WHERE id NOT IN (
    SELECT * FROM (
        SELECT MIN(id)
        FROM Person
        GROUP BY email) as minIds);
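
To check the logic of Method 1 without a MySQL server, here is a minimal sketch that runs the same derived-table query against an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in here (it does not have the MySQL restriction described above), and the helper name dedupe_keep_min_id is hypothetical.

```python
import sqlite3


def dedupe_keep_min_id(rows):
    """Load (id, email) rows into a temporary Person table, delete
    duplicate emails, and return the surviving rows ordered by id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO Person (id, email) VALUES (?, ?)", rows)
    # Same shape as the Method 1 query: keep only the smallest id per email.
    conn.execute(
        """
        DELETE FROM Person
        WHERE id NOT IN (
            SELECT * FROM (
                SELECT MIN(id) FROM Person GROUP BY email
            ) AS minIds
        )
        """
    )
    remaining = conn.execute(
        "SELECT id, email FROM Person ORDER BY id"
    ).fetchall()
    conn.close()
    return remaining


# The table from Example 1.
example = [
    (1, "john@example.com"),
    (2, "bob@example.com"),
    (3, "john@example.com"),
]
print(dedupe_keep_min_id(example))
# [(1, 'john@example.com'), (2, 'bob@example.com')]
```

The row with id 3 is removed because its email already appears with the smaller id 1.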

Method 2 - Using Self Join

Code

DELETE p FROM Person p
JOIN Person q ON p.Email = q.Email AND p.Id > q.Id;

The same delete written with an implicit (comma) join:

DELETE p FROM Person p,
    Person q
WHERE
    p.Email = q.Email AND p.Id > q.Id;
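
To see which rows the self join in Method 2 actually matches, the sketch below runs the same join condition as a SELECT against an in-memory SQLite database. This is an illustration under assumptions, not the MySQL statement itself: SQLite does not support the multi-table DELETE p FROM ... syntax, so only the matching step is shown, and the helper name ids_matched_by_self_join is hypothetical.

```python
import sqlite3


def ids_matched_by_self_join(rows):
    """Return the ids of rows p for which some row q has the same email
    and a smaller id. These are the rows Method 2 would delete."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO Person (id, email) VALUES (?, ?)", rows)
    matched = conn.execute(
        """
        SELECT DISTINCT p.id
        FROM Person p
        JOIN Person q ON p.email = q.email AND p.id > q.id
        ORDER BY p.id
        """
    ).fetchall()
    conn.close()
    # Each result row is a 1-tuple; unwrap it to a plain list of ids.
    return [row[0] for row in matched]


# The table from Example 1.
example = [
    (1, "john@example.com"),
    (2, "bob@example.com"),
    (3, "john@example.com"),
]
print(ids_matched_by_self_join(example))
# [3]
```

A row is matched whenever at least one row with the same email has a smaller id, so the only row per email that never matches is the one with the smallest id, which is exactly the row to keep.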
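
Method 3 - Using Pandas

Code

import pandas as pd

def delete_duplicate_emails(person: pd.DataFrame) -> None:
    # Sort by id so the first row kept for each email has the smallest id.
    person.sort_values(by='id', inplace=True)
    person.drop_duplicates(subset=['email'], keep='first', inplace=True)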