Problem

Table: Users

+-----------------+---------+
| Column Name     | Type    |
+-----------------+---------+
| user_id         | int     |
| email           | varchar |
+-----------------+---------+
(user_id) is the unique key for this table.
Each row contains a user's unique ID and email address.

Write a solution to find all the valid email addresses. A valid email address meets the following criteria:

  • It contains exactly one @ symbol.
  • It ends with .com.
  • The part before the @ symbol contains only alphanumeric characters and underscores.
  • The part after the @ symbol and before .com contains a domain name that contains only letters.

Return the result table ordered by user_id in ascending order.

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
Input:
Users table:
+---------+---------------------+
| user_id | email               |
+---------+---------------------+
| 1       | alice@example.com   |
| 2       | bob_at_example.com  |
| 3       | charlie@example.net |
| 4       | david@domain.com    |
| 5       | eve@invalid         |
+---------+---------------------+
Output:
+---------+-------------------+
| user_id | email             |
+---------+-------------------+
| 1       | alice@example.com |
| 4       | david@domain.com  |
+---------+-------------------+
Explanation:
* **alice@example.com** is valid because it contains one `@`, alice is alphanumeric, and example.com starts with a letter and ends with .com.
* **bob_at_example.com** is invalid because it contains an underscore instead of an `@`.
* **charlie@example.net** is invalid because the domain does not end with `.com`.
* **david@domain.com** is valid because it meets all criteria.
* **eve@invalid** is invalid because the domain does not end with `.com`.
Result table is ordered by user_id in ascending order.

Example 2:

Solution

Method 1 – Regex Filtering

Intuition

We use a regular expression to match emails that satisfy all the given criteria: exactly one @, ends with .com, valid username (alphanumeric/underscore), and valid domain (letters only).

Approach

  1. Use a regex to match emails with:
    • One @ symbol.
    • Ends with .com.
    • Username (before @) contains only alphanumeric or underscores.
    • Domain (after @ and before .com) contains only letters.
  2. Filter the Users table using this regex.
  3. Return the user_id and email, ordered by user_id ascending.

Code

1
2
3
4
SELECT user_id, email
FROM Users
WHERE email REGEXP '^[A-Za-z0-9_]+@[A-Za-z]+\\.com$'
ORDER BY user_id;
1
2
3
4
SELECT user_id, email
FROM Users
WHERE email ~ '^[A-Za-z0-9_]+@[A-Za-z]+\\.com$'
ORDER BY user_id;
1
2
3
4
5
6
class Solution:
    def find_valid_emails(self, Users: 'pd.DataFrame') -> 'pd.DataFrame':
        import re
        pat = r'^[A-Za-z0-9_]+@[A-Za-z]+\.com$'
        ans = Users[Users['email'].str.match(pat)].copy()
        return ans.sort_values('user_id')[['user_id', 'email']]

Complexity

  • ⏰ Time complexity: O(N), where N is the number of users, since we check each email once.
  • 🧺 Space complexity: O(N), for storing the filtered result.