Problem

Table: Emails

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| email       | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table contains an email. The emails will not contain uppercase letters.

Write a solution to find all unique email domains and count the number of individuals associated with each domain. Consider only those domains that end with .com.

Return the result table orderd by email domains inascending order.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
Input: 
Emails table:
+-----+-----------------------+
| id  | email                 |
+-----+-----------------------+
| 336 | hwkiy@test.edu        |
| 489 | adcmaf@outlook.com    |
| 449 | vrzmwyum@yahoo.com    |
| 95  | tof@test.edu          |
| 320 | jxhbagkpm@example.org |
| 411 | zxcf@outlook.com      |
+----+------------------------+
Output: 
+--------------+-------+
| email_domain | count |
+--------------+-------+
| outlook.com  | 2     |
| yahoo.com    | 1     |  
+--------------+-------+
Explanation: 
- The valid domains ending with ".com" are only "outlook.com" and "yahoo.com", with respective counts of 2 and 1.
Output table is ordered by email_domains in ascending order.

Solution

Method 1 – SQL String Functions and Group By

Intuition

We need to extract the domain from each email, filter for domains ending with .com, and count the number of unique users for each domain. This can be done using SQL string functions and grouping.

Approach

  1. Use SUBSTRING_INDEX(email, '@', -1) to extract the domain from the email.
  2. Filter for domains ending with .com using LIKE '%.com'.
  3. Group by the domain and count the number of users for each domain.
  4. Order the result by domain in ascending order.

Code

1
2
3
4
5
SELECT SUBSTRING_INDEX(email, '@', -1) AS domain, COUNT(*) AS count
FROM Emails
WHERE SUBSTRING_INDEX(email, '@', -1) LIKE '%.com'
GROUP BY domain
ORDER BY domain ASC;
1
2
3
4
5
SELECT SPLIT_PART(email, '@', 2) AS domain, COUNT(*) AS count
FROM Emails
WHERE SPLIT_PART(email, '@', 2) LIKE '%.com'
GROUP BY domain
ORDER BY domain ASC;
1
2
3
4
5
6
def find_unique_email_domains(emails: 'pd.DataFrame') -> 'pd.DataFrame':
    emails = emails.copy()
    emails['domain'] = emails['email'].str.split('@').str[-1]
    filtered = emails[emails['domain'].str.endswith('.com')]
    result = filtered.groupby('domain').size().reset_index(name='count').sort_values('domain')
    return result

Complexity

  • ⏰ Time complexity: O(n), where n is the number of emails, as each email is processed once.
  • 🧺 Space complexity: O(n), for storing the result set and intermediate columns.