Problem
Table: Users
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| user_id | int |
| name | varchar |
| mail | varchar |
+---------------+---------+
user_id
is the primary key (column with unique values) for this table.
This table contains information of the users signed up in a website. Some e-mails are invalid.
Write a solution to find the users who have valid emails.
A valid e-mail has a prefix name and a domain where:
- The prefix name is a string that may contain letters (upper or lower case), digits, underscore
'_'
, period'.'
, and/or dash'-'
. The prefix name must start with a letter. - The domain is
'@leetcode.com'
.
Return the result table in any order.
The result format is in the following example.
Examples
Example 1:
Input: Users table:
+---------+-----------+-------------------------+
| user_id | name | mail |
+---------+-----------+-------------------------+
| 1 | Winston | winston@leetcode.com |
| 2 | Jonathan | jonathanisgreat |
| 3 | Annabelle | bella-@leetcode.com |
| 4 | Sally | sally.come@leetcode.com |
| 5 | Marwan | quarz#2020@leetcode.com |
| 6 | David | david69@gmail.com |
| 7 | Shapiro | .shapo@leetcode.com |
+---------+-----------+-------------------------+
Output:
+---------+-----------+-------------------------+
| user_id | name | mail |
+---------+-----------+-------------------------+
| 1 | Winston | winston@leetcode.com |
| 3 | Annabelle | bella-@leetcode.com |
| 4 | Sally | sally.come@leetcode.com |
+---------+-----------+-------------------------+
Explanation: The mail of user 2 does not have a domain. The mail of user 5 has the # sign which is not allowed. The mail of user 6 does not have the leetcode domain. The mail of user 7 starts with a period.
Solution
Method 1 - Using Regexp
Code
A detailed explanation of the following regular expression solution:
'^[A-Za-z]+[A-Za-z0-9\_\.\-]*@leetcode.com'
^
means the beginning of the string[]
means character set.[A-Z]
means any upper case chars. In other words, the short dash in the character set means range.- After the first and the second character set, there is a notation:
+
or*
-+
means at least one of the character from the preceding charset, and*
means 0 or more. \
inside the charset mean skipping. In other words,\
. means we want the dot as it is. Remember, for example, - means range in the character set. So what if we would like to find - itself as a character? use\
-.- Everything else, like
@leetcode.com
refers to exact match.
SQL
select * from Users
where regexp_like(mail, '^[A-Za-z]+[A-Za-z0-9\_\.\-]*@leetcode.com')
Pandas
import pandas as pd
def valid_emails(users: pd.DataFrame) -> pd.DataFrame:
return users[
users['mail'].str.match(r'^[a-zA-Z][a-zA-Z\d_.-]*@leetcode\.com')
]
Complexity
- Time:
O(b)
- Space:
O(1)