Problem

Table: Tweets

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| tweet_id       | int     |
| content        | varchar |
+----------------+---------+
tweet_id is the primary key (column with unique values) for this table.
This table contains all the tweets in a social media app.

Write a solution to find invalid tweets. A tweet is considered invalid if it meets any of the following criteria:

  • It exceeds 140 characters in length.
  • It has more than 3 mentions.
  • It includes more than 3 hashtags.

Return the result table ordered by tweet_id inascending order.

The result format is in the following example.

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
Input:
Tweets table:
+----------+-----------------------------------------------------------------------------------+
| tweet_id | content                                                                           |
+----------+-----------------------------------------------------------------------------------+
| 1        | Traveling, exploring, and living my best life @JaneSmith @SaraJohnson @LisaTaylor |
|          | @MikeBrown #Foodie #Fitness #Learning                                             | 
| 2        | Just had the best dinner with friends! #Foodie #Friends #Fun                      |
| 4        | Working hard on my new project #Work #Goals #Productivity #Fun                    |
+----------+-----------------------------------------------------------------------------------+
Output:
+----------+
| tweet_id |
+----------+
| 1        |
| 4        |
+----------+
Explanation:
* tweet_id 1 contains 4 mentions.
* tweet_id 4 contains 4 hashtags.
Output table is ordered by tweet_id in ascending order.

Solution

Method 1 – String Functions and Aggregation

Intuition

We need to check three conditions for each tweet: length, number of mentions, and number of hashtags. We can use string functions to count mentions and hashtags, and filter accordingly.

Approach

  1. For each tweet, check if the length of content is greater than 140.
  2. Count the number of mentions (occurrences of ‘@’) in content.
  3. Count the number of hashtags (occurrences of ‘#’) in content.
  4. Select tweets where any of the above conditions is true.
  5. Return the result ordered by tweet_id ascending.

Code

1
2
3
4
5
6
SELECT tweet_id
FROM Tweets
WHERE CHAR_LENGTH(content) > 140
   OR LENGTH(content) - LENGTH(REPLACE(content, '@', '')) > 3
   OR LENGTH(content) - LENGTH(REPLACE(content, '#', '')) > 3
ORDER BY tweet_id ASC;
1
2
3
4
5
6
SELECT tweet_id
FROM Tweets
WHERE LENGTH(content) > 140
   OR LENGTH(content) - LENGTH(REPLACE(content, '@', '')) > 3
   OR LENGTH(content) - LENGTH(REPLACE(content, '#', '')) > 3
ORDER BY tweet_id ASC;
1
2
3
4
5
def invalid_tweets(df: 'pd.DataFrame') -> 'pd.DataFrame':
    cond1 = df['content'].str.len() > 140
    cond2 = df['content'].str.count('@') > 3
    cond3 = df['content'].str.count('#') > 3
    return df.loc[cond1 | cond2 | cond3, ['tweet_id']].sort_values('tweet_id')

Complexity

  • ⏰ Time complexity: O(n * m) — n is the number of tweets, m is the average length of content (for counting characters).
  • 🧺 Space complexity: O(n) — For storing the result set.