Problem

Table: Tweets

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| user_id     | int     |
| tweet_id    | int     |
| tweet_date  | date    |
| tweet       | varchar |
+-------------+---------+
tweet_id is the primary key (column with unique values) for this table.
Each row of this table contains user_id, tweet_id, tweet_date and tweet.
It is guaranteed that all tweet_date are valid dates in February 2024.

Write a solution to find the top 3 trending hashtags in February 2024. Every tweet may contain several hashtags.

Return the result table ordered by count of hashtag, hashtag indescending order.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
Input:
Tweets table:
+---------+----------+------------------------------------------------------------+------------+
| user_id | tweet_id | tweet                                                      | tweet_date |
+---------+----------+------------------------------------------------------------+------------+
| 135     | 13       | Enjoying a great start to the day. #HappyDay #MorningVibes | 2024-02-01 |
| 136     | 14       | Another #HappyDay with good vibes! #FeelGood               | 2024-02-03 |
| 137     | 15       | Productivity peaks! #WorkLife #ProductiveDay               | 2024-02-04 |
| 138     | 16       | Exploring new tech frontiers. #TechLife #Innovation        | 2024-02-04 |
| 139     | 17       | Gratitude for today's moments. #HappyDay #Thankful         | 2024-02-05 |
| 140     | 18       | Innovation drives us. #TechLife #FutureTech                | 2024-02-07 |
| 141     | 19       | Connecting with nature's serenity. #Nature #Peaceful       | 2024-02-09 |
+---------+----------+------------------------------------------------------------+------------+
Output:
+-----------+-------+
| hashtag   | count |
+-----------+-------+
| #HappyDay | 3     |
| #TechLife | 2     |
| #WorkLife | 1     |
+-----------+-------+
Explanation:
* **#HappyDay:** Appeared in tweet IDs 13, 14, and 17, with a total count of 3 mentions.
* **#TechLife:** Appeared in tweet IDs 16 and 18, with a total count of 2 mentions.
* **#WorkLife:** Appeared in tweet ID 15, with a total count of 1 mention.
**Note:** Output table is sorted in descending order by count and hashtag
respectively.

Solution

Method 1 – Regex Extraction and Aggregation

Intuition

The main idea is to:

  • Extract all hashtags from each tweet (since tweets can have multiple hashtags).
  • Count the frequency of each hashtag across all tweets in February 2024.
  • Return the top 3 hashtags by count, breaking ties by hashtag lexicographically.

Approach

  1. Since all tweet_date values are in February 2024, no need to filter by date.
  2. Extract all hashtags from each tweet using regex.
  3. Unnest or split the hashtags so each hashtag in a tweet is counted separately.
  4. Group by hashtag and count occurrences.
  5. Sort by count (descending), then by hashtag (descending).
  6. Limit the result to the top 3.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
SELECT hashtag, COUNT(*) AS count
FROM (
  SELECT
    REGEXP_SUBSTR(tweet, '#\\w+', 1, n.n) AS hashtag
  FROM Tweets
  JOIN (
    SELECT a.N + b.N * 10 + 1 n
    FROM (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
          UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
    CROSS JOIN (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
          UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
  ) n
  WHERE REGEXP_SUBSTR(tweet, '#\\w+', 1, n.n) IS NOT NULL
) t
GROUP BY hashtag
ORDER BY count DESC, hashtag DESC
LIMIT 3;
1
2
3
4
5
6
7
8
SELECT hashtag, COUNT(*) AS count
FROM (
  SELECT unnest(regexp_matches(tweet, '#\\w+', 'g')) AS hashtag
  FROM Tweets
) t
GROUP BY hashtag
ORDER BY count DESC, hashtag DESC
LIMIT 3;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
class Solution:
    def find_trending_hashtags(self, Tweets: 'pd.DataFrame') -> 'pd.DataFrame':
        import pandas as pd
        import re
        all_tags = []
        for t in Tweets['tweet']:
            all_tags.extend(re.findall(r'#\\w+', t))
        ans = pd.Series(all_tags).value_counts().reset_index()
        ans.columns = ['hashtag', 'count']
        ans = ans.sort_values(['count', 'hashtag'], ascending=[False, False]).head(3)
        return ans[['hashtag', 'count']]

Complexity

  • ⏰ Time complexity: O(N*L), where N is the number of tweets and L is the average number of hashtags per tweet, since we scan each tweet and extract hashtags.
  • 🧺 Space complexity: O(K), where K is the number of unique hashtags.