Problem

Table: Tweets

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| user_id     | int     |
| tweet_id    | int     |
| tweet_date  | date    |
| tweet       | varchar |
+-------------+---------+
tweet_id is the primary key (column with unique values) for this table.
Each row of this table contains user_id, tweet_id, tweet_date and tweet.

Write a solution to find the top 3 trending hashtags in February 2024. Each tweet only contains one hashtag.

Return the result table orderd by count of hashtag, hashtag indescending order.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
Input:
Tweets table:
+---------+----------+----------------------------------------------+------------+
| user_id | tweet_id | tweet                                        | tweet_date |
+---------+----------+----------------------------------------------+------------+
| 135     | 13       | Enjoying a great start to the day! #HappyDay | 2024-02-01 |
| 136     | 14       | Another #HappyDay with good vibes!           | 2024-02-03 |
| 137     | 15       | Productivity peaks! #WorkLife                | 2024-02-04 |
| 138     | 16       | Exploring new tech frontiers. #TechLife      | 2024-02-04 |
| 139     | 17       | Gratitude for today's moments. #HappyDay     | 2024-02-05 |
| 140     | 18       | Innovation drives us. #TechLife              | 2024-02-07 |
| 141     | 19       | Connecting with nature's serenity. #Nature   | 2024-02-09 |
+---------+----------+----------------------------------------------+------------+
Output:
+-----------+--------------+
| hashtag   | hashtag_count|
+-----------+--------------+
| #HappyDay | 3            |
| #TechLife | 2            |
| #WorkLife | 1            |
+-----------+--------------+
Explanation:
* **#HappyDay:** Appeared in tweet IDs 13, 14, and 17, with a total count of 3 mentions.
* **#TechLife:** Appeared in tweet IDs 16 and 18, with a total count of 2 mentions.
* **#WorkLife:** Appeared in tweet ID 15, with a total count of 1 mention.
**Note:** Output table is sorted in descending order by hashtag_count and
hashtag respectively.

Solution

Method 1 – String Extraction and Aggregation

Intuition

The main idea is to:

  • Filter tweets from February 2024.
  • Extract the hashtag from each tweet (since each tweet contains only one hashtag).
  • Count the frequency of each hashtag.
  • Return the top 3 hashtags by count, breaking ties by hashtag lexicographically.

Approach

  1. Filter the Tweets table for rows where tweet_date is in February 2024.
  2. Extract the hashtag from the tweet string.
    • Since each tweet contains only one hashtag, we can use string functions to find and extract it.
  3. Group by the extracted hashtag and count occurrences.
  4. Sort by count (descending), then by hashtag (descending).
  5. Limit the result to the top 3.

Code

1
2
3
4
5
6
7
8
SELECT
  SUBSTRING(tweet, LOCATE('#', tweet)) AS hashtag,
  COUNT(*) AS hashtag_count
FROM Tweets
WHERE tweet_date BETWEEN '2024-02-01' AND '2024-02-29'
GROUP BY hashtag
ORDER BY hashtag_count DESC, hashtag DESC
LIMIT 3;
1
2
3
4
5
6
7
8
SELECT
  SUBSTRING(tweet FROM '#\\w+') AS hashtag,
  COUNT(*) AS hashtag_count
FROM Tweets
WHERE tweet_date BETWEEN '2024-02-01' AND '2024-02-29'
GROUP BY hashtag
ORDER BY hashtag_count DESC, hashtag DESC
LIMIT 3;
1
2
3
4
5
6
7
8
class Solution:
    def find_trending_hashtags(self, Tweets: 'pd.DataFrame') -> 'pd.DataFrame':
        mask = (Tweets['tweet_date'] >= '2024-02-01') & (Tweets['tweet_date'] <= '2024-02-29')
        feb = Tweets[mask].copy()
        feb['hashtag'] = feb['tweet'].str.extract(r'(#\\w+)')
        ans = feb.groupby('hashtag').size().reset_index(name='hashtag_count')
        ans = ans.sort_values(['hashtag_count', 'hashtag'], ascending=[False, False]).head(3)
        return ans[['hashtag', 'hashtag_count']]

Complexity

  • ⏰ Time complexity: O(N), where N is the number of tweets, since we scan each tweet once and group by hashtag.
  • 🧺 Space complexity: O(K), where K is the number of unique hashtags in February 2024.