+-------------+---------+
| Column Name | Type |
+-------------+---------+
| user_id | int |
| tweet_id | int |
| tweet_date | date |
| tweet | varchar |
+-------------+---------+
tweet_id is the primary key (column with unique values) for this table.
Each row of this table contains user_id, tweet_id, tweet_date and tweet.
It is guaranteed that all tweet_date are valid dates in February 2024.
Write a solution to find the top3 trending hashtags in February2024. Every tweet may contain severalhashtags.
Return the result table ordered by count of hashtag, hashtag indescendingorder.
Input:
Tweets table:+---------+----------+------------------------------------------------------------+------------+| user_id | tweet_id | tweet | tweet_date |+---------+----------+------------------------------------------------------------+------------+|135|13| Enjoying a great start to the day.#HappyDay #MorningVibes |2024-02-01||136|14| Another #HappyDay with good vibes!#FeelGood |2024-02-03||137|15| Productivity peaks!#WorkLife #ProductiveDay |2024-02-04||138|16| Exploring new tech frontiers.#TechLife #Innovation |2024-02-04||139|17| Gratitude for today's moments.#HappyDay #Thankful |2024-02-05||140|18| Innovation drives us.#TechLife #FutureTech |2024-02-07||141|19| Connecting with nature's serenity.#Nature #Peaceful |2024-02-09|+---------+----------+------------------------------------------------------------+------------+Output:
+-----------+-------+| hashtag | count |+-----------+-------+|#HappyDay |3||#TechLife |2||#WorkLife |1|+-----------+-------+Explanation:
***#HappyDay:** Appeared in tweet IDs 13,14, and 17,with a total count of 3 mentions.***#TechLife:** Appeared in tweet IDs 16 and 18,with a total count of 2 mentions.***#WorkLife:** Appeared in tweet ID 15,with a total count of 1 mention.**Note:** Output table is sorted in descending order by count and hashtag
respectively.
SELECT hashtag, COUNT(*) AScountFROM (
SELECT REGEXP_SUBSTR(tweet, '#\\w+', 1, n.n) AS hashtag
FROM Tweets
JOIN (
SELECT a.N + b.N *10+1 n
FROM (SELECT0AS N UNIONALLSELECT1UNIONALLSELECT2UNIONALLSELECT3UNIONALLSELECT4UNIONALLSELECT5UNIONALLSELECT6UNIONALLSELECT7UNIONALLSELECT8UNIONALLSELECT9) a
CROSSJOIN (SELECT0AS N UNIONALLSELECT1UNIONALLSELECT2UNIONALLSELECT3UNIONALLSELECT4UNIONALLSELECT5UNIONALLSELECT6UNIONALLSELECT7UNIONALLSELECT8UNIONALLSELECT9) b
) n
WHERE REGEXP_SUBSTR(tweet, '#\\w+', 1, n.n) ISNOTNULL) t
GROUPBY hashtag
ORDERBYcountDESC, hashtag DESCLIMIT3;
1
2
3
4
5
6
7
8
SELECT hashtag, COUNT(*) AScountFROM (
SELECTunnest(regexp_matches(tweet, '#\\w+', 'g')) AS hashtag
FROM Tweets
) t
GROUPBY hashtag
ORDERBYcountDESC, hashtag DESCLIMIT3;
1
2
3
4
5
6
7
8
9
10
11
classSolution:
deffind_trending_hashtags(self, Tweets: 'pd.DataFrame') ->'pd.DataFrame':
import pandas as pd
import re
all_tags = []
for t in Tweets['tweet']:
all_tags.extend(re.findall(r'#\\w+', t))
ans = pd.Series(all_tags).value_counts().reset_index()
ans.columns = ['hashtag', 'count']
ans = ans.sort_values(['count', 'hashtag'], ascending=[False, False]).head(3)
return ans[['hashtag', 'count']]
⏰ Time complexity: O(N*L), where N is the number of tweets and L is the average number of hashtags per tweet, since we scan each tweet and extract hashtags.
🧺 Space complexity: O(K), where K is the number of unique hashtags.