Problem
Table: Tweets
+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| tweet_id       | int     |
| content        | varchar |
+----------------+---------+
tweet_id is the primary key (column with unique values) for this table.
This table contains all the tweets in a social media app.
Write a solution to find the IDs of the invalid tweets. A tweet is invalid if the number of characters used in the content of the tweet is strictly greater than 15.
Return the result table in any order.
The result format is in the following example.
Examples
Example 1:
Input:
Tweets table:
+----------+----------------------------------+
| tweet_id | content                          |
+----------+----------------------------------+
| 1        | Vote for Biden                   |
| 2        | Let us make America great again! |
+----------+----------------------------------+
Output:
+----------+
| tweet_id |
+----------+
| 2        |
+----------+
Explanation: Tweet 1 has length = 14. It is a valid tweet. Tweet 2 has length = 32. It is an invalid tweet.
Solution
Method 1 - Using char_length (and not length)
Code
SQL
Using LENGTH() is incorrect here. More: MySQL string length vs char_length.
The question asks for the number of characters used in the content. LENGTH() returns the length of the string measured in bytes, whereas CHAR_LENGTH() returns the length of the string measured in characters. The two agree for single-byte characters but differ as soon as the content contains multi-byte characters.
SELECT tweet_id
FROM Tweets
WHERE CHAR_LENGTH(content) > 15
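The byte-versus-character distinction behind LENGTH() and CHAR_LENGTH() can be illustrated outside the database. The following is a minimal Python sketch, not MySQL itself: the helper names char_length and byte_length are hypothetical, and UTF-8 is assumed as the storage encoding.

```python
def char_length(text: str) -> int:
    # Counts characters (Unicode code points), analogous to CHAR_LENGTH().
    return len(text)


def byte_length(text: str, encoding: str = "utf-8") -> int:
    # Counts bytes after encoding, analogous to LENGTH().
    # Assumption: the column is stored as UTF-8.
    return len(text.encode(encoding))


ascii_tweet = "Vote for Biden"
multibyte_tweet = "こんにちは世界"  # 7 characters, 3 bytes each in UTF-8

for tweet in (ascii_tweet, multibyte_tweet):
    print(tweet, char_length(tweet), byte_length(tweet))
```

Under these assumptions the second tweet has only 7 characters but 21 bytes, so a byte-based filter would mark it invalid even though it is within the 15-character limit.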
Pandas
import pandas as pd

def invalid_tweets(tweets: pd.DataFrame) -> pd.DataFrame:
    # Filter rows where the length of 'content' is strictly greater than 15
    invalid_tweets_df = tweets[tweets['content'].str.len() > 15]
    # Select only the 'tweet_id' column from the invalid tweets DataFrame
    result_df = invalid_tweets_df[['tweet_id']]
    return result_df
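As a quick check, the pandas approach can be run against the data from Example 1. This is a sketch that assumes pandas is installed. The function is restated in condensed form so the block is self-contained, and the variable names tweets and result are illustrative only.

```python
import pandas as pd


def invalid_tweets(tweets: pd.DataFrame) -> pd.DataFrame:
    # Boolean mask: True where the content has more than 15 characters.
    too_long = tweets["content"].str.len() > 15
    # Keep only the tweet_id column of the matching rows.
    return tweets.loc[too_long, ["tweet_id"]]


# Input table from Example 1.
tweets = pd.DataFrame(
    {
        "tweet_id": [1, 2],
        "content": ["Vote for Biden", "Let us make America great again!"],
    }
)

result = invalid_tweets(tweets)
print(result)
```

For Python strings, str.len() counts characters rather than bytes, so it matches the CHAR_LENGTH() semantics used in the SQL solution.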