Problem

Table: Tweets

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| tweet_id       | int     |
| content        | varchar |
+----------------+---------+

tweet_id is the primary key (column with unique values) for this table. This table contains all the tweets in a social media app.

Write a solution to find the IDs of the invalid tweets. A tweet is invalid if the number of characters in its content is strictly greater than 15.

Return the result table in any order.

The result format is in the following example.

Examples

Example 1:

Input:

Tweets table:
+----------+----------------------------------+
| tweet_id | content                          |
+----------+----------------------------------+
| 1        | Vote for Biden                   |
| 2        | Let us make America great again! |
+----------+----------------------------------+

Output:

+----------+
| tweet_id |
+----------+
| 2        |
+----------+

Explanation: Tweet 1 has length = 14. It is a valid tweet. Tweet 2 has length = 32. It is an invalid tweet.
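The character counts in the explanation can be checked directly. This is a minimal sketch in plain Python using only the two example rows; Python's len() on a str counts characters (code points), which matches what the problem means by "length".

```python
# The two tweets from Example 1, keyed by tweet_id
tweets = {
    1: "Vote for Biden",
    2: "Let us make America great again!",
}

# len() on a Python str counts characters, not bytes
lengths = {tweet_id: len(content) for tweet_id, content in tweets.items()}

# A tweet is invalid when its content has strictly more than 15 characters
invalid_ids = [tweet_id for tweet_id, n in lengths.items() if n > 15]

print(lengths)      # {1: 14, 2: 32}
print(invalid_ids)  # [2]
```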

Solution

Method 1 - Using CHAR_LENGTH (and not LENGTH)

Code

SQL

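Using LENGTH() is incorrect here. More: MySQL string LENGTH vs CHAR_LENGTH.

The question asks for the number of characters used in the content. LENGTH() returns the length of the string measured in bytes, whereas CHAR_LENGTH() returns the length measured in characters. The two agree for plain ASCII text but differ when the content contains multi-byte characters (for example, accented letters or emoji in a UTF-8 column).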

SELECT tweet_id
FROM Tweets
WHERE CHAR_LENGTH(content) > 15;

Pandas
import pandas as pd

def invalid_tweets(tweets: pd.DataFrame) -> pd.DataFrame:
    # Filter rows where the length of 'content' is strictly greater than 15
    invalid_tweets_df = tweets[tweets['content'].str.len() > 15]
    
    # Select only the 'tweet_id' column from the invalid tweets DataFrame
    result_df = invalid_tweets_df[['tweet_id']]
    
    return result_df
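As a quick usage check, the sketch below runs the same filter on the Example 1 rows plus a hypothetical third tweet that is not part of the problem. It assumes pandas is installed, and it compares character counts (what str.len() and CHAR_LENGTH() measure) with UTF-8 byte counts (roughly what MySQL's LENGTH() would report for a utf8mb4 column).

```python
import pandas as pd

def invalid_tweets(tweets: pd.DataFrame) -> pd.DataFrame:
    # Same filter as the solution above: str.len() counts characters
    return tweets[tweets["content"].str.len() > 15][["tweet_id"]]

# Example 1 rows plus a hypothetical third tweet with two non-ASCII characters
tweets = pd.DataFrame({
    "tweet_id": [1, 2, 3],
    "content": [
        "Vote for Biden",
        "Let us make America great again!",
        "Caf\u00e9 f\u00fcr alle!",  # "Café für alle!": 14 characters, 16 UTF-8 bytes
    ],
})

# Characters per tweet (what CHAR_LENGTH measures)
char_lengths = tweets["content"].str.len().tolist()
# UTF-8 bytes per tweet (what LENGTH measures on a UTF-8 column)
byte_lengths = tweets["content"].map(lambda s: len(s.encode("utf-8"))).tolist()

print(char_lengths)  # [14, 32, 14]
print(byte_lengths)  # [14, 32, 16]
print(invalid_tweets(tweets)["tweet_id"].tolist())  # [2]
```

The hypothetical third tweet shows why the byte-based check is wrong: a LENGTH(content) > 15 filter would flag it (16 bytes), even though it has only 14 characters and is therefore valid.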