Problem

Table: SurveyLog

+-------------+------+
| Column Name | Type |
+-------------+------+
| id          | int  |
| action      | ENUM |
| question_id | int  |
| answer_id   | int  |
| q_num       | int  |
| timestamp   | int  |
+-------------+------+
This table may contain duplicate rows.
action is an ENUM (category) of the type: "show", "answer", or "skip".
Each row of this table indicates the user with ID = id has taken an action with the question question_id at time timestamp.
If the action taken by the user is "answer", answer_id will contain the id of that answer, otherwise, it will be null.
q_num is the numeral order of the question in the current session.

The answer rate for a question is the number of times a user answered the question by the number of times a user showed the question.

Write a solution to report the question that has the highest answer rate. If multiple questions have the same maximum answer rate , report the question with the smallest question_id.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
Input: 
SurveyLog table:
+----+--------+-------------+-----------+-------+-----------+
| id | action | question_id | answer_id | q_num | timestamp |
+----+--------+-------------+-----------+-------+-----------+
| 5  | show   | 285         | null      | 1     | 123       |
| 5  | answer | 285         | 124124    | 1     | 124       |
| 5  | show   | 369         | null      | 2     | 125       |
| 5  | skip   | 369         | null      | 2     | 126       |
+----+--------+-------------+-----------+-------+-----------+
Output: 
+------------+
| survey_log |
+------------+
| 285        |
+------------+
Explanation: 
Question 285 was showed 1 time and answered 1 time. The answer rate of question 285 is 1.0
Question 369 was showed 1 time and was not answered. The answer rate of question 369 is 0.0
Question 285 has the highest answer rate.

Solution

Method 1 – Aggregation and Division for Answer Rate

Intuition

To find the question with the highest answer rate, we need to count the number of times each question was shown and answered, then compute the answer rate as answers/shows. The question with the highest rate is the answer.

Approach

  1. For each question, count the number of ‘show’ actions and ‘answer’ actions.
  2. Compute the answer rate for each question as answers/shows.
  3. Return the question_id with the highest answer rate. If there are ties, return the smallest question_id.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
SELECT question_id
FROM (
  SELECT question_id,
         SUM(action = 'show') AS shows,
         SUM(action = 'answer') AS answers,
         SUM(action = 'answer') / SUM(action = 'show') AS rate
  FROM SurveyLog
  GROUP BY question_id
) t
ORDER BY rate DESC, question_id ASC
LIMIT 1;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
SELECT question_id
FROM (
  SELECT question_id,
         SUM(CASE WHEN action = 'show' THEN 1 ELSE 0 END) AS shows,
         SUM(CASE WHEN action = 'answer' THEN 1 ELSE 0 END) AS answers,
         SUM(CASE WHEN action = 'answer' THEN 1.0 ELSE 0 END) / NULLIF(SUM(CASE WHEN action = 'show' THEN 1 ELSE 0 END), 0) AS rate
  FROM SurveyLog
  GROUP BY question_id
) t
ORDER BY rate DESC, question_id ASC
LIMIT 1;
1
2
3
4
5
6
7
8
class Solution:
    def get_highest_answer_rate_question(self, surveylog: 'pd.DataFrame') -> int:
        import pandas as pd
        shows = surveylog[surveylog['action'] == 'show'].groupby('question_id').size()
        answers = surveylog[surveylog['action'] == 'answer'].groupby('question_id').size()
        rate = (answers / shows).fillna(0)
        max_rate = rate.max()
        return rate[rate == max_rate].index.min()

Complexity

  • ⏰ Time complexity: O(n), where n is the number of rows in SurveyLog, since each row is processed once.
  • 🧺 Space complexity: O(m), where m is the number of unique questions, for storing counts and rates.