+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| activity      | enum    |
| activity_date | date    |
+---------------+---------+
This table may have duplicate rows.
The activity column is an ENUM (category) type of ('login', 'logout', 'jobs', 'groups', 'homepage').
Write a solution that reports, for every date within at most 90 days from today, the number of users that logged in for the first time on that date.
Assume today is 2019-06-30.
Input:
Traffic table:
+---------+----------+---------------+
| user_id | activity | activity_date |
+---------+----------+---------------+
| 1       | login    | 2019-05-01    |
| 1       | homepage | 2019-05-01    |
| 1       | logout   | 2019-05-01    |
| 2       | login    | 2019-06-21    |
| 2       | logout   | 2019-06-21    |
| 3       | login    | 2019-01-01    |
| 3       | jobs     | 2019-01-01    |
| 3       | logout   | 2019-01-01    |
| 4       | login    | 2019-06-21    |
| 4       | groups   | 2019-06-21    |
| 4       | logout   | 2019-06-21    |
| 5       | login    | 2019-03-01    |
| 5       | logout   | 2019-03-01    |
| 5       | login    | 2019-06-21    |
| 5       | logout   | 2019-06-21    |
+---------+----------+---------------+
Output:
+------------+------------+
| login_date | user_count |
+------------+------------+
| 2019-05-01 | 1          |
| 2019-06-21 | 2          |
+------------+------------+
Explanation:
Note that we only care about dates with a nonzero user count. The user with id 5 first logged in on 2019-03-01, so they are not counted on 2019-06-21.
Select only 'login' activities. For each user, find the earliest login date. Then count users grouped by that date, keeping only dates in the 90-day window ending on '2019-06-30' (2019-04-02 through 2019-06-30, inclusive).
SELECT first_login AS login_date, COUNT(*) AS user_count
FROM (
    SELECT user_id, MIN(activity_date) AS first_login
    FROM Traffic
    WHERE activity = 'login'
    GROUP BY user_id
) t
WHERE first_login BETWEEN DATE_SUB('2019-06-30', INTERVAL 89 DAY) AND '2019-06-30'
GROUP BY first_login;
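The lower bound used in the query can be checked with a short date calculation. This is only a sketch: the names `TODAY`, `WINDOW_DAYS`, and `window_start` are illustrative, and it assumes the 90-day window counts today as one of its days, which is the reading the query's `INTERVAL 89 DAY` implies.

```python
from datetime import date, timedelta

TODAY = date(2019, 6, 30)  # "today" as fixed by the problem statement
WINDOW_DAYS = 90           # assumption: the window includes today itself

# Going back 89 days from today gives 90 calendar dates in total.
window_start = TODAY - timedelta(days=WINDOW_DAYS - 1)

print(window_start.isoformat())          # 2019-04-02
print((TODAY - window_start).days + 1)   # 90
```

If the window is instead read as "at most 90 days of difference", the offset would be `WINDOW_DAYS` rather than `WINDOW_DAYS - 1`; the sample data gives the same result under either reading.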
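# Assuming traffic is a pandas DataFrame
logins = traffic[traffic['activity'] == 'login']
# Earliest login date per user
first_login = logins.groupby('user_id')['activity_date'].min()
# 90-day window ending on 2019-06-30 (inclusive)
mask = (first_login >= '2019-04-02') & (first_login <= '2019-06-30')
# Number of new users per date, in chronological order
result = first_login[mask].value_counts().sort_index()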
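The same three steps (keep logins, take each user's earliest date, count per date inside the window) can also be sketched with only the Python standard library and run against the sample input. The function name `new_users_daily_count` and its parameters are hypothetical, chosen for illustration, and the window again assumes today is included.

```python
from collections import Counter
from datetime import date, timedelta


def new_users_daily_count(rows, today, window_days=90):
    """Return sorted (login_date, user_count) pairs for first-time logins.

    rows: iterable of (user_id, activity, activity_date) tuples.
    """
    first_login = {}
    for user_id, activity, activity_date in rows:
        if activity != "login":
            continue
        # Keep the earliest login per user; duplicate rows change nothing.
        if user_id not in first_login or activity_date < first_login[user_id]:
            first_login[user_id] = activity_date

    start = today - timedelta(days=window_days - 1)
    counts = Counter(d for d in first_login.values() if start <= d <= today)
    return sorted(counts.items())


# Sample rows from the problem statement.
traffic = [
    (1, "login", date(2019, 5, 1)), (1, "homepage", date(2019, 5, 1)),
    (1, "logout", date(2019, 5, 1)), (2, "login", date(2019, 6, 21)),
    (2, "logout", date(2019, 6, 21)), (3, "login", date(2019, 1, 1)),
    (3, "jobs", date(2019, 1, 1)), (3, "logout", date(2019, 1, 1)),
    (4, "login", date(2019, 6, 21)), (4, "groups", date(2019, 6, 21)),
    (4, "logout", date(2019, 6, 21)), (5, "login", date(2019, 3, 1)),
    (5, "logout", date(2019, 3, 1)), (5, "login", date(2019, 6, 21)),
    (5, "logout", date(2019, 6, 21)),
]

result = new_users_daily_count(traffic, date(2019, 6, 30))
for login_date, user_count in result:
    print(login_date.isoformat(), user_count)
# 2019-05-01 1
# 2019-06-21 2
```

Users 3 and 5 drop out because their first logins (2019-01-01 and 2019-03-01) fall before the window, which matches the expected output.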
String sql =
    "SELECT first_login AS login_date, COUNT(*) AS user_count "
    + "FROM (SELECT user_id, MIN(activity_date) AS first_login "
    + "FROM Traffic WHERE activity = 'login' GROUP BY user_id) t "
    + "WHERE first_login BETWEEN DATE_SUB('2019-06-30', INTERVAL 89 DAY) AND '2019-06-30' "
    + "GROUP BY first_login";
// Execute and fetch result