+---------------+---------+
|Column Name |Type|+---------------+---------+
| user_id | int || activity | enum || activity_date | date |+---------------+---------+
This table may have duplicate rows.
The activity columnis an ENUM (category) typeof ('login', 'logout', 'jobs', 'groups', 'homepage').
Write a solution to reports for every date within at most 90 days from today, the number of users that logged in for the first time on that date.
Assume today is 2019-06-30.
Input:
Traffic table:+---------+----------+---------------+| user_id | activity | activity_date |+---------+----------+---------------+|1| login |2019-05-01||1| homepage |2019-05-01||1| logout |2019-05-01||2| login |2019-06-21||2| logout |2019-06-21||3| login |2019-01-01||3| jobs |2019-01-01||3| logout |2019-01-01||4| login |2019-06-21||4| groups |2019-06-21||4| logout |2019-06-21||5| login |2019-03-01||5| logout |2019-03-01||5| login |2019-06-21||5| logout |2019-06-21|+---------+----------+---------------+Output:
+------------+-------------+| login_date | user_count |+------------+-------------+|2019-05-01|1||2019-06-21|2|+------------+-------------+Explanation:
Note that we only care about dates with non zero user count.The user with id 5 first logged in on 2019-03-01 so he's not counted on 2019-06-21.
For each user we only care about their earliest 'login' date. Once we compute that first-login date per user, we count how many users have that date within the last 90 days from 2019-06-30.
SELECT first_login AS login_date, COUNT(*) AS user_count
FROM (
SELECT user_id, MIN(activity_date) AS first_login
FROM Traffic
WHERE activity ='login'GROUPBY user_id
) t
WHERE first_login BETWEEN DATE_SUB('2019-06-30', INTERVAL 89DAY) AND'2019-06-30'GROUPBY first_login;
1
2
3
4
5
6
7
8
9
10
-- PostgreSQL equivalent (use INTERVAL arithmetic)
SELECT first_login AS login_date, COUNT(*) AS user_count
FROM (
SELECT user_id, MIN(activity_date) AS first_login
FROM Traffic
WHERE activity ='login'GROUPBY user_id
) t
WHERE first_login BETWEEN (DATE '2019-06-30'- INTERVAL '89 days') AND DATE '2019-06-30'GROUPBY first_login;
import pandas as pd
deffirst_login_counts(traffic: pd.DataFrame) -> pd.DataFrame:
"""Return DataFrame with columns ['login_date','user_count'] for first logins in the 90-day window.
Assumes traffic['activity_date'] is a datetime-like column or string in ISO format.
"""# Ensure activity_date is datetime df = traffic.copy()
df['activity_date'] = pd.to_datetime(df['activity_date'])
# Filter to login activities and compute first login per user logins = df[df['activity'] =='login']
first_login = logins.groupby('user_id', as_index=False)['activity_date'].min()
first_login.rename(columns={'activity_date': 'first_login'}, inplace=True)
# Define window (inclusive) end = pd.to_datetime('2019-06-30')
start = end - pd.Timedelta(days=89)
# Filter to the 90-day window and count by date mask = (first_login['first_login'] >= start) & (first_login['first_login'] <= end)
window = first_login.loc[mask].copy()
window['login_date'] = window['first_login'].dt.date
result = (
window.groupby('login_date')
.size()
.reset_index(name='user_count')
.sort_values('login_date')
.reset_index(drop=True)
)
return result