+---------------+------+
| Column Name | Type |
+---------------+------+
| purchase_id | int |
| user_id | int |
| purchase_date | date |
+---------------+------+
purchase_id contains unique values.
This table contains logs of the dates that users purchased from a certain retailer.
Write a solution to report the IDs of the users that made any two purchases
at most7 days apart.
Input:
Purchases table:+-------------+---------+---------------+| purchase_id | user_id | purchase_date |+-------------+---------+---------------+|4|2|2022-03-13||1|5|2022-02-11||3|7|2022-06-19||6|2|2022-03-20||5|7|2022-06-19||2|2|2022-06-08|+-------------+---------+---------------+Output:
+---------+| user_id |+---------+|2||7|+---------+Explanation:
User 2 had two purchases on 2022-03-13 and 2022-03-20. Since the second purchase is within 7 days of the first purchase, we add their ID.User 5 had only 1 purchase.User 7 had two purchases on the same day so we add their ID.
We want to find users who have at least two purchases within 7 days. We can do this by joining the table to itself or using window functions to compare each purchase with others for the same user.
SELECTDISTINCT p1.user_id
FROM Purchases p1
JOIN Purchases p2
ON p1.user_id = p2.user_id
AND p1.purchase_id <> p2.purchase_id
WHEREABS(DATEDIFF(p1.purchase_date, p2.purchase_date)) <=7ORDERBY p1.user_id;
1
2
3
4
5
6
7
SELECTDISTINCT p1.user_id
FROM Purchases p1
JOIN Purchases p2
ON p1.user_id = p2.user_id
AND p1.purchase_id <> p2.purchase_id
WHEREABS(p1.purchase_date - p2.purchase_date) <= INTERVAL '7 days'ORDERBY p1.user_id;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
import pandas as pd
defusers_with_two_purchases_within_seven_days(purchases: pd.DataFrame) -> pd.DataFrame:
purchases = purchases.sort_values(['user_id', 'purchase_date'])
result = set()
for user_id, group in purchases.groupby('user_id'):
dates = group['purchase_date'].tolist()
for i in range(len(dates)):
for j in range(i+1, len(dates)):
if abs((dates[j] - dates[i]).days) <=7:
result.add(user_id)
breakif user_id in result:
breakreturn pd.DataFrame({'user_id': sorted(result)})