Problem

Table: Purchases

+---------------+------+
| Column Name   | Type |
+---------------+------+
| purchase_id   | int  |
| user_id       | int  |
| purchase_date | date |
+---------------+------+
purchase_id contains unique values.
This table contains logs of the dates that users purchased from a certain retailer.

Write a solution to report the IDs of the users that made any two purchases at most 7 days apart.

Return the result table ordered by user_id.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
Input: 
Purchases table:
+-------------+---------+---------------+
| purchase_id | user_id | purchase_date |
+-------------+---------+---------------+
| 4           | 2       | 2022-03-13    |
| 1           | 5       | 2022-02-11    |
| 3           | 7       | 2022-06-19    |
| 6           | 2       | 2022-03-20    |
| 5           | 7       | 2022-06-19    |
| 2           | 2       | 2022-06-08    |
+-------------+---------+---------------+
Output: 
+---------+
| user_id |
+---------+
| 2       |
| 7       |
+---------+
Explanation: 
User 2 had two purchases on 2022-03-13 and 2022-03-20. Since the second purchase is within 7 days of the first purchase, we add their ID.
User 5 had only 1 purchase.
User 7 had two purchases on the same day so we add their ID.

Solution

Method 1 – Self Join or Window Functions

Intuition

We want to find users who have at least two purchases within 7 days. We can do this by joining the table to itself or using window functions to compare each purchase with others for the same user.

Approach

  1. For each user, compare every pair of purchases and check if the date difference is at most 7 days.
  2. Return the user IDs that satisfy this condition, ordered by user_id.

Code

1
2
3
4
5
6
7
SELECT DISTINCT p1.user_id
FROM Purchases p1
JOIN Purchases p2
  ON p1.user_id = p2.user_id
 AND p1.purchase_id <> p2.purchase_id
WHERE ABS(DATEDIFF(p1.purchase_date, p2.purchase_date)) <= 7
ORDER BY p1.user_id;
1
2
3
4
5
6
7
SELECT DISTINCT p1.user_id
FROM Purchases p1
JOIN Purchases p2
  ON p1.user_id = p2.user_id
 AND p1.purchase_id <> p2.purchase_id
WHERE ABS(p1.purchase_date - p2.purchase_date) <= INTERVAL '7 days'
ORDER BY p1.user_id;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
import pandas as pd
def users_with_two_purchases_within_seven_days(purchases: pd.DataFrame) -> pd.DataFrame:
    purchases = purchases.sort_values(['user_id', 'purchase_date'])
    result = set()
    for user_id, group in purchases.groupby('user_id'):
        dates = group['purchase_date'].tolist()
        for i in range(len(dates)):
            for j in range(i+1, len(dates)):
                if abs((dates[j] - dates[i]).days) <= 7:
                    result.add(user_id)
                    break
            if user_id in result:
                break
    return pd.DataFrame({'user_id': sorted(result)})

Complexity

  • ⏰ Time complexity: O(N^2) per user in the worst case, but usually much less if purchases are sparse.
  • 🧺 Space complexity: O(U) for the result, where U is the number of users.