Problem

Table: Visits

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| visit_date    | date    |
+---------------+---------+
(user_id, visit_date) is the primary key (combination of columns with unique values) for this table.
Each row of this table indicates that user_id visited the bank on visit_date.

Table: Transactions

+------------------+---------+
| Column Name      | Type    |
+------------------+---------+
| user_id          | int     |
| transaction_date | date    |
| amount           | int     |
+------------------+---------+
This table may contain duplicate rows.
Each row of this table indicates that user_id made a transaction of amount on transaction_date.
It is guaranteed that the user visited the bank on transaction_date (i.e., the Visits table contains a row for (user_id, transaction_date)).

A bank wants to draw a chart showing, for each possible number of transactions made during a single visit, how many visits included exactly that many transactions.

Write a solution to find how many visits to the bank had no transactions, how many had exactly one transaction, and so on.

The result table will contain two columns:

  • transactions_count: the number of transactions made in one visit.
  • visits_count: the number of visits in which exactly transactions_count transactions were made. A user who visits several times is counted once per visit.

transactions_count must take every value from 0 up to the largest number of transactions made in any single visit, including values that no visit reached (their visits_count is 0).
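To make the 0-to-maximum rule concrete, here is a small Python sketch that expands a sparse histogram (transactions_count mapped to visits_count) into the dense form the result requires, filling missing counts with 0. The helper name densify is hypothetical and not part of the problem.

```python
from collections import Counter


def densify(histogram: Counter) -> list[tuple[int, int]]:
    """Expand a sparse {transactions_count: visits_count} map to every
    value from 0 to the largest observed count, filling gaps with 0."""
    if not histogram:
        return []
    top = max(histogram)  # largest transactions_count that occurred
    return [(k, histogram[k]) for k in range(top + 1)]  # Counter yields 0 for missing keys


# Counts from Example 1: no visit had exactly 2 transactions.
print(densify(Counter({0: 4, 1: 5, 3: 1})))
# [(0, 4), (1, 5), (2, 0), (3, 1)]
```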

Return the result table ordered by transactions_count.

The result format is in the following example.

Examples

Example 1:

![](https://fastly.jsdelivr.net/gh/doocs/leetcode@main/solution/1300-1399/1336.Number%20of%20Transactions%20per%20Visit/images/chart.png)
Input: 
Visits table:
+---------+------------+
| user_id | visit_date |
+---------+------------+
| 1       | 2020-01-01 |
| 2       | 2020-01-02 |
| 12      | 2020-01-01 |
| 19      | 2020-01-03 |
| 1       | 2020-01-02 |
| 2       | 2020-01-03 |
| 1       | 2020-01-04 |
| 7       | 2020-01-11 |
| 9       | 2020-01-25 |
| 8       | 2020-01-28 |
+---------+------------+
Transactions table:
+---------+------------------+--------+
| user_id | transaction_date | amount |
+---------+------------------+--------+
| 1       | 2020-01-02       | 120    |
| 2       | 2020-01-03       | 22     |
| 7       | 2020-01-11       | 232    |
| 1       | 2020-01-04       | 7      |
| 9       | 2020-01-25       | 33     |
| 9       | 2020-01-25       | 66     |
| 8       | 2020-01-28       | 1      |
| 9       | 2020-01-25       | 99     |
+---------+------------------+--------+
Output: 
+--------------------+--------------+
| transactions_count | visits_count |
+--------------------+--------------+
| 0                  | 4            |
| 1                  | 5            |
| 2                  | 0            |
| 3                  | 1            |
+--------------------+--------------+
Explanation: The chart drawn for this example is shown above.
* For transactions_count = 0, the visits (1, "2020-01-01"), (2, "2020-01-02"), (12, "2020-01-01") and (19, "2020-01-03") had no transactions, so visits_count = 4.
* For transactions_count = 1, the visits (2, "2020-01-03"), (7, "2020-01-11"), (8, "2020-01-28"), (1, "2020-01-02") and (1, "2020-01-04") each had one transaction, so visits_count = 5.
* For transactions_count = 2, no visit had exactly two transactions, so visits_count = 0.
* For transactions_count = 3, the visit (9, "2020-01-25") had three transactions, so visits_count = 1.
* For transactions_count >= 4, no visit had more than three transactions, so the table stops at transactions_count = 3.
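The explanation can be re-derived mechanically. The following Python sketch hard-codes the Example 1 rows, counts transactions per (user_id, date) visit, and then counts visits per transaction count. It is only a check of the arithmetic above, not a reference solution.

```python
from collections import Counter

# Example 1 data, copied from the tables above.
visits = [
    (1, "2020-01-01"), (2, "2020-01-02"), (12, "2020-01-01"),
    (19, "2020-01-03"), (1, "2020-01-02"), (2, "2020-01-03"),
    (1, "2020-01-04"), (7, "2020-01-11"), (9, "2020-01-25"),
    (8, "2020-01-28"),
]
transactions = [
    (1, "2020-01-02", 120), (2, "2020-01-03", 22), (7, "2020-01-11", 232),
    (1, "2020-01-04", 7), (9, "2020-01-25", 33), (9, "2020-01-25", 66),
    (8, "2020-01-28", 1), (9, "2020-01-25", 99),
]

# Transactions per visit; a visit with no transactions reads as 0 from the Counter.
per_visit = Counter((user, day) for user, day, _amount in transactions)
txn_counts = [per_visit[visit] for visit in visits]

# Visits per transactions_count, including counts that no visit reached.
histogram = Counter(txn_counts)
result = [(k, histogram[k]) for k in range(max(txn_counts) + 1)]
print(result)  # [(0, 4), (1, 5), (2, 0), (3, 1)]
```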

Solution

Method 1 – Group By and Count

Intuition

For each visit, count the number of transactions made during that visit. Then, for each possible transaction count from 0 to the maximum, count how many visits had exactly that many transactions; counts that no visit reached must still appear with 0.

Approach

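  1. Left join Visits with Transactions on user_id and visit_date = transaction_date, so that visits without any transactions are kept.
  2. Group by (user_id, visit_date) and count the matched transaction rows per visit (0 for visits with no match).
  3. Generate every transactions_count from 0 to the maximum, left join the per-visit counts onto it, and count the matching visits, which yields 0 when none match.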
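The three steps can be exercised end to end with Python's built-in sqlite3 module on the Example 1 data. This is a sketch in SQLite's dialect (it assumes SQLite 3.8.3 or newer for WITH RECURSIVE), not the MySQL or PostgreSQL code in the Code section; the column alias cnt is chosen here for brevity.

```python
import sqlite3

QUERY = """
WITH RECURSIVE visit_txn AS (
    -- Steps 1-2: left join, then count matched transactions per visit
    SELECT v.user_id, v.visit_date, COUNT(t.user_id) AS cnt
    FROM Visits v
    LEFT JOIN Transactions t
      ON v.user_id = t.user_id AND v.visit_date = t.transaction_date
    GROUP BY v.user_id, v.visit_date
),
max_txn AS (
    SELECT MAX(cnt) AS mx FROM visit_txn
),
nums(n) AS (
    -- Step 3a: every candidate count from 0 to the maximum
    SELECT 0
    UNION ALL
    SELECT n + 1 FROM nums, max_txn WHERE n < mx
)
-- Step 3b: visits per count; COUNT(vt.user_id) is 0 for empty buckets
SELECT n, COUNT(vt.user_id)
FROM nums LEFT JOIN visit_txn vt ON vt.cnt = n
GROUP BY n
ORDER BY n
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Visits (user_id INT, visit_date TEXT)")
conn.execute("CREATE TABLE Transactions (user_id INT, transaction_date TEXT, amount INT)")
conn.executemany("INSERT INTO Visits VALUES (?, ?)", [
    (1, "2020-01-01"), (2, "2020-01-02"), (12, "2020-01-01"), (19, "2020-01-03"),
    (1, "2020-01-02"), (2, "2020-01-03"), (1, "2020-01-04"), (7, "2020-01-11"),
    (9, "2020-01-25"), (8, "2020-01-28"),
])
conn.executemany("INSERT INTO Transactions VALUES (?, ?, ?)", [
    (1, "2020-01-02", 120), (2, "2020-01-03", 22), (7, "2020-01-11", 232),
    (1, "2020-01-04", 7), (9, "2020-01-25", 33), (9, "2020-01-25", 66),
    (8, "2020-01-28", 1), (9, "2020-01-25", 99),
])
rows = conn.execute(QUERY).fetchall()
print(rows)  # [(0, 4), (1, 5), (2, 0), (3, 1)]
```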

Code

-- MySQL 8.0+
WITH RECURSIVE visit_txn AS (
  -- One row per visit with the number of transactions made during it
  SELECT v.user_id, v.visit_date, COUNT(t.user_id) AS transactions_count
  FROM Visits v
  LEFT JOIN Transactions t
    ON v.user_id = t.user_id AND v.visit_date = t.transaction_date
  GROUP BY v.user_id, v.visit_date
),
max_txn AS (
  SELECT MAX(transactions_count) AS mx FROM visit_txn
),
nums AS (
  -- Every integer from 0 to mx (depth limited by cte_max_recursion_depth, default 1000)
  SELECT 0 AS n
  UNION ALL
  SELECT nums.n + 1 FROM nums JOIN max_txn ON nums.n < max_txn.mx
)
SELECT nums.n AS transactions_count,
       -- Count a visit_txn column: COUNT(*) would return 1 for empty buckets
       COUNT(vt.user_id) AS visits_count
FROM nums
LEFT JOIN visit_txn vt ON vt.transactions_count = nums.n
GROUP BY nums.n
ORDER BY nums.n;
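In the final SELECT, the aggregate has to count a column that comes from visit_txn, such as COUNT(vt.user_id), rather than COUNT(*). A LEFT JOIN turns an unmatched number into one row of NULLs, and COUNT(*) counts that row. The sketch below demonstrates this with Python's sqlite3 module on small hypothetical tables named after the CTEs; the data is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nums (n INT)")
conn.execute("CREATE TABLE visit_txn (user_id INT, cnt INT)")
conn.executemany("INSERT INTO nums VALUES (?)", [(0,), (1,), (2,)])
# Hypothetical per-visit counts: no visit made exactly 2 transactions.
conn.executemany("INSERT INTO visit_txn VALUES (?, ?)", [(1, 0), (2, 1), (3, 1)])

sql = """
SELECT n, COUNT(*) AS star, COUNT(vt.user_id) AS col
FROM nums LEFT JOIN visit_txn vt ON vt.cnt = nums.n
GROUP BY n ORDER BY n
"""
for row in conn.execute(sql):
    print(row)
# (0, 1, 1)
# (1, 2, 2)
# (2, 1, 0)   <- COUNT(*) reports a phantom visit for the empty bucket
```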
-- PostgreSQL
WITH visit_txn AS (
  -- One row per visit with the number of transactions made during it
  SELECT v.user_id, v.visit_date, COUNT(t.user_id) AS transactions_count
  FROM Visits v
  LEFT JOIN Transactions t
    ON v.user_id = t.user_id AND v.visit_date = t.transaction_date
  GROUP BY v.user_id, v.visit_date
),
max_txn AS (
  SELECT MAX(transactions_count) AS mx FROM visit_txn
),
nums AS (
  -- Every integer from 0 to mx
  SELECT generate_series(0, (SELECT mx FROM max_txn)) AS n
)
-- COUNT(vt.user_id) is 0 for counts no visit reached; COUNT(*) would give 1
SELECT nums.n AS transactions_count, COUNT(vt.user_id) AS visits_count
FROM nums
LEFT JOIN visit_txn vt ON vt.transactions_count = nums.n
GROUP BY nums.n
ORDER BY nums.n;
import pandas as pd

def number_of_transactions_per_visit(visits: pd.DataFrame, transactions: pd.DataFrame) -> pd.DataFrame:
    # Left join keeps visits with no transactions (their amount is NaN)
    merged = visits.merge(transactions, how='left',
                          left_on=['user_id', 'visit_date'],
                          right_on=['user_id', 'transaction_date'])
    # count() ignores NaN, so a visit without transactions counts as 0
    txn_count = (merged.groupby(['user_id', 'visit_date'])['amount']
                 .count().reset_index(name='transactions_count'))
    max_txn = int(txn_count['transactions_count'].max())
    # Every value from 0 to the maximum; counts no visit reached get 0
    result = pd.DataFrame({'transactions_count': range(max_txn + 1)})
    result['visits_count'] = (txn_count['transactions_count'].value_counts()
                              .reindex(result['transactions_count'], fill_value=0)
                              .values)
    return result

Complexity

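  • ⏰ Time complexity: O(N + M), where N is the number of visits and M the number of transactions, assuming hash-based joins and grouping; the generated 0..max sequence adds at most M + 1 rows.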
  • 🧺 Space complexity: O(N + M)