Problem

Table: Subscriptions

+-------------+------+
| Column Name | Type |
+-------------+------+
| account_id  | int  |
| start_date  | date |
| end_date    | date |
+-------------+------+
account_id is the primary key column for this table.
Each row of this table indicates the start and end dates of an account's subscription.
Note that always start_date < end_date.

Table: Streams

+-------------+------+
| Column Name | Type |
+-------------+------+
| session_id  | int  |
| account_id  | int  |
| stream_date | date |
+-------------+------+
session_id is the primary key column for this table.
account_id is a foreign key from the Subscriptions table.
Each row of this table contains information about the account and the date associated with a stream session.

Write an SQL query to report the number of accounts that bought a subscription in 2021 but did not have any stream session.

The query result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
Input: 
Subscriptions table:
+------------+------------+------------+
| account_id | start_date | end_date   |
+------------+------------+------------+
| 9          | 2020-02-18 | 2021-10-30 |
| 3          | 2021-09-21 | 2021-11-13 |
| 11         | 2020-02-28 | 2020-08-18 |
| 13         | 2021-04-20 | 2021-09-22 |
| 4          | 2020-10-26 | 2021-05-08 |
| 5          | 2020-09-11 | 2021-01-17 |
+------------+------------+------------+
Streams table:
+------------+------------+-------------+
| session_id | account_id | stream_date |
+------------+------------+-------------+
| 14         | 9          | 2020-05-16  |
| 16         | 3          | 2021-10-27  |
| 18         | 11         | 2020-04-29  |
| 17         | 13         | 2021-08-08  |
| 19         | 4          | 2020-12-31  |
| 13         | 5          | 2021-01-05  |
+------------+------------+-------------+
Output: 
+----------------+
| accounts_count |
+----------------+
| 2              |
+----------------+
Explanation: Users 4 and 9 did not stream in 2021.
User 11 did not subscribe in 2021.

Solution

Method 1 -

Intuition

We need to count accounts that started a subscription in 2021 but never streamed during their subscription. This is a classic anti-join problem: find accounts in Subscriptions with a 2021 start_date and no matching Streams.

Approach

Select accounts from Subscriptions where YEAR(start_date) = 2021, and use a LEFT JOIN to Streams on account_id. Filter for accounts where Streams.account_id IS NULL (i.e., no stream session exists for that account).

Code

1
2
3
4
5
6
SELECT COUNT(*) AS accounts_count
FROM Subscriptions s
LEFT JOIN Streams t
  ON s.account_id = t.account_id
WHERE YEAR(s.start_date) = 2021
  AND t.account_id IS NULL;
1
2
3
4
# Assuming subscriptions and streams are pandas DataFrames
subs_2021 = subscriptions[subscriptions['start_date'].dt.year == 2021]
merged = subs_2021.merge(streams, on='account_id', how='left', indicator=True)
accounts_count = (merged['_merge'] == 'left_only').sum()
1
2
String sql = "SELECT COUNT(*) AS accounts_count FROM Subscriptions s LEFT JOIN Streams t ON s.account_id = t.account_id WHERE YEAR(s.start_date) = 2021 AND t.account_id IS NULL";
// Execute and fetch result

Complexity

  • ⏰ Time complexity: O(N + M) where N = number of subscriptions, M = number of streams (due to join and filter).
  • 🧺 Space complexity: O(1) for SQL, O(N) for in-memory approaches.