Problem

Table: Subscriptions

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
+-------------+------+
| Column Name | Type |
+-------------+------+
| account_id  | int  |
| start_date  | date |
| end_date    | date |
+-------------+------+
account_id is the primary key column for this table.
Each row of this table indicates the start and end dates of an account's subscription.
Note that always start_date < end_date.

Table: Streams

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
+-------------+------+
| Column Name | Type |
+-------------+------+
| session_id  | int  |
| account_id  | int  |
| stream_date | date |
+-------------+------+
session_id is the primary key column for this table.
account_id is a foreign key from the Subscriptions table.
Each row of this table contains information about the account and the date associated with a stream session.

Write an SQL query to report the number of accounts that bought a subscription in 2021 but did not have any stream session.

The query result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
Input: 
Subscriptions table:
+------------+------------+------------+
| account_id | start_date | end_date   |
+------------+------------+------------+
| 9          | 2020-02-18 | 2021-10-30 |
| 3          | 2021-09-21 | 2021-11-13 |
| 11         | 2020-02-28 | 2020-08-18 |
| 13         | 2021-04-20 | 2021-09-22 |
| 4          | 2020-10-26 | 2021-05-08 |
| 5          | 2020-09-11 | 2021-01-17 |
+------------+------------+------------+
Streams table:
+------------+------------+-------------+
| session_id | account_id | stream_date |
+------------+------------+-------------+
| 14         | 9          | 2020-05-16  |
| 16         | 3          | 2021-10-27  |
| 18         | 11         | 2020-04-29  |
| 17         | 13         | 2021-08-08  |
| 19         | 4          | 2020-12-31  |
| 13         | 5          | 2021-01-05  |
+------------+------------+-------------+
Output: 
+----------------+
| accounts_count |
+----------------+
| 2              |
+----------------+
Explanation: Users 4 and 9 did not stream in 2021.
User 11 did not subscribe in 2021.

Solution

Method 1 - Active-in-2021 Anti-Join (NOT EXISTS / LEFT JOIN)

Intuition

We want accounts whose subscription period overlaps year 2021 (i.e., start_date <= '2021-12-31' and end_date >= '2021-01-01') but that have no streaming records in 2021. This is an anti-join: pick subscriptions active in 2021 and exclude those with any Streams in the 2021 range.

Approach

  1. Filter Subscriptions to accounts whose subscription overlaps 2021.
  2. For each such account check whether there exists any Streams row with stream_date in 2021.
  3. Count accounts for which no such stream exists. This can be written with NOT EXISTS (clear and index-friendly) or with a LEFT JOIN to a filtered Streams table and checking for NULL.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
SELECT COUNT(*) AS accounts_count
FROM Subscriptions s
WHERE s.start_date <= '2021-12-31'
  AND s.end_date >= '2021-01-01'
  AND NOT EXISTS (
    SELECT 1
    FROM Streams t
    WHERE t.account_id = s.account_id
      AND t.stream_date BETWEEN '2021-01-01' AND '2021-12-31'
  );
1
2
3
4
5
6
7
8
SELECT COUNT(*) AS accounts_count
FROM Subscriptions s
LEFT JOIN Streams t
  ON s.account_id = t.account_id
  AND t.stream_date BETWEEN '2021-01-01' AND '2021-12-31'
WHERE s.start_date <= '2021-12-31'
  AND s.end_date >= '2021-01-01'
  AND t.account_id IS NULL;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
import pandas as pd

def count_accounts_no_streams_2021(subscriptions: pd.DataFrame, streams: pd.DataFrame) -> int:
    """Return the number of accounts with subscriptions overlapping 2021 but no streams in 2021.

    Assumes date columns are datetime-like or ISO strings.
    """
    subs = subscriptions.copy()
    strm = streams.copy()
    subs['start_date'] = pd.to_datetime(subs['start_date'])
    subs['end_date'] = pd.to_datetime(subs['end_date'])
    strm['stream_date'] = pd.to_datetime(strm['stream_date'])

    # Subscriptions that are active at any point in 2021
    mask_subs_2021 = (subs['start_date'] <= pd.to_datetime('2021-12-31')) & (
        subs['end_date'] >= pd.to_datetime('2021-01-01'))
    subs_2021 = subs.loc[mask_subs_2021]

    # Accounts that streamed in 2021
    strm_2021 = strm[(strm['stream_date'] >= '2021-01-01') & (strm['stream_date'] <= '2021-12-31')]
    streamed_accounts = set(strm_2021['account_id'].unique())

    # Count accounts in subs_2021 that are not in streamed_accounts
    result = subs_2021[~subs_2021['account_id'].isin(streamed_accounts)]
    return int(result['account_id'].nunique())

Complexity

  • Time complexity: O(N + M) – We scan Subscriptions and Streams once (filters and existence checks) where N = number of subscriptions and M = number of stream rows.
  • 🧺 Space complexity: O(1) – The SQL solutions use constant extra space for aggregation; in-memory pandas may use extra space proportional to filtered rows.