Problem

Table Person:

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| id             | int     |
| name           | varchar |
| phone_number   | varchar |
+----------------+---------+
id is the column of unique values for this table.
Each row of this table contains the name of a person and their phone number.
Phone number will be in the form 'xxx-yyyyyyy' where xxx is the country code (3 characters) and yyyyyyy is the phone number (7 characters) where x and y are digits. Both can contain leading zeros.

Table Country:

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| name           | varchar |
| country_code   | varchar |
+----------------+---------+
country_code is the column of unique values for this table.
Each row of this table contains the country name and its code. country_code will be in the form 'xxx' where x is digits.

Table Calls:

+-------------+------+
| Column Name | Type |
+-------------+------+
| caller_id   | int  |
| callee_id   | int  |
| duration    | int  |
+-------------+------+
This table may contain duplicate rows.
Each row of this table contains the caller id, callee id and the duration of the call in minutes. caller_id != callee_id

A telecommunications company wants to invest in new countries. The company intends to invest in the countries where the average call duration of the calls in this country is strictly greater than the global average call duration.

Write a solution to find the countries where this company can invest.

Return the result table in any order.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
Input: 
Person table:
+----+----------+--------------+
| id | name     | phone_number |
+----+----------+--------------+
| 3  | Jonathan | 051-1234567  |
| 12 | Elvis    | 051-7654321  |
| 1  | Moncef   | 212-1234567  |
| 2  | Maroua   | 212-6523651  |
| 7  | Meir     | 972-1234567  |
| 9  | Rachel   | 972-0011100  |
+----+----------+--------------+
Country table:
+----------+--------------+
| name     | country_code |
+----------+--------------+
| Peru     | 051          |
| Israel   | 972          |
| Morocco  | 212          |
| Germany  | 049          |
| Ethiopia | 251          |
+----------+--------------+
Calls table:
+-----------+-----------+----------+
| caller_id | callee_id | duration |
+-----------+-----------+----------+
| 1         | 9         | 33       |
| 2         | 9         | 4        |
| 1         | 2         | 59       |
| 3         | 12        | 102      |
| 3         | 12        | 330      |
| 12        | 3         | 5        |
| 7         | 9         | 13       |
| 7         | 1         | 3        |
| 9         | 7         | 1        |
| 1         | 7         | 7        |
+-----------+-----------+----------+
Output: 
+----------+
| country  |
+----------+
| Peru     |
+----------+
Explanation: 
The average call duration for Peru is (102 + 102 + 330 + 330 + 5 + 5) / 6 = 145.666667
The average call duration for Israel is (33 + 4 + 13 + 13 + 3 + 1 + 1 + 7) / 8 = 9.37500
The average call duration for Morocco is (33 + 4 + 59 + 59 + 3 + 7) / 6 = 27.5000 
Global call duration average = (2 * (33 + 4 + 59 + 102 + 330 + 5 + 13 + 3 + 1 + 7)) / 20 = 55.70000
Since Peru is the only country where the average call duration is greater than the global average, it is the only recommended country.

Solution

Method 1 – SQL Aggregation and Join

Intuition

The key idea is to compute the average call duration for each country and compare it to the global average. We join the Person, Country, and Calls tables to associate each call with a country, then aggregate durations by country and globally.

Approach

  1. Join Person and Country on the country code (extracted from the phone number).
  2. Join the result with Calls to associate each call with the caller’s country.
  3. Compute the average call duration for each country.
  4. Compute the global average call duration.
  5. Select countries where the country’s average is strictly greater than the global average.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
SELECT c.name AS country
FROM (
    SELECT co.name, AVG(ca.duration) AS avg_duration
    FROM Person p
    JOIN Country co ON LEFT(p.phone_number, 3) = co.country_code
    JOIN Calls ca ON ca.caller_id = p.id
    GROUP BY co.name
) c
JOIN (
    SELECT AVG(duration) AS global_avg
    FROM Calls
) g
ON 1=1
WHERE c.avg_duration > g.global_avg;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
SELECT c.name AS country
FROM (
    SELECT co.name, AVG(ca.duration) AS avg_duration
    FROM Person p
    JOIN Country co ON LEFT(p.phone_number, 3) = co.country_code
    JOIN Calls ca ON ca.caller_id = p.id
    GROUP BY co.name
) c
JOIN (
    SELECT AVG(duration) AS global_avg
    FROM Calls
) g ON TRUE
WHERE c.avg_duration > g.global_avg;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
def countries_you_can_safely_invest_in(person: pd.DataFrame, country: pd.DataFrame, calls: pd.DataFrame) -> pd.DataFrame:
    person = person.copy()
    person['country_code'] = person['phone_number'].str[:3]
    merged = person.merge(country, left_on='country_code', right_on='country_code')
    merged = merged.merge(calls, left_on='id', right_on='caller_id')
    country_avg = merged.groupby('name_y')['duration'].mean()
    global_avg = calls['duration'].mean()
    res = country_avg[country_avg > global_avg].reset_index()
    res.columns = ['country', 'avg_duration']
    return res[['country']]

Complexity

  • ⏰ Time complexity: O(N), where N is the number of calls, as each call is processed once in the join and aggregation.
  • 🧺 Space complexity: O(C), where C is the number of countries, for storing the averages.