Problem

Table: cities

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| state       | varchar |
| city        | varchar |
+-------------+---------+
(state, city) is the combination of columns with unique values for this table.
Each row of this table contains the state name and the city name within that state.

Write a solution to find all the cities in each state and analyze them based on the following requirements:

  • Combine all cities into a comma-separated string for each state.
  • Only include states that have at least 3 cities.
  • Only include states where at least one city starts with the same letter as the state name.

Return the result table ordered by the count of matching-letter cities indescending order and then by state name inascending order.

The result format is in the following example.

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
Input:
cities table:
+--------------+---------------+
| state        | city          |
+--------------+---------------+
| New York     | New York City |
| New York     | Newark        |
| New York     | Buffalo       |
| New York     | Rochester     |
| California   | San Francisco |
| California   | Sacramento    |
| California   | San Diego     |
| California   | Los Angeles   |
| Texas        | Tyler         |
| Texas        | Temple        |
| Texas        | Taylor        |
| Texas        | Dallas        |
| Pennsylvania | Philadelphia  |
| Pennsylvania | Pittsburgh    |
| Pennsylvania | Pottstown     |
+--------------+---------------+
Output:
+-------------+-------------------------------------------+-----------------------+
| state       | cities                                    | matching_letter_count |
+-------------+-------------------------------------------+-----------------------+
| Pennsylvania| Philadelphia, Pittsburgh, Pottstown       | 3                     |
| Texas       | Dallas, Taylor, Temple, Tyler             | 3                     |
| New York    | Buffalo, Newark, New York City, Rochester | 2                     |
+-------------+-------------------------------------------+-----------------------+
Explanation:
* **Pennsylvania** : 
* Has 3 cities (meets minimum requirement)
* All 3 cities start with 'P' (same as state)
* matching_letter_count = 3
* **Texas** : 
* Has 4 cities (meets minimum requirement)
* 3 cities (Taylor, Temple, Tyler) start with 'T' (same as state)
* matching_letter_count = 3
* **New York** : 
* Has 4 cities (meets minimum requirement)
* 2 cities (Newark, New York City) start with 'N' (same as state)
* matching_letter_count = 2
* **California** is not included in the output because: 
* Although it has 4 cities (meets minimum requirement)
* No cities start with 'C' (doesn't meet the matching letter requirement)
**Note:**
* Results are ordered by matching_letter_count in descending order
* When matching_letter_count is the same (Texas and New York both have 2), they are ordered by state name alphabetically
* Cities in each row are ordered alphabetically

Solution

Method 1 – Aggregation and Filtering with String Functions

Intuition

We need to group cities by state, filter for states with at least 3 cities, and only include those where at least one city starts with the same letter as the state. We also need to count such matching cities for ordering.

Approach

  • For MySQL:
    1. Use GROUP_CONCAT to aggregate city names for each state, sorted alphabetically.
    2. Use SUM with LEFT(city,1) = LEFT(state,1) to count matching-letter cities.
    3. Use HAVING to filter for states with at least 3 cities and at least one matching-letter city.
    4. Order by the count of matching-letter cities (descending), then by state (ascending).
  • For PostgreSQL:
    1. Use STRING_AGG to aggregate city names for each state, sorted alphabetically.
    2. Use SUM with LEFT(city,1) = LEFT(state,1) to count matching-letter cities.
    3. Use HAVING to filter for states with at least 3 cities and at least one matching-letter city.
    4. Order by the count of matching-letter cities (descending), then by state (ascending).
  • For Python (Pandas):
    1. Group by state and aggregate city names (sorted, comma-separated).
    2. Count cities and matching-letter cities for each state.
    3. Filter for states with at least 3 cities and at least one matching-letter city.
    4. Sort by count of matching-letter cities (descending), then by state (ascending).

Code

1
2
3
4
5
SELECT state, GROUP_CONCAT(city ORDER BY city) AS cities, SUM(LEFT(city,1) = LEFT(state,1)) AS match_count
FROM cities
GROUP BY state
HAVING COUNT(*) >= 3 AND match_count > 0
ORDER BY match_count DESC, state ASC;
1
2
3
4
5
SELECT state, STRING_AGG(city, ',' ORDER BY city) AS cities, SUM(CASE WHEN LEFT(city,1) = LEFT(state,1) THEN 1 ELSE 0 END) AS match_count
FROM cities
GROUP BY state
HAVING COUNT(*) >= 3 AND SUM(CASE WHEN LEFT(city,1) = LEFT(state,1) THEN 1 ELSE 0 END) > 0
ORDER BY match_count DESC, state ASC;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
class Solution:
    def find_cities_in_each_state_ii(self, df):
        def match_count(row):
            return sum(city[0] == row.name[0] for city in row['city'])
        grouped = df.groupby('state')['city'].apply(list)
        df2 = grouped.to_frame()
        df2['count'] = df2['city'].apply(len)
        df2['match_count'] = df2.apply(lambda row: sum(c[0] == row.name[0] for c in row['city']), axis=1)
        df2['cities'] = df2['city'].apply(lambda x: ','.join(sorted(x)))
        ans = df2[(df2['count'] >= 3) & (df2['match_count'] > 0)].reset_index()
        ans = ans.sort_values(['match_count','state'], ascending=[False,True])
        return ans[['state','cities']].reset_index(drop=True)

Complexity

  • ⏰ Time complexity: O(n log n), due to sorting cities within each state.
  • 🧺 Space complexity: O(n), for storing the grouped and concatenated results.