Problem

Table: logs

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| log_id      | int     |
| ip          | varchar |
| status_code | int     |
+-------------+---------+
log_id is the unique key for this table.
Each row contains server access log information including IP address and HTTP status code.

Write a solution to find invalid IP addresses. An IPv4 address is invalid if it meets any of these conditions:

  • Contains numbers greater than 255 in any octet
  • Has leading zeros in any octet (like 01.02.03.04)
  • Has less or more than 4 octets

Return the result table ordered by invalid_count, ip indescending order respectively.

The result format is in the following example.

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
Input:
logs table:
+--------+---------------+-------------+
| log_id | ip            | status_code | 
+--------+---------------+-------------+
| 1      | 192.168.1.1   | 200         | 
| 2      | 256.1.2.3     | 404         | 
| 3      | 192.168.001.1 | 200         | 
| 4      | 192.168.1.1   | 200         | 
| 5      | 192.168.1     | 500         | 
| 6      | 256.1.2.3     | 404         | 
| 7      | 192.168.001.1 | 200         | 
+--------+---------------+-------------+
Output:
+---------------+--------------+
| ip            | invalid_count|
+---------------+--------------+
| 256.1.2.3     | 2            |
| 192.168.001.1 | 2            |
| 192.168.1     | 1            |
+---------------+--------------+
Explanation:
* 256.1.2.3 is invalid because 256 > 255
* 192.168.001.1 is invalid because of leading zeros
* 192.168.1 is invalid because it has only 3 octets
The output table is ordered by invalid_count, ip in descending order
respectively.

Example 2:

Solution

Method 1 – Pattern Matching and Aggregation

Intuition

To find invalid IPv4 addresses, we need to check for three conditions: any octet > 255, any octet with leading zeros, or not exactly 4 octets. We can use SQL string and pattern functions or pandas string methods to check these conditions for each IP.

Approach

  1. Split the IP into octets using string functions.
  2. Check:
    • If the number of octets is not 4.
    • If any octet has leading zeros (length > 1 and starts with ‘0’).
    • If any octet is not a number or is > 255.
  3. Count the number of invalid occurrences for each IP.
  4. Return the IP and its invalid count, ordered by invalid_count and ip descending.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
SELECT ip, COUNT(*) AS invalid_count
FROM logs
WHERE
  -- Not 4 octets
  LENGTH(ip) - LENGTH(REPLACE(ip, '.', '')) != 3
  OR
  -- Any octet has leading zeros or > 255
  (
    SUBSTRING_INDEX(ip, '.', 1) REGEXP '^0[0-9]+' OR
    SUBSTRING_INDEX(SUBSTRING_INDEX(ip, '.', 2), '.', -1) REGEXP '^0[0-9]+' OR
    SUBSTRING_INDEX(SUBSTRING_INDEX(ip, '.', 3), '.', -1) REGEXP '^0[0-9]+' OR
    SUBSTRING_INDEX(ip, '.', -1) REGEXP '^0[0-9]+'
    OR
    CAST(SUBSTRING_INDEX(ip, '.', 1) AS UNSIGNED) > 255
    OR CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(ip, '.', 2), '.', -1) AS UNSIGNED) > 255
    OR CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(ip, '.', 3), '.', -1) AS UNSIGNED) > 255
    OR CAST(SUBSTRING_INDEX(ip, '.', -1) AS UNSIGNED) > 255
  )
GROUP BY ip
ORDER BY invalid_count DESC, ip DESC;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
SELECT ip, COUNT(*) AS invalid_count
FROM logs
WHERE
  LENGTH(ip) - LENGTH(REPLACE(ip, '.', '')) != 3
  OR
  (
    split_part(ip, '.', 1) ~ '^0[0-9]+'
    OR split_part(ip, '.', 2) ~ '^0[0-9]+'
    OR split_part(ip, '.', 3) ~ '^0[0-9]+'
    OR split_part(ip, '.', 4) ~ '^0[0-9]+'
    OR split_part(ip, '.', 1)::int > 255
    OR split_part(ip, '.', 2)::int > 255
    OR split_part(ip, '.', 3)::int > 255
    OR split_part(ip, '.', 4)::int > 255
  )
GROUP BY ip
ORDER BY invalid_count DESC, ip DESC;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
import pandas as pd
import re

def find_invalid_ips(logs: pd.DataFrame) -> pd.DataFrame:
    def is_invalid(ip: str) -> bool:
        parts = ip.split('.')
        if len(parts) != 4:
            return True
        for p in parts:
            if not p.isdigit():
                return True
            if len(p) > 1 and p[0] == '0':
                return True
            if int(p) > 255:
                return True
        return False
    mask = logs['ip'].apply(is_invalid)
    res = logs[mask].groupby('ip').size().reset_index(name='invalid_count')
    res = res.sort_values(['invalid_count', 'ip'], ascending=[False, False])
    return res

Complexity

  • ⏰ Time complexity: O(n), where n is the number of log rows. Each IP is checked once.
  • 🧺 Space complexity: O(n), for storing the result and intermediate groupings