Problem

Table: Fraud

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| policy_id   | int     |
| state       | varchar |
| fraud_score | int     |
+-------------+---------+
policy_id is column of unique values for this table.
This table contains policy id, state, and fraud score.

The Leetcode Insurance Corp has developed an ML-driven predictive model to detect the likelihood of fraudulent claims. Consequently, they allocate their most seasoned claim adjusters to address the top 5% of claims flagged by this model.

Write a solution to find the top 5 percentile of claims from each state.

Return the result table ordered bystate _inascending order, _fraud_score _indescending order, and _policy_id inascending order.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
Input: 
Fraud table:
+-----------+------------+-------------+
| policy_id | state      | fraud_score | 
+-----------+------------+-------------+
| 1         | California | 0.92        | 
| 2         | California | 0.68        |   
| 3         | California | 0.17        | 
| 4         | New York   | 0.94        | 
| 5         | New York   | 0.81        | 
| 6         | New York   | 0.77        |  
| 7         | Texas      | 0.98        |  
| 8         | Texas      | 0.97        | 
| 9         | Texas      | 0.96        | 
| 10        | Florida    | 0.97        |  
| 11        | Florida    | 0.98        | 
| 12        | Florida    | 0.78        | 
| 13        | Florida    | 0.88        | 
| 14        | Florida    | 0.66        | 
+-----------+------------+-------------+
Output: 
+-----------+------------+-------------+
| policy_id | state      | fraud_score |
+-----------+------------+-------------+
| 1         | California | 0.92        | 
| 11        | Florida    | 0.98        | 
| 4         | New York   | 0.94        | 
| 7         | Texas      | 0.98        |  
+-----------+------------+-------------+
**Explanation**
- For the state of California, only policy ID 1, with a fraud score of 0.92, falls within the top 5 percentile for this state.
- For the state of Florida, only policy ID 11, with a fraud score of 0.98, falls within the top 5 percentile for this state. 
- For the state of New York, only policy ID 4, with a fraud score of 0.94, falls within the top 5 percentile for this state. 
- For the state of Texas, only policy ID 7, with a fraud score of 0.98, falls within the top 5 percentile for this state. 
Output table is ordered by state in ascending order, fraud score in descending order, and policy ID in ascending order.

Solution

Method 1 – Window Functions and Percentile Calculation

Intuition

Rank the fraud scores within each state, and select the top 5% (at least 1 per state). Use window functions to compute the cutoff.

Approach

  1. For each state, rank policies by fraud_score (desc), then policy_id (asc).
  2. Compute the number of rows per state, and select those with rank <= ceil(0.05 * count).

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
WITH ranked AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY state ORDER BY fraud_score DESC, policy_id ASC) AS rnk,
         COUNT(*) OVER (PARTITION BY state) AS cnt
    FROM Fraud
)
SELECT policy_id, state, fraud_score
  FROM ranked
 WHERE rnk <= CEIL(0.05 * cnt)
 ORDER BY state ASC, fraud_score DESC, policy_id ASC;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
WITH ranked AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY state ORDER BY fraud_score DESC, policy_id ASC) AS rnk,
         COUNT(*) OVER (PARTITION BY state) AS cnt
    FROM Fraud
)
SELECT policy_id, state, fraud_score
  FROM ranked
 WHERE rnk <= CEIL(0.05 * cnt)
 ORDER BY state ASC, fraud_score DESC, policy_id ASC;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
WITH ranked AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY state ORDER BY fraud_score DESC, policy_id ASC) AS rnk,
         COUNT(*) OVER (PARTITION BY state) AS cnt
    FROM Fraud
)
SELECT policy_id, state, fraud_score
  FROM ranked
 WHERE rnk <= CEIL(0.05 * cnt)
 ORDER BY state ASC, fraud_score DESC, policy_id ASC;

Complexity

  • ⏰ Time complexity: O(n log n)
  • 🧺 Space complexity: O(n)