Problem

Table: Patients

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| patient_id   | int     |
| patient_name | varchar |
| conditions   | varchar |
+--------------+---------+

patient_id is the primary key (column with unique values) for this table. ‘conditions’ contains 0 or more codes separated by spaces. This table contains information about the patients in the hospital.

Write a solution to find the patient_id, patient_name, and conditions of the patients who have Type I Diabetes. The code for Type I Diabetes always starts with the DIAB1 prefix.

Return the result table in any order.

The result format is in the following example.

Examples

Example 1:

Input: Patients table:

+------------+--------------+--------------+
| patient_id | patient_name | conditions   |
+------------+--------------+--------------+
| 1          | Daniel       | YFEV COUGH   |
| 2          | Alice        |              |
| 3          | Bob          | DIAB100 MYOP |
| 4          | George       | ACNE DIAB100 |
| 5          | Alain        | DIAB201      |
+------------+--------------+--------------+

Output:

+------------+--------------+--------------+
| patient_id | patient_name | conditions   |
+------------+--------------+--------------+
| 3          | Bob          | DIAB100 MYOP |
| 4          | George       | ACNE DIAB100 | 
+------------+--------------+--------------+

Explanation: Bob and George both have a condition that starts with DIAB1.

Solution

Method 1 - Using Regexp

Code

SQL

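A single pattern, \bDIAB1, covers both places where the code can appear: at the very start of the string or right after a space. This works because \b is a zero-width word-boundary assertion. It matches the position between a non-word character (in our case, a space) and a word character, as well as the position before the first character in the string. Also, you need to escape the backslash with another backslash, like so: \\b. Otherwise the string literal turns \b into a backspace character (this is how MySQL string literals behave), and the pattern no longer matches a word boundary.

P.S. \b also matches the position after the last character, but that doesn’t matter in the context of this problem.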
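
The word-boundary behaviour is not specific to SQL. As a rough illustration, the sketch below uses Python's re module on the condition strings from Example 1. Two things here are assumptions rather than part of the original solution: re only stands in for the database's regex engine (which may differ in details), and the XDIAB100 value is made up to show that a mid-word occurrence is rejected. The last lines show the escaping issue: in a non-raw Python string, just as in a MySQL string literal, \b is a backspace character.

```python
import re

# Raw string: the regex engine receives the two characters \ and b.
pattern = re.compile(r"\bDIAB1")

# Condition strings from Example 1, plus a hypothetical "XDIAB100"
# where DIAB1 appears in the middle of a code.
samples = ["YFEV COUGH", "", "DIAB100 MYOP", "ACNE DIAB100", "DIAB201", "XDIAB100"]

matching = [s for s in samples if pattern.search(s)]
print(matching)  # ['DIAB100 MYOP', 'ACNE DIAB100']

# Without the raw prefix (or a doubled backslash), "\b" is a single
# backspace character, so the pattern no longer means "word boundary".
unescaped = "\bDIAB1"
print(unescaped[0] == "\x08")                # True
print(re.search(unescaped, "DIAB100 MYOP"))  # None
```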

Using regexp_like
SELECT * FROM patients WHERE REGEXP_LIKE(conditions, '\\bDIAB1');
Using regexp
SELECT * FROM patients WHERE conditions REGEXP '\\bDIAB1';
Pandas
  • Use the str.contains() method to find patients with Type I Diabetes
  • Select only the required columns
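import pandas as pd

def find_patients(patients: pd.DataFrame) -> pd.DataFrame:
    # \b anchors DIAB1 to the start of a code; na=False treats missing
    # (NaN) conditions as non-matching instead of breaking the boolean mask.
    has_diab1 = patients['conditions'].str.contains(r'\bDIAB1', regex=True, na=False)
    patients_with_diabetes = patients[has_diab1]
    # Keep only the columns the problem asks for.
    result_df = patients_with_diabetes[['patient_id', 'patient_name', 'conditions']]
    return result_df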

Method 2 - Using the LIKE Operator

Code

SQL
SELECT * FROM patients WHERE
conditions LIKE '% DIAB1%' OR
conditions LIKE 'DIAB1%';
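
The two LIKE patterns correspond to the two places a code can begin: 'DIAB1%' catches a DIAB1 code at the start of the string, and '% DIAB1%' catches one that follows a space. One quick way to check the query against Example 1 is to run it in an in-memory SQLite database from Python. This is only a sketch: SQLite stands in for the target database, the column types are assumptions based on the schema in the problem statement, and the ORDER BY is added solely to make the printed result deterministic.

```python
import sqlite3

# In-memory database that disappears when the connection is closed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Patients ("
    "patient_id INTEGER PRIMARY KEY, patient_name TEXT, conditions TEXT)"
)

# Rows from the Example 1 input table.
conn.executemany(
    "INSERT INTO Patients VALUES (?, ?, ?)",
    [
        (1, "Daniel", "YFEV COUGH"),
        (2, "Alice", ""),
        (3, "Bob", "DIAB100 MYOP"),
        (4, "George", "ACNE DIAB100"),
        (5, "Alain", "DIAB201"),
    ],
)

# Same WHERE clause as the Method 2 query.
rows = conn.execute(
    "SELECT * FROM patients WHERE "
    "conditions LIKE '% DIAB1%' OR "
    "conditions LIKE 'DIAB1%' "
    "ORDER BY patient_id"
).fetchall()

print(rows)  # [(3, 'Bob', 'DIAB100 MYOP'), (4, 'George', 'ACNE DIAB100')]
conn.close()
```

Unlike \b, these patterns accept only a space as the separator, which fits the problem's guarantee that codes are separated by spaces.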