Problem
Table: Patients
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| patient_id | int |
| patient_name | varchar |
| conditions | varchar |
+--------------+---------+
patient_id is the primary key (column with unique values) for this table.
‘conditions’ contains 0 or more codes separated by spaces.
This table contains information about the patients in the hospital.
Write a solution to find the patient_id, patient_name, and conditions of the patients who have Type I Diabetes. A Type I Diabetes code always starts with the DIAB1 prefix.
Return the result table in any order.
The result format is in the following example.
Examples
Example 1:
Input: Patients table:
+------------+--------------+--------------+
| patient_id | patient_name | conditions |
+------------+--------------+--------------+
| 1 | Daniel | YFEV COUGH |
| 2 | Alice | |
| 3 | Bob | DIAB100 MYOP |
| 4 | George | ACNE DIAB100 |
| 5 | Alain | DIAB201 |
+------------+--------------+--------------+
Output:
+------------+--------------+--------------+
| patient_id | patient_name | conditions |
+------------+--------------+--------------+
| 3 | Bob | DIAB100 MYOP |
| 4 | George | ACNE DIAB100 |
+------------+--------------+--------------+
Explanation: Bob and George both have a condition that starts with DIAB1.
Solution
Method 1 - Using Regexp
Code
SQL
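The pattern \bDIAB1 finds the same rows as checking for DIAB1 at the start of the string or right after a space (the LIKE approach in Method 2). The reason is that \b is a zero-width word-boundary assertion: it matches the position between a non-word character (in our case, a space) and a word character, as well as the position before the first character in the string. Also, in a MySQL string literal you need to escape the backslash with another backslash, like so: \\b. Otherwise, MySQL reads \b as a backspace character and the pattern will not match as intended.
P.S. \b also matches the position after the last character, but that doesn’t matter in the context of this problem.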
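To see the word-boundary behaviour concretely, here is a small sketch using Python's `re` module rather than MySQL's regex engine (an assumption: both treat `\b` as a zero-width boundary between word and non-word characters). The helper name `has_type1_diabetes` and the extra sample string `SADIAB100` are illustrative only and not part of the original problem.

```python
import re

# \b asserts a word boundary. A raw string keeps the backslash intact,
# so no doubling is needed in Python (unlike a MySQL string literal).
PATTERN = re.compile(r"\bDIAB1")

def has_type1_diabetes(conditions: str) -> bool:
    """Hypothetical helper: True if any code in the string starts with DIAB1."""
    return PATTERN.search(conditions) is not None

# The five rows from the example plus one made-up string (SADIAB100).
samples = ["YFEV COUGH", "", "DIAB100 MYOP", "ACNE DIAB100", "DIAB201", "SADIAB100"]
for s in samples:
    print(repr(s), has_type1_diabetes(s))
```

Only `DIAB100 MYOP` and `ACNE DIAB100` match. `SADIAB100` does not, because there is no boundary between `A` and `D`. Note that `\b` would also match after punctuation such as a hyphen, so this relies on codes being separated by spaces only, as the problem states.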
Using regexp_like
SELECT * FROM patients WHERE REGEXP_LIKE(conditions, '\\bDIAB1')
Using regexp
SELECT * FROM patients WHERE conditions REGEXP '\\bDIAB1'
Pandas
- Use the str.contains() method to find patients with Type I Diabetes
- Select only the required columns
import pandas as pd

def find_patients(patients: pd.DataFrame) -> pd.DataFrame:
    # Keep rows where some code starts with DIAB1 (\b marks a word boundary)
    patients_with_diabetes = patients[patients['conditions'].str.contains(r'\bDIAB1')]
    # Return only the columns the problem asks for
    result_df = patients_with_diabetes[['patient_id', 'patient_name', 'conditions']]
    return result_df
Method 2 - Using LIKE Operator
Code
SQL
SELECT * FROM patients WHERE
conditions LIKE '% DIAB1%' OR
conditions LIKE 'DIAB1%';
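The two LIKE patterns correspond to two plain string checks: the code either sits at the very start of the string or is preceded by a space. Below is a minimal Python sketch of the same logic, with a hypothetical helper name `matches_like` and the example rows hard-coded for illustration. One assumption to keep in mind: Python's string checks are case-sensitive, whereas LIKE follows the column's collation.

```python
def matches_like(conditions: str) -> bool:
    """Hypothetical helper mimicking:
    conditions LIKE '% DIAB1%' OR conditions LIKE 'DIAB1%'
    """
    # 'DIAB1%'   -> the string starts with DIAB1
    # '% DIAB1%' -> DIAB1 appears right after a space somewhere in the string
    return conditions.startswith("DIAB1") or " DIAB1" in conditions

# Rows from the example Patients table: (patient_id, patient_name, conditions)
rows = [
    (1, "Daniel", "YFEV COUGH"),
    (2, "Alice", ""),
    (3, "Bob", "DIAB100 MYOP"),
    (4, "George", "ACNE DIAB100"),
    (5, "Alain", "DIAB201"),
]

result = [row for row in rows if matches_like(row[2])]
print(result)
```

The leading space in the second check is what prevents a code that merely contains DIAB1 in the middle from being picked up.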