Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential Duplicate Records in Sepsis Inpatient Query #35

Open
vaishvikr opened this issue Aug 26, 2024 · 1 comment
Open

Potential Duplicate Records in Sepsis Inpatient Query #35

vaishvikr opened this issue Aug 26, 2024 · 1 comment

Comments

@vaishvikr
Copy link

vaishvikr commented Aug 26, 2024

SQL File : 01A_SEPSIS_Cohort_V4.sql

-- Sepsis patients with inpatient encounters
SELECT
    v.person_id,
    v.visit_occurrence_id,
    v.visit_start_date,
    v.visit_end_date
INTO #inpat
FROM visit_occurrence AS v
INNER JOIN #sepsis_diagnosis AS sd ON v.person_id = sd.person_id
WHERE v.visit_concept_id IN (9201, 262) -- Inpatient visit/ED and inpatient visit
AND v.visit_start_date >= '2020-01-01'
AND (
    DATEDIFF(DAY, sd.diagnosis_date, v.visit_start_date) > -7
    AND DATEDIFF(DAY, sd.diagnosis_date, v.visit_start_date) < 21
);


When executing the query to identify sepsis patients with inpatient encounters, duplicate records may be generated if a patient has more than one sepsis diagnosis entry in the #sepsis_diagnosis temporary table. This happens because the query joins on v.person_id = sd.person_id, resulting in multiple rows for the same visit if there are multiple diagnosis dates for the same patient within the specified date range.

Example:

Suppose you have the following data in the #sepsis_diagnosis temporary table:

person_id diagnosis_date
101 2020-02-01
101 2020-02-05

And the following data in the visit_occurrence table:

person_id visit_occurrence_id visit_start_date visit_end_date visit_concept_id
101 5001 2020-02-10 2020-02-15 9201

In this example, patient 101 has two diagnosis dates (February 1 and February 5). The visit for this patient starts on February 10 and is an inpatient visit (visit_concept_id = 9201).

Problem:
The query would result in two rows being generated for this visit because both diagnosis dates fall within the range of 7 days before and 21 days after the visit start date:

person_id visit_occurrence_id visit_start_date visit_end_date
101 5001 2020-02-10 2020-02-15
101 5001 2020-02-10 2020-02-15

These two rows are duplicates because they correspond to the same visit but are repeated due to the two different diagnosis dates.

Solution: Use Distinct or Rank diagnosis

@DBJHU
Copy link
Collaborator

DBJHU commented Aug 27, 2024

Thank you for sharing this. I will ask Smitty how he wants to handle it. I agree Distinct or Rank would be effective!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants