Code modification for us_cdc heat_related_illness #928

Draft: wants to merge 2 commits into base: master (showing changes from 1 commit)
Code modification
saanikaaa committed Nov 22, 2023
commit 57fd2b30b77949dae9b610e5237cb73e6419b5b2
6 changes: 3 additions & 3 deletions scripts/us_cdc/heat_related_illness/README.md
@@ -7,23 +7,23 @@ The source data is downloaded manually from the EPH [website](https://ephtrackin
To clean the source data, run:

```bash
-python clean_data.py --input_path=source_data/ --output_path=<output_path>
+python3 clean_data.py --input_path=source_data/ --output_path=<output_path>
```

## Generating artifacts at a State level:
The artifacts can be generated from the cleaned data.
To generate `cleaned.csv`, `output.mcf` run:

```bash
-python preprocess.py --input_path=<directory path to cleaned data> --config_path=<path to config> --output_path=<directory path to write csv and mcf>
+python3 preprocess.py --input_path=<directory path to cleaned data> --config_path=<path to config> --output_path=<directory path to write csv and mcf>
```

## Aggregating at a Country level
At a country level, aggregation is performed by summing over the state level `cleaned.csv`.
To aggregate run:

```bash
-python aggregate.py --input_path=<path to state level csv> --output_path=<output csv path>
+python3 aggregate.py --input_path=<path to state level csv> --output_path=<output csv path>
```
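
Review note: the README text above describes country-level aggregation as a sum over the state-level `cleaned.csv`. The sketch below illustrates that idea only; it is not the code in `aggregate.py`. It assumes the state-level file has the `Year,Geo,StatVar,Quantity` columns seen in `testdata/expected.csv`. The function name `aggregate_to_country`, the `country/USA` identifier, and the sample rows are hypothetical placeholders.

```python
import csv
import io
from collections import defaultdict


def aggregate_to_country(state_rows, country_geo="country/USA"):
    """Sum Quantity across states for each (Year, StatVar) pair.

    `country_geo` is an assumed identifier for the aggregated rows.
    """
    totals = defaultdict(float)
    for row in state_rows:
        totals[(row["Year"], row["StatVar"])] += float(row["Quantity"])
    # Emit one country-level row per (Year, StatVar), in sorted order.
    return [
        {"Year": year, "Geo": country_geo, "StatVar": stat_var, "Quantity": total}
        for (year, stat_var), total in sorted(totals.items())
    ]


# Made-up rows for illustration only; these are not values from the dataset.
sample = io.StringIO(
    "Year,Geo,StatVar,Quantity\n"
    "2000-09,geoId/06,Example_StatVar,1.0\n"
    "2000-09,geoId/48,Example_StatVar,2.0\n"
    "2001-09,geoId/06,Example_StatVar,4.0\n"
)
country_rows = aggregate_to_country(csv.DictReader(sample))
```

Grouping on both `Year` and `StatVar` keeps each statistical variable's yearly total separate, so only the `Geo` dimension is collapsed.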

## Data Caveats:
2 changes: 1 addition & 1 deletion scripts/us_cdc/heat_related_illness/preprocess_test.py
@@ -32,7 +32,7 @@ def test_csv(self):
input_path = os.path.join(_SCRIPT_PATH, 'testdata', 'cleaned_data')

subprocess.call([
-'python', preprocess_path, f'--input_path={input_path}',
+'python3', preprocess_path, f'--input_path={input_path}',
f'--config_path={config_path}', f'--output_path={tmp_dir}'
])
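
Review note: the test change above switches the interpreter name passed to `subprocess.call` from `python` to `python3`. One alternative, offered here as a suggestion and not as what this PR does, is to pass `sys.executable` so the child process uses the same interpreter that is running the test. The helper name `run_with_current_interpreter` is hypothetical.

```python
import subprocess
import sys


def run_with_current_interpreter(args):
    """Run Python with the interpreter that is executing this process.

    `args` is the list of arguments that would follow the interpreter name,
    for example a script path followed by its flags.
    """
    return subprocess.run(
        [sys.executable, *args],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )


# Usage: ask the child interpreter for its major version number.
result = run_with_current_interpreter(
    ["-c", "import sys; print(sys.version_info[0])"]
)
```

With `check=True`, a failing script surfaces as an exception instead of a silently ignored return code.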

4 changes: 2 additions & 2 deletions scripts/us_cdc/heat_related_illness/source_data/deaths.html
Git LFS file not shown
4 changes: 2 additions & 2 deletions scripts/us_cdc/heat_related_illness/source_data/edVisits.html
Git LFS file not shown
40 changes: 20 additions & 20 deletions scripts/us_cdc/heat_related_illness/testdata/expected.csv
@@ -1,4 +1,24 @@
Year,Geo,StatVar,Quantity
+2000-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,41.0
+2001-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,25.0
+2002-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,43.0
+2003-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,38.0
+2004-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,33.0
+2005-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,62.0
+2006-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,188.0
+2007-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,67.0
+2008-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,42.0
+2009-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,54.0
+2010-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,39.0
+2011-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,26.0
+2012-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,61.0
+2013-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,66.0
+2014-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,47.0
+2015-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,49.0
+2016-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,67.0
+2017-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,98.0
+2018-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,98.0
+2019-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,57.0
2005-09,geoId/06,Count_MedicalConditionIncident_0To4Years_SummerSeason_Male_ConditionHeatStress_VisitedEmergencyDepartment,55.0
2005-09,geoId/06,Count_MedicalConditionIncident_0To4Years_SummerSeason_Female_ConditionHeatStress_VisitedEmergencyDepartment,42.0
2005-09,geoId/06,Count_MedicalConditionIncident_5To14Years_SummerSeason_Male_ConditionHeatStress_VisitedEmergencyDepartment,141.0
@@ -329,23 +349,3 @@ Year,Geo,StatVar,Quantity
2018-09,geoId/06,Count_MedicalConditionIncident_35To64Years_SummerSeason_Female_ConditionHeatStress_PatientHospitalized,114.0
2018-09,geoId/06,Count_MedicalConditionIncident_65OrMoreYears_SummerSeason_Male_ConditionHeatStress_PatientHospitalized,285.0
2018-09,geoId/06,Count_MedicalConditionIncident_65OrMoreYears_SummerSeason_Female_ConditionHeatStress_PatientHospitalized,199.0
-2000-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,41.0
-2001-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,25.0
-2002-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,43.0
-2003-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,38.0
-2004-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,33.0
-2005-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,62.0
-2006-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,188.0
-2007-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,67.0
-2008-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,42.0
-2009-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,54.0
-2010-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,39.0
-2011-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,26.0
-2012-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,61.0
-2013-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,66.0
-2014-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,47.0
-2015-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,49.0
-2016-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,67.0
-2017-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,98.0
-2018-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,98.0
-2019-09,geoId/06,Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased,57.0
18 changes: 9 additions & 9 deletions scripts/us_cdc/heat_related_illness/testdata/expected_output.mcf
@@ -1,3 +1,12 @@
+Node: dcid:Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased
+populationType: dcs:MedicalConditionIncident
+medicalStatus: dcs:PatientDeceased
+medicalCondition: dcs:HeatStress
+climaticSeason: dcs:SummerSeason
+measuredProperty: dcs:count
+statType: dcs:measuredValue
+typeOf: dcs:StatisticalVariable
+
Node: dcid:Count_MedicalConditionIncident_0To4Years_SummerSeason_Male_ConditionHeatStress_VisitedEmergencyDepartment
populationType: dcs:MedicalConditionIncident
medicalStatus: dcs:VisitedEmergencyDepartment
@@ -218,12 +227,3 @@ typeOf: dcs:StatisticalVariable
age: [65 - Years]
gender: dcs:Female

-Node: dcid:Count_MedicalConditionIncident_SummerSeason_ConditionHeatStress_PatientDeceased
-populationType: dcs:MedicalConditionIncident
-medicalStatus: dcs:PatientDeceased
-medicalCondition: dcs:HeatStress
-climaticSeason: dcs:SummerSeason
-measuredProperty: dcs:count
-statType: dcs:measuredValue
-typeOf: dcs:StatisticalVariable