R package containing functions to validate and visualize Scenario Modeling Hub
submissions, and to pull data from the R package
covidcast
(output in the Scenario Modeling Hub (SMH) standard format).
For more information on the Scenario Modeling Hub and/or on how to participate,
please consult the Scenario Modeling Hub
GitHub repository
or website
The package is currently only available on GitHub, to install it please follow the next steps:
install.packages("remotes")
remotes::install_github("midas-network/SMHvalidation")
or it can be manually installed by directly cloning/forking/downloading the package from GitHub.
library(SMHvalidation)
The main validation function, validate_submission()
, can be used to
validate Scenario Modeling Hub submissions. The function runs multiple
checks, please look at the vignette ("Validation Checks")
containing the list of all the tests with detailed information.
The function internally runs all the different validation checks
functions (test_column()
, test_scenario()
, test_origindate()
,
test_quantiles()
, test_val()
, test_target()
, test_location()
, etc.) on
SMH submissions and prints information about the results of each tests
on the submission: warning(s), error(s) or if all the tests were
successful.
For more information, each function has a documentation associated
accessible with ?validate_submission()
for example.
To test a submission file, it's necessary to provide 2 arguments:
- the path of the file to test (CSV, ZIP or GZ file format)
- path to JSON file containing round definitions: names of columns,
target names, ... following the
tasks.json
Hubverse format
js_def <- "PATH/TO/ROUND/tasks.json"
validate_submission("PATH/TO/SUBMISSION", js_def)
As a warning:
-
The vast majority of submissions done in 2021 will return an error on the model_projection_date column as the rules and tests on this column have been made and put in place only since 2022 (COVID-19 - round 12).
-
Also, the cumulative observed data for the location 39 has been corrected after COVID-19 round 12 publication so all the submissions might have an error saying:
"Error: Some value(s) are inferior than the last observed cumulative death count. Please check location(s): 39 "
These 2 errors can be ignored.
For more information, please feel free to look at the documentation of
the function (?pull_gs_data
)
Currently, the function only pull COVID-19 data.
The SMHvalidation
package contains the function
pull_gs_data()
that download and compile data to output weekly data on
cumulative and incidence number of cases and deaths, and hospitalization
incidence:
-
For the cumulative numbers: use the
confirmed_cumulative_num
anddeaths_cumulative_num
signals fromjhu-csse
. As the data are expressed per day and the Scenario Modeling Hub use cumulative data by epiweek, the last day of each epiweek is selected for each location and the values are added between all states to obtains the US level data. -
For the incidence numbers: use the
confirmed_incidence_num
anddeaths_incidence_num
signals fromjhu-csse
andconfirmed_admissions_covid_1d
fromhhs
. As the data are expressed per day and the Scenario Modeling Hub use cumulative data by epiweek, the sum of the incidences is calculated by epiweek and by location, and the values are added between all states to obtains the US level data.
For the cases and deaths data, the data are coming from the JHU CSSE group and for the hospitalization incidence, the data are extracted from the HealthData.gov COVID-19 Reported Patient Impact and Hospital Capacity by State Daily and Timeseries datasets.
To access all these data we use the covidcast
R package. A package
using the COVIDcast Epidata API maintained by the Delphi Research Group
at Carnegie Mellon University. Please see
here,
for more information.
# Store the output in an object
lst_gs <- pull_gs_data()
# Look at the sctructure of the object
str(lst_gs)
# Print the output
lst_gs
The overall project is available under an open-source MIT license.
Please feel free to open an issue if you identify any issue with the package or would like to suggest an idea/improvement. We also welcome any pull-request.
Scenario modeling groups are supported through grants to the contributing investigators.
The Scenario Modeling Hub site is supported by the MIDAS Coordination Center, NIGMS Grant U24GM132013 to the University of Pittsburgh.