Submission format
Each forecast should be stored as a comma-separated value (csv) file in your data-processed/team-model
folder.
The csv file must use a standardised file name, and contain specific variable names and values which identify the forecast you are submitting. This allows us to evaluate and compare across forecasts. The automatic check validates both the filename and file contents to ensure the file can be used in the visualization and ensemble forecasting.
Each forecast file within the subdirectory should have the following name format:
YYYY-MM-DD-team-model.csv
The date YYYY-MM-DD
is the forecast date. This should be the last day of the submission period (Monday).
The team
and model
in this file name must match the name of the data-processed
directory this file is in.
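The naming rule above can be checked before submitting. The following is a minimal sketch, not the hub's automatic check: the function name `check_forecast_filename` and the example team and model names are hypothetical, and it only covers the three points stated above (name format, Monday forecast date, match with the directory name).

```python
import re
from datetime import date


def check_forecast_filename(filename, directory):
    """Return the forecast date if `filename` follows YYYY-MM-DD-team-model.csv.

    `directory` is the name of the team-model folder that holds the file.
    Raises ValueError when one of the naming rules is not met.
    """
    match = re.fullmatch(r"(\d{4}-\d{2}-\d{2})-(.+)\.csv", filename)
    if match is None:
        raise ValueError(f"{filename}: expected YYYY-MM-DD-team-model.csv")
    # fromisoformat also rejects impossible dates such as 2021-02-30.
    forecast_date = date.fromisoformat(match.group(1))
    if forecast_date.weekday() != 0:  # Monday is 0 in Python
        raise ValueError(f"{filename}: forecast date is not a Monday")
    if match.group(2) != directory:
        raise ValueError(f"{filename}: team-model does not match '{directory}'")
    return forecast_date
```

For example, `check_forecast_filename("2021-03-08-exampleteam-examplemodel.csv", "exampleteam-examplemodel")` returns the date 2021-03-08, which is a Monday.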
The csv file must contain only the following columns (in any order). No additional columns are allowed.
column | column type | description |
---|---|---|
forecast_date | date | Date as YYYY-MM-DD, last day (Monday) of submission window |
Optional: scenario_id | string | One of "forecast" or a specified "scenario ID". If this column is not included it will be assumed that its value is "forecast" for all rows |
target | string | "# wk ahead inc case", "# wk ahead inc death" or "# wk ahead inc hosp" where # is usually between 1 and 4 |
target_end_date | date | Date as YYYY-MM-DD, the last day (Saturday) of the target week |
location | string | An ISO-2 country code |
type | string | One of "point" or "quantile" |
quantile | numeric | For quantile forecasts, one of the 23 quantiles in c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99) |
value | numeric | The predicted count, a non-negative integer number of new cases or deaths in the forecast week |
Values in the forecast_date column should correspond with the date in the filename: see above.
The optional scenario_id column identifies whether a model is producing a forecast or using a scenario. The value of scenario_id should be a character (string) and one of:
- "forecast", indicating that the values are true forecasts, i.e. reflect probabilities of observing future values in the truth data
- a valid scenario ID
Initially, only forecasts will be accepted, but with the ECDC, we are developing scenarios, e.g. around vaccination and policies. See scenarios for details.
If this column is not included it will be assumed that its value is "forecast" for all rows in the file.
Values in the target
column must be a character (string) and be one of the following specific targets:
- "# wk ahead inc case"
- "# wk ahead inc hosp"
- "# wk ahead inc death"
"#" will usually be a number between 1 and 4.
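A target string can be split into its horizon and outcome with a regular expression. This is a minimal sketch; the name `parse_target` is hypothetical, and it deliberately does not restrict the horizon to 1 to 4, since the text only says that range is usual.

```python
import re

# "# wk ahead inc case", "# wk ahead inc hosp" or "# wk ahead inc death",
# where "#" is a whole number of weeks.
TARGET_PATTERN = re.compile(r"([0-9]+) wk ahead inc (case|hosp|death)")


def parse_target(target):
    """Return (horizon_in_weeks, outcome) or raise ValueError for an unknown target."""
    match = TARGET_PATTERN.fullmatch(target)
    if match is None:
        raise ValueError(f"not a recognised target: {target!r}")
    return int(match.group(1)), match.group(2)
```

For example, `parse_target("2 wk ahead inc death")` returns `(2, "death")`.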
For the week-ahead horizons, we use Epidemiological Weeks (EW) as defined by the US CDC. Each week starts on Sunday and ends on Saturday. See here for more detail on EWs, and the template csv file for converting between dates and EWs.
All forecasts should be for the incident (weekly count) number of cases predicted by the model during the week that is N weeks after forecast_date.
Predictions for this target will be evaluated compared to the number of new reported cases, as recorded by JHU.
Values in the target_end_date column must be a date in the format YYYY-MM-DD. This is the date for the forecast target and will be the Saturday at the end of the week time period. We provide a template csv to convert between an Epidemiological Week and its end date.
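As an alternative to looking dates up in the template csv, the Saturday that closes a Sunday-to-Saturday week can be computed directly. This is a minimal sketch; the function name `week_ending_saturday` is hypothetical, and it only finds the end of the week containing a given date. How a horizon of N weeks maps onto a particular week is not covered here.

```python
from datetime import date, timedelta


def week_ending_saturday(day):
    """Return the Saturday that ends the Sunday-to-Saturday week containing `day`."""
    # Python numbers weekdays Monday=0 ... Saturday=5, Sunday=6, so the
    # distance to Saturday is 6 days from a Sunday and 0 from a Saturday.
    days_until_saturday = (5 - day.weekday()) % 7
    return day + timedelta(days=days_until_saturday)


def is_valid_target_end_date(text):
    """True if `text` is a YYYY-MM-DD date that falls on a Saturday."""
    try:
        return date.fromisoformat(text).weekday() == 5
    except ValueError:
        return False
```

For example, Monday 2021-03-08 falls in the week ending Saturday 2021-03-13.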
Values in the location
column must be one of the ISO 3166-1 alpha-2 (ISO-2) geocodes. We provide a geocode file to convert between country names and ISO-2 code (column "iso2c"), or if using R, you can use the countrycode package.
Values in the type
column are one of
- “point”
- “quantile”
This value indicates whether that row corresponds to a point forecast or a quantile forecast. Point forecasts are used in visualisation, while quantile forecasts are used in visualisation and in ensemble construction, as long as all the quantiles given above are present. Both are considered in the evaluation, but with a focus on models that do provide quantiles.
Forecasts must include exactly 1 “point” forecast for each unique combination of location
and target
(usually 1 to 4 week ahead incident cases or deaths).
For quantile forecasts, the quantile column indicates the quantile for the value
in this row, in the format "0.###". Teams should provide the following 23 quantiles:
c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99)
i.e.
0.010 0.025 0.050 0.100 0.150 0.200 0.250 0.300 0.350 0.400 0.450 0.500 0.550 0.600 0.650 0.700 0.750 0.800 0.850 0.900 0.950 0.975 0.990
Together with the single point forecast, this means that there should be 24 rows for every location-target pair.
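The quantile levels above are written as an R expression. For teams not working in R, a Python equivalent might look like the sketch below; the names `QUANTILES`, `QUANTILE_LABELS` and `ROWS_PER_LOCATION_TARGET` are illustrative only.

```python
# Python equivalent of the R expression
# c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99).
# The multiples of 0.05 are computed as i / 20 so that no floating-point
# error accumulates from repeated addition.
QUANTILES = [0.01, 0.025] + [i / 20 for i in range(1, 20)] + [0.975, 0.99]

# Labels in the "0.###" format used in the quantile column.
QUANTILE_LABELS = [f"{q:.3f}" for q in QUANTILES]

# 23 quantile rows plus the single point forecast.
ROWS_PER_LOCATION_TARGET = len(QUANTILES) + 1

print(" ".join(QUANTILE_LABELS))
print(ROWS_PER_LOCATION_TARGET)
```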
If type
is “point”, the quantile
column value should be set to “NA”.
Values in the value column should be non-negative integer counts.
- For a “point” prediction, value is simply the value of that point prediction for the target and location associated with that row.
- For a “quantile” prediction, value is the inverse of the cumulative distribution function (CDF) for the target, location, and quantile associated with that row.
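The column rules described on this page can be combined into a local pre-submission check. The sketch below is not the hub's automatic check and may be stricter or looser than it. The function name `check_forecast_rows` is hypothetical, rows are assumed to be dictionaries of strings keyed by column name (as produced by Python's `csv.DictReader`), and date and location checks are left out for brevity.

```python
import re
from collections import defaultdict

REQUIRED_COLUMNS = {"forecast_date", "target", "target_end_date",
                    "location", "type", "quantile", "value"}
OPTIONAL_COLUMNS = {"scenario_id"}
EXPECTED_QUANTILES = sorted(
    f"{q:.3f}" for q in [0.01, 0.025] + [i / 20 for i in range(1, 20)] + [0.975, 0.99]
)


def check_forecast_rows(rows):
    """Return a list of problems found in `rows`; an empty list means none were found."""
    problems = []
    points = defaultdict(int)      # number of point rows per group
    quantiles = defaultdict(list)  # quantile labels seen per group
    for number, row in enumerate(rows, start=1):
        columns = set(row)
        if not (REQUIRED_COLUMNS <= columns <= (REQUIRED_COLUMNS | OPTIONAL_COLUMNS)):
            problems.append(f"row {number}: missing or additional columns")
            continue
        # A missing scenario_id column is treated as "forecast".
        key = (row.get("scenario_id", "forecast"), row["location"], row["target"])
        if re.fullmatch(r"[0-9]+", row["value"]) is None:
            problems.append(f"row {number}: value is not a non-negative integer")
        if row["type"] == "point":
            points[key] += 1
            if row["quantile"] != "NA":
                problems.append(f"row {number}: quantile should be NA for a point forecast")
        elif row["type"] == "quantile":
            try:
                quantiles[key].append(f"{float(row['quantile']):.3f}")
            except ValueError:
                problems.append(f"row {number}: quantile is not numeric")
        else:
            problems.append(f"row {number}: type must be point or quantile")
    for key in sorted(set(points) | set(quantiles)):
        if points[key] != 1:
            problems.append(f"{key}: expected exactly 1 point forecast, found {points[key]}")
        if sorted(quantiles[key]) != EXPECTED_QUANTILES:
            problems.append(f"{key}: quantile rows do not match the 23 listed quantiles")
    return problems
```

A file could be checked with `check_forecast_rows(list(csv.DictReader(handle)))` after opening it as `handle`. Keeping the result as a list of messages, rather than raising on the first problem, lets a team see every issue in one pass.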