fix load_latest_forecast documentation/code around window size #93

nickreich · 2020-12-22T02:32:31Z

Discovered in #91 .

We have run into errors of loading duplicate forecasts in a few calls to load_latest_forecasts(). This appears to be due to a misspecification and/or a mis-interpretation of the forecast_date_window_size argument. For example, when running the below code, the expected behavior was that it would return one "latest forecast" submitted within between dates of
"2020-11-17" to "2020-11-23" (seven days inclusive), and
"2020-11-24" to "2020-11-30" (seven days inclusive).

This was based on reading the documentation which says "forecast_date_window_size: The number of days across which to look for recent forecasts. Defaults to 1, which means to only look at the last_forecast_date."

However, it seems that when you let forecast_date_window_size = 7 it actually looks back 8 days. IHME-CurveFit did not submit a model between 2020-11-24 and 2020-11-30 but did submit one on 2020-11-23. Note that the call to load_latest_forecasts() retrieves the same specified forecast row in both cycles through map_dfr() because it includes 2020-11-23 in the "window" when latest_forecast_date is 2020-11-30 and window is 7.

My suggestion is to simply update the function so that the default for forecast_date_window_size is zero and the documentation to reflect that this looks for forecasts on the latest_forecast_date only.

@elray1 is this a reasonable interpretation/fix?

> map_dfr(
+   c("2020-11-23", "2020-11-30"), 
+   function(the_weeks) {
+     load_latest_forecasts(
+       last_forecast_date = the_weeks,
+       models = "IHME-CurveFit",
+       forecast_date_window_size = 7,
+       locations = "US",
+       types = "point",
+       targets = "1 wk ahead inc death",
+       source = "zoltar")
+   }
+ )
...
# A tibble: 2 x 15
  model forecast_date location horizon temporal_resolu… target_variable
  <chr> <date>        <chr>    <chr>   <chr>            <chr>          
1 IHME… 2020-11-23    US       1       wk               inc death      
2 IHME… 2020-11-23    US       1       wk               inc death      
# … with 9 more variables: target_end_date <date>, type <chr>, quantile <dbl>,
#   value <dbl>, location_name <chr>, population <int>, geo_type <chr>,
#   geo_value <chr>, abbreviation <chr>

The text was updated successfully, but these errors were encountered:

elray1 · 2020-12-22T04:10:43Z

yep, sounds reasonable to me.

Serena-Wang self-assigned this Jan 4, 2021

Serena-Wang added a commit that referenced this issue Jan 4, 2021

update default value of forecast_date_window_size (close issue #93)

d77ccf1

Serena-Wang mentioned this issue Jan 4, 2021

update default value of forecast_date_window_size in load_latest_forecasts() #103

Merged

Serena-Wang closed this as completed Jan 4, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix load_latest_forecast documentation/code around window size #93

fix load_latest_forecast documentation/code around window size #93

nickreich commented Dec 22, 2020

elray1 commented Dec 22, 2020

fix load_latest_forecast documentation/code around window size #93

fix load_latest_forecast documentation/code around window size #93

Comments

nickreich commented Dec 22, 2020

elray1 commented Dec 22, 2020