Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix load_latest_forecast documentation/code around window size #93

Closed
nickreich opened this issue Dec 22, 2020 · 1 comment
Closed

fix load_latest_forecast documentation/code around window size #93

nickreich opened this issue Dec 22, 2020 · 1 comment
Assignees

Comments

@nickreich
Copy link
Member

Discovered in #91 .

We have run into errors of loading duplicate forecasts in a few calls to load_latest_forecasts(). This appears to be due to a misspecification and/or a mis-interpretation of the forecast_date_window_size argument. For example, when running the below code, the expected behavior was that it would return one "latest forecast" submitted within between dates of
"2020-11-17" to "2020-11-23" (seven days inclusive), and
"2020-11-24" to "2020-11-30" (seven days inclusive).

This was based on reading the documentation which says "forecast_date_window_size: The number of days across which to look for recent forecasts. Defaults to 1, which means to only look at the last_forecast_date."

However, it seems that when you let forecast_date_window_size = 7 it actually looks back 8 days. IHME-CurveFit did not submit a model between 2020-11-24 and 2020-11-30 but did submit one on 2020-11-23. Note that the call to load_latest_forecasts() retrieves the same specified forecast row in both cycles through map_dfr() because it includes 2020-11-23 in the "window" when latest_forecast_date is 2020-11-30 and window is 7.

My suggestion is to simply update the function so that the default for forecast_date_window_size is zero and the documentation to reflect that this looks for forecasts on the latest_forecast_date only.

@elray1 is this a reasonable interpretation/fix?

> map_dfr(
+   c("2020-11-23", "2020-11-30"), 
+   function(the_weeks) {
+     load_latest_forecasts(
+       last_forecast_date = the_weeks,
+       models = "IHME-CurveFit",
+       forecast_date_window_size = 7,
+       locations = "US",
+       types = "point",
+       targets = "1 wk ahead inc death",
+       source = "zoltar")
+   }
+ )
...
# A tibble: 2 x 15
  model forecast_date location horizon temporal_resolu… target_variable
  <chr> <date>        <chr>    <chr>   <chr>            <chr>          
1 IHME… 2020-11-23    US       1       wk               inc death      
2 IHME… 2020-11-23    US       1       wk               inc death      
# … with 9 more variables: target_end_date <date>, type <chr>, quantile <dbl>,
#   value <dbl>, location_name <chr>, population <int>, geo_type <chr>,
#   geo_value <chr>, abbreviation <chr>
@elray1
Copy link
Collaborator

elray1 commented Dec 22, 2020

yep, sounds reasonable to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants