Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

forecast_opts() or add_horizon() #867

Closed
sbfnk opened this issue Nov 25, 2024 · 12 comments · Fixed by #901
Closed

forecast_opts() or add_horizon() #867

sbfnk opened this issue Nov 25, 2024 · 12 comments · Fixed by #901

Comments

@sbfnk
Copy link
Contributor

sbfnk commented Nov 25, 2024

Just surfacing a discussion mixed up with other discussions in #346 (comment)

In order to address #640 we need to decide how to specify accumulation for future data. Options on the table are (for an example of weekly forecasts):

obs |>
  estimate_infections(forecast = forecast_opts(horizon = 14, accumulate = 7)

or

obs |>
  add_horizon(horizon = 14, accumulate = 7) |>
  estimate_infections()

My opinion is option 2 is more elegant ("future dates are just another form of missing data and the model doesn't actually need to know when the present is") but option 1 has the advantage more in line with what we've been doing in the package so far.

@seabbs
Copy link
Contributor

seabbs commented Nov 25, 2024

Option 2. is how epinowcast works and I agree is more elegant. I'm in favour of it I think. I might also be in favour of some attempt to detect the accumulation in the current data to avoid having to respecify it but that might be dangerous.

Here the only thing we might want to know when a forecast is is for the allocation of things to estimate, estimate from partial data, and forecast which I think currently uses the supplied data vs being based on the presence of data.

Something to note that just occurred to me is have we made a breaking change in the accumulation PR in that before the predicted cases were always on the daily scale and now they are always on the accumulated scale? That is perhaps no bad thing but maybe we want to supply some way to get back to daily scale forecasts at a later date.

@sbfnk
Copy link
Contributor Author

sbfnk commented Nov 25, 2024

Here the only thing we might want to know when a forecast is is for the allocation of things to estimate, estimate from partial data, and forecast which I think currently uses the supplied data vs being based on the presence of data.

Hmm, yes this is true and would be an argument against add_horizon() (as then we also have to specify somewhere the date of making the forecast).

Something to note that just occurred to me is have we made a breaking change in the accumulation PR in that before the predicted cases were always on the daily scale and now they are always on the accumulated scale?

I don't think so - what's reported as predicted cases is imputed_reports which uses the untruncated, unaccumulated reports.

@sbfnk
Copy link
Contributor Author

sbfnk commented Nov 25, 2024

I might also be in favour of some attempt to detect the accumulation in the current data to avoid having to respecify it

We could do that if all the gaps are equal. Same with the initial_accumulate argument.

@sbfnk sbfnk changed the title forecast_opts() or `add_horizon() forecast_opts() or add_horizon() Nov 26, 2024
@seabbs
Copy link
Contributor

seabbs commented Nov 26, 2024

I don't think so - what's reported as predicted cases is

So we would need to change this for forecasting or distinguish accumulation for forecasting vs for inference in order to get accumulation to show up in outputs?

@sbfnk
Copy link
Contributor Author

sbfnk commented Nov 26, 2024

So we would need to change this for forecasting

We could change it but now that we're accumulating after truncating it means we'd then also show truncation (potentially a good thing for comparing to data, potentially confusing if it's interpreted as final cases). I think this would make most sense.

@seabbs
Copy link
Contributor

seabbs commented Nov 26, 2024

my view is that I think we want to consider accumulation of the forecast as a standalone feature from accumulation of the data in the likelihood or have a clear communication of what is a posterior prediction of the data and what is a prediction of some latent quantity (i.e the unaccumulated data). I think we probably can't make the breaking change where accumulation and truncation shows up in the current forecast output as a lot of people have grown used to interpreting that as a forecast of a latent (potentially observed in the future) thing

@sbfnk
Copy link
Contributor Author

sbfnk commented Nov 26, 2024

This makes sense. Perhaps the plot with accumulation and truncation could be done in a separate plot_fit() function or the like.

It does come back to the issue raised here though and how we can improve the plot which currently could be wrongly interpreted to mean a poor fit to / underestimate of the data.
#640

@seabbs
Copy link
Contributor

seabbs commented Nov 27, 2024

we can improve the plot which currently could be wrongly interpreted to mean a poor fit to / underestimate of the data.

One no code option is to clearly document exactly what is being plot here.

I like the idea of introducing new more clearly scoped plotting tools as the solution and maybe gradually depreciating what is in

@jamesmbaazam
Copy link
Contributor

my view is that I think we want to consider accumulation of the forecast as a standalone feature from accumulation of the data in the likelihood or have a clear communication of what is a posterior prediction of the data and what is a prediction of some latent quantity (i.e the unaccumulated data). I think we probably can't make the breaking change where accumulation and truncation shows up in the current forecast output as a lot of people have grown used to interpreting that as a forecast of a latent (potentially observed in the future) thing

Would this also allow for the users to be able to accumulate the forecasts differently from the way the data was collected? For example, my data is reported in weekly aggregates but I want my forecasts to be on a daily scale.

@seabbs
Copy link
Contributor

seabbs commented Dec 11, 2024

yes exactly

@sbfnk sbfnk added this to the CRAN v1.7 release milestone Dec 12, 2024
@seabbs
Copy link
Contributor

seabbs commented Dec 13, 2024

@nlinton as you have looked at this feature I thought you might have a useful user opinion. As you can see we are going a bit back and forth on this

@nlinton
Copy link

nlinton commented Dec 16, 2024

@seabbs Being able to accumulate the forecasts of reported cases differently from the way the data was collected seems like a useful feature. Either add_horizon() or forecast_opts() is fine as long as it is clearly documented what it is doing and how I should use it, but I do think it is useful to keep track of estimate, estimate from partial data, and forecast for reported cases.

This was referenced Dec 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants