Fit to multiple time series and fold in secondary model #313

sbfnk · 2022-09-22T11:45:13Z

It might be nice to add functionality to have multiple observation time series assuming that they are all reported from the same infection proces with a delay and scale. This would potentially lead to more informed reproduction number / incidence estimation, but also enable doing what is currently done in estimate_secondary.stan within the existing model in estimate_infections.stan (e.g. interpreting the ratio of obs_scale between observed time series of cases and deaths as CFR) without treating the primary observation time series as fixed.

The text was updated successfully, but these errors were encountered:

sbfnk · 2022-09-22T11:46:55Z

If #307 is eventually merged one could then perhaps also have covariates or a GP prior on e.g. the scaling factor.

sbfnk · 2023-02-17T09:19:57Z

Proposed interface for specifying multiple observation processes (building on #363):

Multiple delays can already be named with

delay = c(incubation_period = dist_spec(...), reporting_delay = dist_spec(...))

Observation processes could then similarly be concatenated

obs = obs_opts(c(hospitalisations = obs_spec(...), deaths = obs_spec(...)

where obs_spec includes some of the existing obs_opts, i.e. distribution family, week effect etc. In addition it would have something like delays = c("incubation_period", "reporting_delay") referring to the relevant specified delays by name. The scaling option should probably be incorporated into dist_spec so they can be appropriately chained. Ideally this would also include some of the existing observation model of estimate_secondary, e.g. incidence/cumulative. It could also include the option of not specifying a data set, in which case it wouldn't constrain the model but still generated (e.g. if one wanted a forecast of a secondary outcome from a known delay/scaling but where no observation exists, or explore scenarios).

seabbs · 2023-02-20T23:17:11Z

In addition it would have something like delays = c("incubation_period", "reporting_delay") referring to the relevant specified delays by name.

This part seems key. The alternative would be to make delay a list of lists but with each top level entry being named after the observation model. The upside of this is making it clear when the model is specified how the modules connect but the big obvious downside is some code would need to change. Essentially there would need to be a wrapper (or a list method) for stacking multiple distributions that have been combined with c and some new processing code to handle this.

I am happy with the proposal of identifying delays in the observation module. Some thought is needed about what this would mean for identifying when delays are constant and when they are not and how you could do this if keeping the current processing of delays (i.e a list with features that describe that list vs a list of lists that has these features).

It could also include the option of not specifying a data set, in which case it wouldn't constrain the model but still generated

This is a nice idea but I think should be pushed to its own issue/feature vs trying to implement in the first pass version?

sbfnk · 2023-02-21T09:16:45Z

This part seems key. The alternative would be to make delay a list of lists but with each top level entry being named after the observation model.

The issue I see with that is that we want uncertain delays to be constrained by all of the observations that are delayed by them, e.g. an incubation period would enter hospitalisations as well as deaths and should be constrained by both (and jointly enter the two), so it seems to me that some way of specifying delays separately and them referencing is needed.

sbfnk · 2023-02-21T09:16:58Z

This is a nice idea but I think should be pushed to its own issue/feature vs trying to implement in the first pass version?

I agree.

seabbs · 2023-02-21T12:32:09Z

The issue I see with that is that we want uncertain delays to be constrained by all of the observations that are delayed by them, e.g. an incubation period would enter hospitalisations as well as deaths and should be constrained by both (and jointly enter the two), so it seems to me that some way of specifying delays separately and them referencing is needed.

True but I suppose it depends on onerous repeating delays is vs having to define them in one place and link to them from another. As I suggested in the PR comment the other approach would be nest everything the observation model and so build a model made of observation modules each of which is self-contained. That woudl still lead to repetition of the delays but would have the benefit of I think being easy for the user to understand. Obviously that is a major rewrite/reformulation.

sbfnk · 2023-02-21T12:45:02Z

True but I suppose it depends on onerous repeating delays is vs having to define them in one place and link to them from another. As I suggested in the PR comment the other approach would be nest everything the observation model and so build a model made of observation modules each of which is self-contained. That woudl still lead to repetition of the delays but would have the benefit of I think being easy for the user to understand. Obviously that is a major rewrite/reformulation.

Is your suggestion that we have e.g.

incubation_period <- dist(mean = 3, sd = 2, mean_mean = 2, mean_sd = 1)
onset_to_admission <- dist(mean = 2, sd = 1)
onset_to_deah <- dist(mean = 5, sd = 3)

with observation modeuls defined as

obs = obs_opts(
    hospitalisations = obs_spec(delays = c(incubation_period, onset_to_admission), ...), 
    deaths = obs_spec(delays = c(incubation_period, onset_to_death), ...)
)

in which case, how would we ensure that the two observation modules are using the same incubation period?

seabbs · 2023-02-21T14:09:18Z

Yes that is what I had in mind (or something similar (potentially passing obs = c(hospitalisations = obs_opts(delays ...), deaths = obs_opts(delays..). where we can provide observation methods for c and print as you have done for the delays. I like this because then each observation specification is a complete model in some sense (aside from the generation of infections which we are assuming is shared across observations).

I think we just wouldn't because in the parallel case they might not want to. As we are treating them as independent anyway I am not sure this would be a big problem? Just a little inefficient.

This all gets more complicated when you think about where you want to apply scaling (i.e the CFR). Is it applied at infection (which is easiest) is it applied to cases post infection or elsewhere? All of those wash out when it is static but are different if the scaling varies over time.

sbfnk · 2023-02-21T14:11:46Z

I think we just wouldn't because in the parallel case they might not want to. As we are treating them as independent anyway I am not sure this would be a big problem? Just a little inefficient.

I think it would be a problem if we ended up e.g. with two different posterior distributions for incubation period parameters.

This all gets more complicated when you think about where you want to apply scaling (i.e the CFR). Is it applied at infection (which is easiest) is it applied to cases post infection or elsewhere?

In my mind scaling could be applied with each delay as well as at the time of observation.

seabbs · 2023-02-21T14:12:53Z

I think it would be a problem if we ended up e.g. with two different posterior distributions for incubation period parameters.

In the dist specification we could assign them unique ids (and let the user tag them manually if they want). This would let us track and combine in our preprocessing to prevent this.

seabbs · 2023-02-21T14:14:43Z

In my mind scaling could be applied with each delay as well as at the time of observation.

So this would preclude combining static delays and require a rewrite of how we do convolution if wanting time-varying scaling. If not wanting time-varying scaling then it doesn't matter where in the chain it goes (I believe).

sbfnk added the enhancement New feature or request label Sep 22, 2022

sbfnk mentioned this issue Nov 28, 2022

Split up package #338

Closed

sbfnk mentioned this issue Jan 12, 2023

clean up schedule epiforecasts/schedule#15

Merged

sbfnk mentioned this issue Feb 8, 2023

implement dist_spec interface #363

Merged

epiforecasts locked and limited conversation to collaborators Feb 23, 2023

sbfnk converted this issue into discussion #371 Feb 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

This issue was moved to a discussion.

Fit to multiple time series and fold in secondary model #313

Fit to multiple time series and fold in secondary model #313

sbfnk commented Sep 22, 2022

sbfnk commented Sep 22, 2022 •

edited

Loading

sbfnk commented Feb 17, 2023 •

edited

Loading

seabbs commented Feb 20, 2023 •

edited

Loading

sbfnk commented Feb 21, 2023

sbfnk commented Feb 21, 2023

seabbs commented Feb 21, 2023

sbfnk commented Feb 21, 2023

seabbs commented Feb 21, 2023 •

edited

Loading

sbfnk commented Feb 21, 2023

seabbs commented Feb 21, 2023

seabbs commented Feb 21, 2023

This issue was moved to a discussion.

This issue was moved to a discussion.

Fit to multiple time series and fold in secondary model #313

Fit to multiple time series and fold in secondary model #313

Comments

sbfnk commented Sep 22, 2022

sbfnk commented Sep 22, 2022 • edited Loading

sbfnk commented Feb 17, 2023 • edited Loading

seabbs commented Feb 20, 2023 • edited Loading

sbfnk commented Feb 21, 2023

sbfnk commented Feb 21, 2023

seabbs commented Feb 21, 2023

sbfnk commented Feb 21, 2023

seabbs commented Feb 21, 2023 • edited Loading

sbfnk commented Feb 21, 2023

seabbs commented Feb 21, 2023

seabbs commented Feb 21, 2023

This issue was moved to a discussion.

sbfnk commented Sep 22, 2022 •

edited

Loading

sbfnk commented Feb 17, 2023 •

edited

Loading

seabbs commented Feb 20, 2023 •

edited

Loading

seabbs commented Feb 21, 2023 •

edited

Loading