Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fit to multiple time series and fold in secondary model #313

Closed
sbfnk opened this issue Sep 22, 2022 · 11 comments
Closed

Fit to multiple time series and fold in secondary model #313

sbfnk opened this issue Sep 22, 2022 · 11 comments
Labels
enhancement New feature or request

Comments

@sbfnk
Copy link
Contributor

sbfnk commented Sep 22, 2022

It might be nice to add functionality to have multiple observation time series assuming that they are all reported from the same infection proces with a delay and scale. This would potentially lead to more informed reproduction number / incidence estimation, but also enable doing what is currently done in estimate_secondary.stan within the existing model in estimate_infections.stan (e.g. interpreting the ratio of obs_scale between observed time series of cases and deaths as CFR) without treating the primary observation time series as fixed.

@sbfnk sbfnk added the enhancement New feature or request label Sep 22, 2022
@sbfnk
Copy link
Contributor Author

sbfnk commented Sep 22, 2022

If #307 is eventually merged one could then perhaps also have covariates or a GP prior on e.g. the scaling factor.

@sbfnk
Copy link
Contributor Author

sbfnk commented Feb 17, 2023

Proposed interface for specifying multiple observation processes (building on #363):

Multiple delays can already be named with

delay = c(incubation_period = dist_spec(...), reporting_delay = dist_spec(...))

Observation processes could then similarly be concatenated

obs = obs_opts(c(hospitalisations = obs_spec(...), deaths = obs_spec(...)

where obs_spec includes some of the existing obs_opts, i.e. distribution family, week effect etc. In addition it would have something like delays = c("incubation_period", "reporting_delay") referring to the relevant specified delays by name. The scaling option should probably be incorporated into dist_spec so they can be appropriately chained. Ideally this would also include some of the existing observation model of estimate_secondary, e.g. incidence/cumulative. It could also include the option of not specifying a data set, in which case it wouldn't constrain the model but still generated (e.g. if one wanted a forecast of a secondary outcome from a known delay/scaling but where no observation exists, or explore scenarios).

@seabbs
Copy link
Contributor

seabbs commented Feb 20, 2023

In addition it would have something like delays = c("incubation_period", "reporting_delay") referring to the relevant specified delays by name.

This part seems key. The alternative would be to make delay a list of lists but with each top level entry being named after the observation model. The upside of this is making it clear when the model is specified how the modules connect but the big obvious downside is some code would need to change. Essentially there would need to be a wrapper (or a list method) for stacking multiple distributions that have been combined with c and some new processing code to handle this.

I am happy with the proposal of identifying delays in the observation module. Some thought is needed about what this would mean for identifying when delays are constant and when they are not and how you could do this if keeping the current processing of delays (i.e a list with features that describe that list vs a list of lists that has these features).

It could also include the option of not specifying a data set, in which case it wouldn't constrain the model but still generated

This is a nice idea but I think should be pushed to its own issue/feature vs trying to implement in the first pass version?

@sbfnk
Copy link
Contributor Author

sbfnk commented Feb 21, 2023

This part seems key. The alternative would be to make delay a list of lists but with each top level entry being named after the observation model.

The issue I see with that is that we want uncertain delays to be constrained by all of the observations that are delayed by them, e.g. an incubation period would enter hospitalisations as well as deaths and should be constrained by both (and jointly enter the two), so it seems to me that some way of specifying delays separately and them referencing is needed.

@sbfnk
Copy link
Contributor Author

sbfnk commented Feb 21, 2023

This is a nice idea but I think should be pushed to its own issue/feature vs trying to implement in the first pass version?

I agree.

@seabbs
Copy link
Contributor

seabbs commented Feb 21, 2023

The issue I see with that is that we want uncertain delays to be constrained by all of the observations that are delayed by them, e.g. an incubation period would enter hospitalisations as well as deaths and should be constrained by both (and jointly enter the two), so it seems to me that some way of specifying delays separately and them referencing is needed.

True but I suppose it depends on onerous repeating delays is vs having to define them in one place and link to them from another. As I suggested in the PR comment the other approach would be nest everything the observation model and so build a model made of observation modules each of which is self-contained. That woudl still lead to repetition of the delays but would have the benefit of I think being easy for the user to understand. Obviously that is a major rewrite/reformulation.

@sbfnk
Copy link
Contributor Author

sbfnk commented Feb 21, 2023

True but I suppose it depends on onerous repeating delays is vs having to define them in one place and link to them from another. As I suggested in the PR comment the other approach would be nest everything the observation model and so build a model made of observation modules each of which is self-contained. That woudl still lead to repetition of the delays but would have the benefit of I think being easy for the user to understand. Obviously that is a major rewrite/reformulation.

Is your suggestion that we have e.g.

incubation_period <- dist(mean = 3, sd = 2, mean_mean = 2, mean_sd = 1)
onset_to_admission <- dist(mean = 2, sd = 1)
onset_to_deah <- dist(mean = 5, sd = 3)

with observation modeuls defined as

obs = obs_opts(
    hospitalisations = obs_spec(delays = c(incubation_period, onset_to_admission), ...), 
    deaths = obs_spec(delays = c(incubation_period, onset_to_death), ...)
)

in which case, how would we ensure that the two observation modules are using the same incubation period?

@seabbs
Copy link
Contributor

seabbs commented Feb 21, 2023

Yes that is what I had in mind (or something similar (potentially passing obs = c(hospitalisations = obs_opts(delays ...), deaths = obs_opts(delays..). where we can provide observation methods for c and print as you have done for the delays. I like this because then each observation specification is a complete model in some sense (aside from the generation of infections which we are assuming is shared across observations).

I think we just wouldn't because in the parallel case they might not want to. As we are treating them as independent anyway I am not sure this would be a big problem? Just a little inefficient.

This all gets more complicated when you think about where you want to apply scaling (i.e the CFR). Is it applied at infection (which is easiest) is it applied to cases post infection or elsewhere? All of those wash out when it is static but are different if the scaling varies over time.

@sbfnk
Copy link
Contributor Author

sbfnk commented Feb 21, 2023

I think we just wouldn't because in the parallel case they might not want to. As we are treating them as independent anyway I am not sure this would be a big problem? Just a little inefficient.

I think it would be a problem if we ended up e.g. with two different posterior distributions for incubation period parameters.

This all gets more complicated when you think about where you want to apply scaling (i.e the CFR). Is it applied at infection (which is easiest) is it applied to cases post infection or elsewhere?

In my mind scaling could be applied with each delay as well as at the time of observation.

@seabbs
Copy link
Contributor

seabbs commented Feb 21, 2023

I think it would be a problem if we ended up e.g. with two different posterior distributions for incubation period parameters.

In the dist specification we could assign them unique ids (and let the user tag them manually if they want). This would let us track and combine in our preprocessing to prevent this.

@seabbs
Copy link
Contributor

seabbs commented Feb 21, 2023

In my mind scaling could be applied with each delay as well as at the time of observation.

So this would preclude combining static delays and require a rewrite of how we do convolution if wanting time-varying scaling. If not wanting time-varying scaling then it doesn't matter where in the chain it goes (I believe).

@epiforecasts epiforecasts locked and limited conversation to collaborators Feb 23, 2023
@sbfnk sbfnk converted this issue into discussion #371 Feb 23, 2023

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants