Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify fit_seromodel() output #89

Closed
Bisaloo opened this issue Jul 6, 2023 · 4 comments
Closed

Simplify fit_seromodel() output #89

Bisaloo opened this issue Jul 6, 2023 · 4 comments
Assignees

Comments

@Bisaloo
Copy link
Member

Bisaloo commented Jul 6, 2023

Currently, the output of fit_seromodel() is quite large and contains some information that could easily be removed.

This would allow us to:

  • Use simpler objects than lists for output. Possibly even an existing S3 class with useful methods. This should help users find the information that they need in the output and manipulate the output more easily.
  • Use less storage space when saving the output of this function (discussed briefly in Fix very large git repository #77 (comment))
@Bisaloo
Copy link
Member Author

Bisaloo commented Jul 6, 2023

We have discussed this today and this is where we landed. Below is the current output of fit_seromodel() and how we want to deal with each element:

  • n_iters, n_thin, foi_model, delta, m_treed are function input and should not be included in the output. We should instead teach users to always keep the script producing the object alongside the object itself.
  • n_warmup, exposure_years, exposure_ages, stan_data can be directly computer from inputs with a single line of code each. If we believe users might want to access this data (e.g., for debugging), we should instead create an intermediate exported function to compute them but it shouldn't be part of fit_seromodel() output by default
  • loo_fit, foi_cent_est, foi_post_s can be directly computed from fit with a single line of code each. Users can either compute them by themselves or they can be included in downstream post-processing functions that act on the output of fit_seromodel()

This leaves us with fit, which is the output of rstan::sampling() and is an object of class stanfit. The seems like a good candidate for output because it means we benefit from stanfit methods out of the box. If we want to provide custom methods, it is also possible to create a subclass inheriting from stanfit.

@ben18785
Copy link
Collaborator

I agree -- returning a stanfit object makes most sense since Stan is a well-used and well-developed package that many people have experience working with.

@jpavlich
Copy link
Member

I just merged @ntorresd 's PR #109 in which the object returned by fit_seromodel() is simpler. @ntorresd 's rationale of not returning a plain stanfit object is to reduce the probability that the user provides the wrong serodata to other functions that require both serodata and their corresponding stanfit object (a serodata that was not the input to the generated stanfit object)

@Bisaloo @ben18785 do you agree with the above argument? or would you delegate into the package's user the responsibility to ensure the correct parameters are passed to other functions that use both serodata and the stanfit object.

I do no heavily lean towards any option, so I would like your feedback.

@Bisaloo
Copy link
Member Author

Bisaloo commented Aug 30, 2023

would you delegate into the package's user the responsibility to ensure the correct parameters are passed to other functions that use both serodata and the stanfit object.

I think it's the user responsibility. It's important to teach users to use reproducible scripts, and not carry their data from one session to the other, losing the object provenance.
Code workarounds to try and handle this to the user add complexity to the code, thus reducing maintainability, and paradoxically makes the outputs more difficult to manipulate. And even with this, you can't completely protect the user from losing the object provenance. Hence why I believe that teaching good practices is the only way to solve this issue.

If we had a plain stanfit object, users would be able to directly use already implemented methods (plot(), summary()). Having a more complex object as in #109 will require the users to do plot(x$stanfit) or force you to implement your own methods.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants