Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1 write get baseline predictions function #4

Merged
merged 36 commits into from
Sep 10, 2024

Conversation

lshandross
Copy link
Collaborator

No description provided.

@lshandross lshandross linked an issue Sep 4, 2024 that may be closed by this pull request
Copy link
Contributor

@elray1 elray1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good! I added a few comments -- mostly minor, one question of substance about how to handle the two different output types.

R/get_baseline_predictions.R Outdated Show resolved Hide resolved
R/get_baseline_predictions.R Outdated Show resolved Hide resolved
R/get_baseline_predictions.R Outdated Show resolved Hide resolved
Comment on lines +24 to +25
#' @param seed integer specifying a seed to set for reproducible results.
#' Defaults to NULL, in which case no seed is set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd vote to take this argument out, and if the user wants reproducible results they can just set the seed before calling the function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that the results actually weren't reproducible unless I set the seed within the function, which is why I added the argument

Comment on lines +46 to +49
num_locs <- length(unique(target_ts[["location"]]))
if (num_locs != 1) {
cli::cli_abort("{.arg target_ts} contains {.val num_locs} but only one may be provided.")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this is the only place in this function where the location column is used. It seems like this limits the flexibility of this function -- e.g., the input data must encode the location in the column named location and the validation will not check for other types of disaggregation, e.g. by age group.

As a proposal, what if we get rid of the requirement of a column named "location" and this check that there is only a single location, and replace it with a check that there are no duplicated values in the time_index column. This is not exactly logically equivalent, but in most cases a data set with more than one location will also contain duplicates for the time index.

Another idea heading in the other direction would be to check that there is only one value in any columns not named time_index or observation. That would mean we could still do this check, but it would apply generally and would not rely on this specific column name.

I'm open to other ideas too, or just keeping this as is for now and filing an issue to address this later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote the code like this based on the hubverse framework of target data in a time series format always having the three columns "time_index", "location", and "observation", so the validation function for target time series data also makes this assumption. I don't know if this framework has changed for the hubverse, or if we'd prefer to do something different for this package, but I think we should discuss and then decide on a package-level, rather than just for this function in particular.

R/validate_model_inputs.R Outdated Show resolved Hide resolved
R/validate_model_inputs.R Outdated Show resolved Hide resolved
R/validate_model_inputs.R Outdated Show resolved Hide resolved
R/validate_model_inputs.R Outdated Show resolved Hide resolved
tests/testthat/test-get_baseline_predictions.R Outdated Show resolved Hide resolved
@elray1
Copy link
Contributor

elray1 commented Sep 5, 2024

one other request -- could we add the github action to run unit tests on pull requests to this repo? to avoid adding more stuff to this PR, maybe good to add that in a separate PR we could merge first

@lshandross lshandross merged commit 1c86ee0 into main Sep 10, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Write get_baseline_predictions function
2 participants