Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support specifying whether v4 output types are required via is_required property #161

Merged
merged 22 commits into from
Nov 19, 2024

Conversation

annakrystalli
Copy link
Member

@annakrystalli annakrystalli commented Nov 14, 2024

This PR resolves #159 and specifically the need to handle the fact that whether an output type is required or not, is now handled by is_required BUT if an optional output_type is submitted, all output_type_ids must be present.

One issue addressed arises from the fact that for v4, all output_type_id values are stored under required even if the output type is not required. If not changed, all output_type_id values will end up as required. Addressed by assigning all output type ID values to the optional property when standardising if output type is is_required.

An additional complication arises from the fact that, when an optional output type is having data submitted to, all output_type_ids of that output type need to be considered required, so no need for the above re-assignment of values. This requires knowledge of output types in the data, something that expand_model_out_grid() has not required up to now. To allow for this situation, I've introduced a force_output_types logical argument. When TRUE any output_types specified in expand_model_out_grid() are forced to be required. In v4 checks, I also provide a vector of output types which includes any optional output types that exist in the data.

Finally, I discovered that if a hub has derived task IDs and they depend on task IDs with required values, derived_task_ids MUST be specified to avoid false errors when validating required values. As such, I've added a note in the relevant vignette and function docs but suggest it's repeated in hubverse-org/hubDocs#193

Copy link

github-actions bot commented Nov 14, 2024

Copy link
Member

@zkamvar zkamvar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made some small suggestions to fix typos etc, but I think I will need you to walk me though this. At the moment, I'm having trouble holding all these concepts in my head. There are parts I think I understand and parts that I just really need better context for. For the parts that I think I understand, I have questions:

if an optional output_type is submitted, all output_type_ids must be present.

Is this a new thing or is this something that crops up with v4?

Addressed by assigning all output type ID values to the optional property when standardising if output type is is_required.

Can you provide a bit more detail about this? (Please let me know if I am being dense with this assessment:) from the code it seems that required and optional output types are concatenated. Doesn't this go against what @elray1 brought up in #105 (comment) (e.g. if someone has optional and required quantile outputs where the required are 0.01--0.99 and the optional are 0 and 1, then the concatenation invalidate the ascending values rule?

Finally, I discovered that if a hub has derived task IDs and they depend on task IDs with required values, derived_task_ids MUST be specified to avoid false errors when validating required values.

I noticed in the tests, we have derived_task_ids = get_derived_task_ids(hub_path). Could we not make this the default or add in a catch in the function that automatically checks for the derived task IDs if the hub is v4?

R/expand_model_out_grid.R Outdated Show resolved Hide resolved
R/expand_model_out_grid.R Outdated Show resolved Hide resolved
R/expand_model_out_grid.R Outdated Show resolved Hide resolved
R/expand_model_out_grid.R Outdated Show resolved Hide resolved
tests/testthat/test-expand_model_out_grid.R Outdated Show resolved Hide resolved
@annakrystalli
Copy link
Member Author

annakrystalli commented Nov 15, 2024

Thanks for the review @zkamvar ! Hopefully I answered your questions below:

if an optional output_type is submitted, all output_type_ids must be present.

Is this a new thing or is this something that crops up with v4?

It is a new thing that is related to v4 and the whole discussion about how optional output type IDs are not allowed anymore.
is_required controls whether an output type ID as a whole is required or not but if someone submits an optional output type, they have to submit all output_type_ids.

Addressed by assigning all output type ID values to the optional property when standardising if output type is is_required.

Can you provide a bit more detail about this? (Please let me know if I am being dense with this assessment:) from the code it seems that required and optional output types are concatenated. Doesn't this go against what @elray1 brought up in #105 (comment) (e.g. if someone has optional and required quantile outputs where the required are 0.01--0.99 and the optional are 0 and 1, then the concatenation invalidate the ascending values rule?

This was the situation in the past when optional output type ID values were allowed and the whole issue v4 is addressing. In v4 we only have a single required output_type_id. To enable us to use the existing infrastructure, when we process v4 output type IDs we standardise them to a list with both required and optional properties. Whether we assign the values to required or optional depends on the value of is_required. The other property however remains as NULL. We concatenate for back compatibility (this was already happening when unlisting to get all output_type_id values), but we're effectively concatenating with a NULL vector post v4 so nothing changes for the original values. See

as_required <- function(output_type_ids) {
opt_values <- output_type_ids$optional
req_values <- output_type_ids$required
values <- c(req_values, opt_values)
list(required = values, optional = NULL)
}
as_optional <- function(output_type_ids) {
opt_values <- output_type_ids$optional
req_values <- output_type_ids$required
values <- c(req_values, opt_values)
list(required = NULL, optional = values)
}

This fix also allows consistent behaviour for pre and post v4 schema when using force_output_types. When that's TRUE, all output_type_ids become required regardless of which version of schema we are using.

I noticed in the tests, we have derived_task_ids = get_derived_task_ids(hub_path). Could we not make this the default or add in a catch in the function that automatically checks for the derived task IDs if the hub is v4?

That's the next task 😉: #155

Comment on lines +9 to +10
#' Note that it is **necessary for `derived_task_ids` to be specified if any of
#' the task IDs derived task IDs depend on have required values**. If this is the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be fair, I would not have done any better trying to roll up this complexity into a single sentence on the first shot. The phrase "the task IDs derived task IDs depend on" is giving me Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo vibes. Here's my attempt at a second round using "parent task IDs" instead of "task IDs (that) derived task IDs depend on".

Suggested change
#' Note that it is **necessary for `derived_task_ids` to be specified if any of
#' the task IDs derived task IDs depend on have required values**. If this is the
#' Note for derived task IDs: **if any of the parent task IDs are required, it is
#' necessary to specify `derived_task_ids`**. If this is the

@zkamvar
Copy link
Member

zkamvar commented Nov 15, 2024

It is a new thing that is related to v4 and the whole discussion about how optional output type IDs are not allowed anymore. is_required controls whether an output type ID as a whole is required or not but if someone submits an optional output type, they have to submit all output_type_ids.

Still trying to understand this. I'm sorry I'm just not very smart.

So if I understand things correctly, let's say a hub administrator has optional quantiles with and output_type_id of [0, 0.5, 1]. In v3, a modeler would be able to submit any subset of these output_type_ids (e.g. [0, 1]), but in v4, a modeler needs to submit all of them? Is that what this means?

We concatenate for back compatibility (this was already happening when unlisting to get all output_type_id values), but we're effectively concatenating with a NULL vector post v4 so nothing changes for the original values.

Oh that's right, I forgot that I had found the concatenation in #105 (comment)

This fix also allows consistent behaviour for pre and post v4 schema when using force_output_types. When that's TRUE, all output_type_ids become required regardless of which version of schema we are using.

Again: I'm not that smart. I just don't understand why this argument is needed or how it would be used. Is this a v4-specific argument? Does this force output types that have is_required: false to be required? What are the consequences of not using this argument? Do hub admins need to be aware of this?

@annakrystalli
Copy link
Member Author

annakrystalli commented Nov 18, 2024

So if I understand things correctly, let's say a hub administrator has optional quantiles with and output_type_id of [0, 0.5, 1]. In v3, a modeler would be able to submit any subset of these output_type_ids (e.g. [0, 1]), but in v4, a modeler needs to submit all of them? Is that what this means?

In v3 that's the correct interpretation. In v4 the optional property doesn't even exist. This was a big driver of v4 changes resulting from the implementation question I raised here: #105 (comment)
in v4 all values are provided in the required property.

This makes the ordering clear but also clashes with the interpretation of required values in current infrastructure when dealing with an optional output type.

That's why to use current infrastructure, when standardising an optional v4 output type (i.e. when is_required: false we need to create and shift the values to the optional property.

I just don't understand why this argument is needed or how it would be used. Is this a v4-specific argument? Does this force output types that have is_required: false to be required?

Yes and yes but it's pertinent only when creating grids of required values only. That's why the argument description states that the argument is "Useful for creating grids of required values for optional output types". As discussed, we have also now made it a rule that optional output type IDs are not allowed anymore and that if an optional output type is being submitted to, all output type ids need to be submitted. force_output_type allows us to do just that when requesting required_vals_only (which is the grid used when validating that all required values have been submitted). It tells expand_model_out_grid() to treat any output types requested as required (regardless of whether is_required is true or not). This allows us to create a required values grid that includes all output type IDs of a requested optional output type ID. Without force_output_type, there was no way to take advantage of the existing infrastructure, given the standardisation processing of optional output types described above as their output type ids would always be treated as optional. I feel examining or even running some of the examples locally would help with understanding: https://67370e8a57170d4c7f2ac043--hubvalidations-pr-previews.netlify.app/reference/expand_model_out_grid

Do hub admins need to be aware of this?

It's already documented as a function argument and its effects shown in examples. There's no real need for more discussion currently. At the minute it's mainly for internal use. When we introduce it to submission_tmpl() we can provide a bit more detail so that modellers are aware of how and when they should use it if they want a required values template that includes an optional output type.

@zkamvar zkamvar self-requested a review November 19, 2024 15:08
Copy link
Member

@zkamvar zkamvar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed IRL

@annakrystalli annakrystalli merged commit 9a19893 into main Nov 19, 2024
8 checks passed
@annakrystalli annakrystalli deleted the ak/v4-is_required/159 branch November 19, 2024 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Use is_required to determine required values in expand_model_out_grid() v4 output type
2 participants