Support specifying whether v4 output types are required via `is_required` property #161
…equired values validation to succeed.
Merge branch 'main' into ak/v4-is_required/159 # Conflicts: # tests/testthat/test-expand_model_out_grid.R
I've made some small suggestions to fix typos etc., but I think I will need you to walk me through this. At the moment, I'm having trouble holding all these concepts in my head. There are parts I think I understand and parts that I just really need better context for. For the parts that I think I understand, I have questions:
if an optional output_type is submitted, all output_type_ids must be present.
Is this a new thing or is this something that crops up with v4?
Addressed by assigning all output type ID values to the optional property when standardising if output type is `is_required`.
Can you provide a bit more detail about this? (Please let me know if I am being dense with this assessment.) From the code it seems that required and optional output types are concatenated. Doesn't this go against what @elray1 brought up in #105 (comment)? E.g. if someone has optional and required quantile outputs where the required are 0.01--0.99 and the optional are 0 and 1, then the concatenation invalidates the ascending values rule.
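To make the ordering concern concrete, here is a minimal base-R sketch using the values from the question (required quantiles 0.01 to 0.99, optional 0 and 1). It does not call any hubValidations code; the variable names are illustrative assumptions only.

```r
# Hypothetical quantile levels taken from the example in the question.
required_ids <- seq(0.01, 0.99, by = 0.01)
optional_ids <- c(0, 1)

# Plain concatenation places 0 after 0.99, so the vector is not ascending.
concatenated <- c(required_ids, optional_ids)
concat_is_unsorted <- is.unsorted(concatenated)

# Sorting after concatenation restores ascending order.
sorted_ids <- sort(concatenated)
sorted_is_unsorted <- is.unsorted(sorted_ids)
```

Whether sorting is an acceptable fix depends on how the required and optional values are interpreted downstream, which is what the question is getting at.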
Finally, I discovered that if a hub has derived task IDs and they depend on task IDs with required values, derived_task_ids MUST be specified to avoid false errors when validating required values.
I noticed in the tests, we have `derived_task_ids = get_derived_task_ids(hub_path)`. Could we not make this the default or add in a catch in the function that automatically checks for the derived task IDs if the hub is v4?
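One way the "make this the default" idea could look is sketched below. This is not the hubValidations implementation: `resolve_derived_task_ids()` and `stub_lookup()` are hypothetical names, and the lookup function is passed in so the sketch runs without a real hub. In practice the lookup would presumably be `get_derived_task_ids()`, as used in the tests.

```r
# Hypothetical helper: fall back to a lookup when the caller passes NULL.
resolve_derived_task_ids <- function(hub_path, derived_task_ids = NULL,
                                     lookup_fn) {
  if (is.null(derived_task_ids)) {
    # No explicit value supplied: read the derived task IDs from the hub.
    derived_task_ids <- lookup_fn(hub_path)
  }
  derived_task_ids
}

# Stand-in for a real lookup; returns a made-up derived task ID.
stub_lookup <- function(hub_path) "target_end_date"

defaulted <- resolve_derived_task_ids("path/to/hub", lookup_fn = stub_lookup)
explicit <- resolve_derived_task_ids("path/to/hub", "horizon_date",
                                     lookup_fn = stub_lookup)
```

An explicit argument still wins over the lookup, so existing callers that pass `derived_task_ids` would see no change in behaviour under this assumption.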
Co-authored-by: Zhian N. Kamvar <[email protected]>
Thanks for the review @zkamvar ! Hopefully I answered your questions below:
It is a new thing that is related to v4 and the whole discussion about how optional output type IDs are not allowed anymore.
This was the situation in the past when hubValidations/R/expand_model_out_grid.R Lines 667 to 681 in 4580b4f
This fix also allows consistent behaviour for pre and post v4 schema when using
That's the next task 😉: #155
#' Note that it is **necessary for `derived_task_ids` to be specified if any of
#' the task IDs derived task IDs depend on have required values**. If this is the
To be fair, I would not have done any better trying to roll up this complexity into a single sentence on the first shot. The phrase "the task IDs derived task IDs depend on" is giving me Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo vibes. Here's my attempt at a second round using "parent task IDs" instead of "task IDs (that) derived task IDs depend on".
- #' Note that it is **necessary for `derived_task_ids` to be specified if any of
- #' the task IDs derived task IDs depend on have required values**. If this is the
+ #' Note for derived task IDs: **if any of the parent task IDs are required, it is
+ #' necessary to specify `derived_task_ids`**. If this is the
Still trying to understand this. I'm sorry I'm just not very smart. So if I understand things correctly, let's say a hub administrator has optional quantiles with and
Oh that's right, I forgot that I had found the concatenation in #105 (comment)
Again: I'm not that smart. I just don't understand why this argument is needed or how it would be used. Is this a v4-specific argument? Does this force output types that have |
In v3 that's the correct interpretation. In v4 the This makes the ordering clear but also clashes with the interpretation of That's why to use current infrastructure, when standardising an optional v4 output type (i.e. when
Yes and yes, but it's pertinent only when creating grids of required values. That's why the argument description states that the argument is "Useful for creating grids of required values for optional output types". As discussed, we have also now made it a rule that optional output type IDs are not allowed anymore and that if an optional output type is being submitted to, all output type IDs need to be submitted.
It's already documented as a function argument and its effects shown in examples. There's no real need for more discussion currently. At the minute it's mainly for internal use. When we introduce it to |
Discussed IRL
This PR resolves #159 and specifically the need to handle the fact that whether an output type is required or not, is now handled by `is_required` BUT if an optional `output_type` is submitted, all `output_type_ids` must be present.

One issue addressed arises from the fact that for v4, all output_type_id values are stored under `required` even if the output type is not required. If not changed, all `output_type_id` values will end up as required. Addressed by assigning all output type ID values to the `optional` property when standardising if output type is `is_required`.

An additional complication arises from the fact that, when an optional output type is having data submitted to, all `output_type_id`s of that output type need to be considered `required`, so no need for the above re-assignment of values. This requires knowledge of output types in the data, something that `expand_model_out_grid()` has not required up to now. To allow for this situation, I've introduced a `force_output_types` logical argument. When `TRUE` any `output_type`s specified in `expand_model_out_grid()` are forced to be required. In v4 checks, I also provide a vector of output types which includes any optional output types that exist in the data.

Finally, I discovered that if a hub has derived task IDs and they depend on task IDs with required values, `derived_task_ids` MUST be specified to avoid false errors when validating required values. As such, I've added a note in the relevant vignette and function docs but suggest it's repeated in hubverse-org/hubDocs#193
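
The re-assignment described above can be illustrated with a small base-R sketch. It is not the code in this PR: `standardise_output_type_ids()` and its `force` argument are hypothetical stand-ins, and it assumes the re-assignment to `optional` applies when `is_required` is `FALSE` and the output type is not being forced.

```r
# Hypothetical helper, not the hubValidations implementation.
# `ids` mimics a v4 output_type_id entry: all values live under `required`.
standardise_output_type_ids <- function(ids, is_required, force = FALSE) {
  if (is_required || force) {
    # Required (or forced) output type: every value stays required.
    return(list(required = ids$required, optional = NULL))
  }
  # Optional output type: move every value to `optional`.
  list(required = NULL, optional = ids$required)
}

# Made-up quantile levels for an optional output type.
quantile_ids <- list(required = c(0.25, 0.5, 0.75))

as_optional <- standardise_output_type_ids(quantile_ids, is_required = FALSE)
as_forced <- standardise_output_type_ids(quantile_ids, is_required = FALSE,
                                         force = TRUE)
```

Here `as_optional` carries all values under `optional`, while `as_forced` keeps them under `required`, mirroring the case where an optional output type is present in the submitted data.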