Support specifying whether v4 output types are required via `is_required` property #161

annakrystalli · 2024-11-14T15:38:01Z

This PR resolves #159 and specifically the need to handle the fact that whether an output type is required or not, is now handled by is_required BUT if an optional output_type is submitted, all output_type_ids must be present.

One issue addressed arises from the fact that for v4, all output_type_id values are stored under required even if the output type is not required. If not changed, all output_type_id values will end up as required. Addressed by assigning all output type ID values to the optional property when standardising if output type is is_required.

An additional complication arises from the fact that, when an optional output type is having data submitted to, all output_type_ids of that output type need to be considered required, so no need for the above re-assignment of values. This requires knowledge of output types in the data, something that expand_model_out_grid() has not required up to now. To allow for this situation, I've introduced a force_output_types logical argument. When TRUE any output_types specified in expand_model_out_grid() are forced to be required. In v4 checks, I also provide a vector of output types which includes any optional output types that exist in the data.

Finally, I discovered that if a hub has derived task IDs and they depend on task IDs with required values, derived_task_ids MUST be specified to avoid false errors when validating required values. As such, I've added a note in the relevant vignette and function docs but suggest it's repeated in hubverse-org/hubDocs#193

…rid.

…equired values validation to succeed.

Merge branch 'main' into ak/v4-is_required/159 # Conflicts: # tests/testthat/test-expand_model_out_grid.R

github-actions · 2024-11-14T15:45:41Z

🚀 Deployed on https://673c4368e2d530ed3fd80b40--hubvalidations-pr-previews.netlify.app

zkamvar

I've made some small suggestions to fix typos etc, but I think I will need you to walk me though this. At the moment, I'm having trouble holding all these concepts in my head. There are parts I think I understand and parts that I just really need better context for. For the parts that I think I understand, I have questions:

if an optional output_type is submitted, all output_type_ids must be present.

Is this a new thing or is this something that crops up with v4?

Addressed by assigning all output type ID values to the optional property when standardising if output type is is_required.

Can you provide a bit more detail about this? (Please let me know if I am being dense with this assessment:) from the code it seems that required and optional output types are concatenated. Doesn't this go against what @elray1 brought up in #105 (comment) (e.g. if someone has optional and required quantile outputs where the required are 0.01--0.99 and the optional are 0 and 1, then the concatenation invalidate the ascending values rule?

Finally, I discovered that if a hub has derived task IDs and they depend on task IDs with required values, derived_task_ids MUST be specified to avoid false errors when validating required values.

I noticed in the tests, we have derived_task_ids = get_derived_task_ids(hub_path). Could we not make this the default or add in a catch in the function that automatically checks for the derived task IDs if the hub is v4?

R/expand_model_out_grid.R

tests/testthat/test-expand_model_out_grid.R

Co-authored-by: Zhian N. Kamvar <[email protected]>

annakrystalli · 2024-11-15T09:10:22Z

Thanks for the review @zkamvar ! Hopefully I answered your questions below:

if an optional output_type is submitted, all output_type_ids must be present.

Is this a new thing or is this something that crops up with v4?

It is a new thing that is related to v4 and the whole discussion about how optional output type IDs are not allowed anymore.
is_required controls whether an output type ID as a whole is required or not but if someone submits an optional output type, they have to submit all output_type_ids.

Addressed by assigning all output type ID values to the optional property when standardising if output type is is_required.

Can you provide a bit more detail about this? (Please let me know if I am being dense with this assessment:) from the code it seems that required and optional output types are concatenated. Doesn't this go against what @elray1 brought up in #105 (comment) (e.g. if someone has optional and required quantile outputs where the required are 0.01--0.99 and the optional are 0 and 1, then the concatenation invalidate the ascending values rule?

This was the situation in the past when optional output type ID values were allowed and the whole issue v4 is addressing. In v4 we only have a single required output_type_id. To enable us to use the existing infrastructure, when we process v4 output type IDs we standardise them to a list with both required and optional properties. Whether we assign the values to required or optional depends on the value of is_required. The other property however remains as NULL. We concatenate for back compatibility (this was already happening when unlisting to get all output_type_id values), but we're effectively concatenating with a NULL vector post v4 so nothing changes for the original values. See

hubValidations/R/expand_model_out_grid.R

Lines 667 to 681 in 4580b4f

    
           as_required <- function(output_type_ids) { 
        
             opt_values <- output_type_ids$optional 
        
             req_values <- output_type_ids$required 
        
             values <- c(req_values, opt_values) 
        
             list(required = values, optional = NULL) 
        
           } 
        
           as_optional <- function(output_type_ids) { 
        
             opt_values <- output_type_ids$optional 
        
             req_values <- output_type_ids$required 
        
             values <- c(req_values, opt_values) 
        
             list(required = NULL, optional = values) 
        
           }

This fix also allows consistent behaviour for pre and post v4 schema when using force_output_types. When that's TRUE, all output_type_ids become required regardless of which version of schema we are using.

I noticed in the tests, we have derived_task_ids = get_derived_task_ids(hub_path). Could we not make this the default or add in a catch in the function that automatically checks for the derived task IDs if the hub is v4?

That's the next task 😉: #155

zkamvar · 2024-11-15T15:10:28Z

R/check_tbl_values_required.R

+#' Note that it is **necessary for `derived_task_ids` to be specified if any of
+#' the task IDs derived task IDs depend on have required values**. If this is the


To be fair, I would not have done any better trying to roll up this complexity into a single sentence on the first shot. The phrase "the task IDs derived task IDs depend on" is giving me Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo vibes. Here's my attempt at a second round using "parent task IDs" instead of "task IDs (that) derived task IDs depend on".

Suggested change

#' Note that it is **necessary for `derived_task_ids` to be specified if any of

#' the task IDs derived task IDs depend on have required values**. If this is the

#' Note for derived task IDs: **if any of the parent task IDs are required, it is

#' necessary to specify `derived_task_ids`**. If this is the

vignettes/articles/validate-pr.Rmd

R/validate_model_data.R

R/validate_pr.R

R/validate_submission.R

zkamvar · 2024-11-15T17:06:26Z

It is a new thing that is related to v4 and the whole discussion about how optional output type IDs are not allowed anymore. is_required controls whether an output type ID as a whole is required or not but if someone submits an optional output type, they have to submit all output_type_ids.

Still trying to understand this. I'm sorry I'm just not very smart.

So if I understand things correctly, let's say a hub administrator has optional quantiles with and output_type_id of [0, 0.5, 1]. In v3, a modeler would be able to submit any subset of these output_type_ids (e.g. [0, 1]), but in v4, a modeler needs to submit all of them? Is that what this means?

We concatenate for back compatibility (this was already happening when unlisting to get all output_type_id values), but we're effectively concatenating with a NULL vector post v4 so nothing changes for the original values.

Oh that's right, I forgot that I had found the concatenation in #105 (comment)

This fix also allows consistent behaviour for pre and post v4 schema when using force_output_types. When that's TRUE, all output_type_ids become required regardless of which version of schema we are using.

Again: I'm not that smart. I just don't understand why this argument is needed or how it would be used. Is this a v4-specific argument? Does this force output types that have is_required: false to be required? What are the consequences of not using this argument? Do hub admins need to be aware of this?

annakrystalli · 2024-11-18T09:06:23Z

So if I understand things correctly, let's say a hub administrator has optional quantiles with and output_type_id of [0, 0.5, 1]. In v3, a modeler would be able to submit any subset of these output_type_ids (e.g. [0, 1]), but in v4, a modeler needs to submit all of them? Is that what this means?

In v3 that's the correct interpretation. In v4 the optional property doesn't even exist. This was a big driver of v4 changes resulting from the implementation question I raised here: #105 (comment)
in v4 all values are provided in the required property.

This makes the ordering clear but also clashes with the interpretation of required values in current infrastructure when dealing with an optional output type.

That's why to use current infrastructure, when standardising an optional v4 output type (i.e. when is_required: false we need to create and shift the values to the optional property.

I just don't understand why this argument is needed or how it would be used. Is this a v4-specific argument? Does this force output types that have is_required: false to be required?

Yes and yes but it's pertinent only when creating grids of required values only. That's why the argument description states that the argument is "Useful for creating grids of required values for optional output types". As discussed, we have also now made it a rule that optional output type IDs are not allowed anymore and that if an optional output type is being submitted to, all output type ids need to be submitted. force_output_type allows us to do just that when requesting required_vals_only (which is the grid used when validating that all required values have been submitted). It tells expand_model_out_grid() to treat any output types requested as required (regardless of whether is_required is true or not). This allows us to create a required values grid that includes all output type IDs of a requested optional output type ID. Without force_output_type, there was no way to take advantage of the existing infrastructure, given the standardisation processing of optional output types described above as their output type ids would always be treated as optional. I feel examining or even running some of the examples locally would help with understanding: https://67370e8a57170d4c7f2ac043--hubvalidations-pr-previews.netlify.app/reference/expand_model_out_grid

Do hub admins need to be aware of this?

It's already documented as a function argument and its effects shown in examples. There's no real need for more discussion currently. At the minute it's mainly for internal use. When we introduce it to submission_tmpl() we can provide a bit more detail so that modellers are aware of how and when they should use it if they want a required values template that includes an optional output type.

zkamvar

Discussed IRL

annakrystalli added 16 commits November 13, 2024 15:57

Use is_required to determine v4 required values in expand_model_out_g…

16ca952

…rid.

support v4 required values validation in optional output types

5f529b0

substitute rnd with round

fef4e3a

use setequal instead of all.equal

64796e0

add remove warning suppression to do

9fd5fed

refactor test

ab83e0a

refactor std_output_type_ids

ec09359

refactor join_tbl_to_model_task

e4115de

Use round instead of rnd

952f27c

Appease lintr

7fa23e7

Add v4 tests

64e19e9

Add Note about condition when derived task id must be specified for r…

35bd7e8

…equired values validation to succeed.

merge main

b1cff8e

Merge branch 'main' into ak/v4-is_required/159 # Conflicts: # tests/testthat/test-expand_model_out_grid.R

Add to NEWS

491922f

Add note re derived_task_ids to fn docs

81912e1

Fix roxygen warnings

84c3c6e

annakrystalli requested a review from zkamvar November 14, 2024 15:44

zkamvar reviewed Nov 14, 2024

View reviewed changes

annakrystalli and others added 4 commits November 15, 2024 10:50

Update R/expand_model_out_grid.R

e99aa7e

Co-authored-by: Zhian N. Kamvar <[email protected]>

rnd to round

90b8712

remove isTRUE

810690b

refactor test

4580b4f

zkamvar reviewed Nov 15, 2024

View reviewed changes

vignettes/articles/validate-pr.Rmd Show resolved Hide resolved

zkamvar reviewed Nov 15, 2024

View reviewed changes

R/validate_model_data.R Show resolved Hide resolved

zkamvar reviewed Nov 15, 2024

View reviewed changes

R/validate_pr.R Show resolved Hide resolved

zkamvar reviewed Nov 15, 2024

View reviewed changes

R/validate_submission.R Show resolved Hide resolved

annakrystalli added 2 commits November 19, 2024 09:45

Add explanatory comment

46f5e89

Document

e6ed1a0

zkamvar self-requested a review November 19, 2024 15:08

zkamvar approved these changes Nov 19, 2024

View reviewed changes

annakrystalli merged commit 9a19893 into main Nov 19, 2024
8 checks passed

annakrystalli deleted the ak/v4-is_required/159 branch November 19, 2024 15:20

zkamvar mentioned this pull request Nov 19, 2024

[internal] separate functions concerned with standard output type formats into separate file #163

Open

zkamvar mentioned this pull request Dec 5, 2024

add derived task ID language hubverse-org/hubDocs#217

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support specifying whether v4 output types are required via `is_required` property #161

Support specifying whether v4 output types are required via `is_required` property #161

annakrystalli commented Nov 14, 2024 •

edited by zkamvar

Loading

github-actions bot commented Nov 14, 2024 •

edited

Loading

zkamvar left a comment

annakrystalli commented Nov 15, 2024 •

edited

Loading

zkamvar Nov 15, 2024

zkamvar commented Nov 15, 2024

annakrystalli commented Nov 18, 2024 •

edited

Loading

zkamvar left a comment

		#' Note that it is **necessary for `derived_task_ids` to be specified if any of
		#' the task IDs derived task IDs depend on have required values**. If this is the

Support specifying whether v4 output types are required via is_required property #161

Support specifying whether v4 output types are required via is_required property #161

Conversation

annakrystalli commented Nov 14, 2024 • edited by zkamvar Loading

github-actions bot commented Nov 14, 2024 • edited Loading

zkamvar left a comment

Choose a reason for hiding this comment

annakrystalli commented Nov 15, 2024 • edited Loading

zkamvar Nov 15, 2024

Choose a reason for hiding this comment

zkamvar commented Nov 15, 2024

annakrystalli commented Nov 18, 2024 • edited Loading

zkamvar left a comment

Choose a reason for hiding this comment

Support specifying whether v4 output types are required via `is_required` property #161

Support specifying whether v4 output types are required via `is_required` property #161

annakrystalli commented Nov 14, 2024 •

edited by zkamvar

Loading

github-actions bot commented Nov 14, 2024 •

edited

Loading

annakrystalli commented Nov 15, 2024 •

edited

Loading

annakrystalli commented Nov 18, 2024 •

edited

Loading