-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Feature/handle coarse spls #99
Conversation
…ormative messaging.
…ad of empty named list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good, thank you! I really like the way you extract the compound id set from the submission.
However, I have a question. If I test this version of the validation with a submission file with a coarser compound id set, I still get an error. Is that expected? As I understand, it's still a work in progress but wanted to be sure it's expected.
The error is related to the function check_tbl_spl_compound_tid ()
, for example:
2024-04-28-JHU_UNC-flepiMoP_subset.parquet: Each sample compound task ID does not contain single,
unique value. Samples "1", "100", "103", "106", "112", "119", "12", "120", "123", "125", "133",
"135", "138", "141", "142", "145", "148", "152", …, "90", and "94" do not contain unique compound
task ID combinations. See `errors` attribute for details.
@LucieContamin thanks for taking a look and testing it out! And yes, the validation of coarser samples have not propagated to all tests yet. This is step 1, then #100 is step 2 and once we are happy with both PRs I will go ahead and combine the functionality in the two PRs to allow me to propagate the creation of coarser sample compound idxs for the rest of the the tests. |
So I will not merge this into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good to me!
…t in sample downstream checks
…ids in compound_taskid_set
Fix test files
Propagate detected compound_taskid_set throughout rest of spl checks
This is the first part of work related to allowing for validation of coarser samples than those specified in a modeling tasks
compound_taskid_set
(#88), discussed in this section of hubDocs.The PR adds a new check function,
check_tbl_spl_compound_taskid_set()
tovalidate_model_data()
which checks that sample compound task id sets for each modeling task is a single unique set that matches or is coarser than the expected set defined in the config. If the check fails, it will cause an early return ofvalidate_model_data()
as further validation of sample data is not possible.To perform the above check, in this PR I'm also adding some utilities for detecting the compound task ID structure of samples in a submission file. This functionality will then be added to the rest of the sample checks so that they all use the
compound_taskid_set
detected from the samples rather than the fixed set defined in the config.There were a couple of complications in implementing the
compound_taskid_set
detection discussed in #88 ( in particular #88 (comment) & #88 (comment)) and I've implemented the solution described in #88 (comment)The PR also the message in
skip_v3_spl_check()
so that the function that actually called the function and is being skipped is recorded in the output.