Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Feature/handle coarse spls #99

Merged
merged 60 commits into from
Aug 13, 2024
Merged

Conversation

annakrystalli
Copy link
Member

@annakrystalli annakrystalli commented Jul 11, 2024

This is the first part of work related to allowing for validation of coarser samples than those specified in a modeling tasks compound_taskid_set (#88), discussed in this section of hubDocs.

The PR adds a new check function, check_tbl_spl_compound_taskid_set() to validate_model_data() which checks that sample compound task id sets for each modeling task is a single unique set that matches or is coarser than the expected set defined in the config. If the check fails, it will cause an early return of validate_model_data() as further validation of sample data is not possible.

To perform the above check, in this PR I'm also adding some utilities for detecting the compound task ID structure of samples in a submission file. This functionality will then be added to the rest of the sample checks so that they all use the compound_taskid_set detected from the samples rather than the fixed set defined in the config.

There were a couple of complications in implementing the compound_taskid_set detection discussed in #88 ( in particular #88 (comment) & #88 (comment)) and I've implemented the solution described in #88 (comment)

The PR also the message in skip_v3_spl_check() so that the function that actually called the function and is being skipped is recorded in the output.

@annakrystalli annakrystalli self-assigned this Jul 11, 2024
@annakrystalli annakrystalli linked an issue Jul 11, 2024 that may be closed by this pull request
Copy link

github-actions bot commented Jul 11, 2024

@elray1 elray1 self-requested a review July 11, 2024 14:03
@annakrystalli annakrystalli requested review from elray1 and removed request for elray1 July 11, 2024 14:03
Copy link
Contributor

@LucieContamin LucieContamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good, thank you! I really like the way you extract the compound id set from the submission.

However, I have a question. If I test this version of the validation with a submission file with a coarser compound id set, I still get an error. Is that expected? As I understand, it's still a work in progress but wanted to be sure it's expected.

The error is related to the function check_tbl_spl_compound_tid (), for example:

2024-04-28-JHU_UNC-flepiMoP_subset.parquet: Each sample compound task ID does not contain single,
  unique value.  Samples "1", "100", "103", "106", "112", "119", "12", "120", "123", "125", "133",
  "135", "138", "141", "142", "145", "148", "152", …, "90", and "94" do not contain unique compound
  task ID combinations. See `errors` attribute for details.

R/compound_taskid-utils.R Show resolved Hide resolved
@annakrystalli
Copy link
Member Author

annakrystalli commented Jul 12, 2024

@LucieContamin thanks for taking a look and testing it out! And yes, the validation of coarser samples have not propagated to all tests yet. This is step 1, then #100 is step 2 and once we are happy with both PRs I will go ahead and combine the functionality in the two PRs to allow me to propagate the creation of coarser sample compound idxs for the rest of the the tests.

@annakrystalli
Copy link
Member Author

So I will not merge this into main untill all functionality has been merged into this branch.

@LucieContamin LucieContamin self-requested a review July 12, 2024 16:31
Copy link
Contributor

@LucieContamin LucieContamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good to me!

@annakrystalli annakrystalli merged commit ed3e3cf into main Aug 13, 2024
8 checks passed
@annakrystalli annakrystalli deleted the feature/handle-coarse-spls branch August 13, 2024 07:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Validate coarse-grained samples
3 participants