-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added separate allele-count thresholds for the normal and tumor in ModelSegments. #5556
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5556 +/- ##
===============================================
+ Coverage 87.082% 87.082% +<.001%
+ Complexity 31250 31249 -1
===============================================
Files 1915 1915
Lines 144178 144180 +2
Branches 15910 15910
===============================================
+ Hits 125553 125555 +2
Misses 12839 12839
Partials 5786 5786
|
@LeeTL1220 @jonn-smith apologies for the blips in getting tests to pass on Travis, but I think this branch should be OK now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor question, then if tests pass and @LeeTL1220 validates it (as requested), feel free to merge.
@@ -619,12 +630,14 @@ private AllelicCountCollection genotypeHets(final SampleLocatableMetadata metada | |||
|
|||
logger.info("Genotyping heterozygous sites from available allelic counts..."); | |||
|
|||
AllelicCountCollection filteredAllelicCounts = allelicCounts; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason you moved the declaration/definition up here? You end up setting it a couple lines later, so it doesn't seem to make a difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a very minor matter of style. Now all subsequent transformations after the initial declaration operate on the new variable, so any transformations added later will have an identical format. (Really what happened is that I experimented with adding filtering steps and changing their order, but got bit by not noticing I had inadvertently reverted to the original counts in a later step due to a careless copy and paste...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hit wrong button.
One minor question, then if tests pass and @LeeTL1220 validates it (as requested), feel free to merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@samuelklee I only have a curiosity question. If you can answer @jonn-smith 's question, I am fine with this being merged.
--number-of-burn-in-samples-copy-ratio ${default=50 num_burn_in_copy_ratio} \ | ||
--number-of-samples-allele-fraction ${default=100 num_samples_allele_fraction} \ | ||
--number-of-burn-in-samples-allele-fraction ${default=50 num_burn_in_allele_fraction} \ | ||
--number-of-samples-copy-ratio ${default="100" num_samples_copy_ratio} \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that I mind, but curiosity: Were the quotes necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just another matter of style, I think these were the only numbers missing quotes in the CNV WDLs.
Thanks @jonn-smith and @LeeTL1220. Will go ahead and merge, but I think we should still check the effect on MAF estimates, etc. in the evaluation/real runs at some point in the future. |
Also fixed some minor style issues in argument variable names and the WDL.
This should help recover some deletions and might possibly clear up some issues with MAF estimation when the number of hets is small. @LeeTL1220 can you run on some test cases to check the effect? (Note that the changes to fix estimation of the posterior widths, which will in turn affect similar-segment smoothing, are in another branch; we should test those changes as well.)
Note that the default threshold of zero for the tumor in matched-normal mode should ensure that the sites genotyped as het should always match in the tumor and the normal. (This will ultimately make multisample segmentation, as enabled by #5524, more straightforward.) There was previously a check for this condition in the integration test; however, it wasn't actually activated by the test data. I could modify the test data to add a proper regression test, but since these test files are generated by running another tool on a test BAM in the repo, this could be misleading. I'm OK with punting in this case.
@jonn-smith do you mind reviewing, since this resulted from your turn as liaison? Should be super quick. Thanks again for raising the issue!