Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added separate allele-count thresholds for the normal and tumor in ModelSegments. #5556

Merged
merged 1 commit into from
Jan 10, 2019

Conversation

samuelklee
Copy link
Contributor

@samuelklee samuelklee commented Jan 2, 2019

Also fixed some minor style issues in argument variable names and the WDL.

This should help recover some deletions and might possibly clear up some issues with MAF estimation when the number of hets is small. @LeeTL1220 can you run on some test cases to check the effect? (Note that the changes to fix estimation of the posterior widths, which will in turn affect similar-segment smoothing, are in another branch; we should test those changes as well.)

Note that the default threshold of zero for the tumor in matched-normal mode should ensure that the sites genotyped as het should always match in the tumor and the normal. (This will ultimately make multisample segmentation, as enabled by #5524, more straightforward.) There was previously a check for this condition in the integration test; however, it wasn't actually activated by the test data. I could modify the test data to add a proper regression test, but since these test files are generated by running another tool on a test BAM in the repo, this could be misleading. I'm OK with punting in this case.

@jonn-smith do you mind reviewing, since this resulted from your turn as liaison? Should be super quick. Thanks again for raising the issue!

@samuelklee samuelklee requested a review from jonn-smith January 2, 2019 20:36
@samuelklee samuelklee changed the title Added separate allele-count thresholds for the normal and tumor in ModelSegments and fixed logic for genotyping of sites. Added separate allele-count thresholds for the normal and tumor in ModelSegments. Jan 2, 2019
@codecov-io
Copy link

Codecov Report

Merging #5556 into master will increase coverage by <.001%.
The diff coverage is 87.5%.

@@               Coverage Diff               @@
##              master     #5556       +/-   ##
===============================================
+ Coverage     87.082%   87.082%   +<.001%     
+ Complexity     31250     31249        -1     
===============================================
  Files           1915      1915               
  Lines         144178    144180        +2     
  Branches       15910     15910               
===============================================
+ Hits          125553    125555        +2     
  Misses         12839     12839               
  Partials        5786      5786
Impacted Files Coverage Δ Complexity Δ
...ute/hellbender/tools/copynumber/ModelSegments.java 98.077% <87.5%> (-0.467%) 45 <3> (-1)
...nder/utils/runtime/StreamingProcessController.java 67.773% <0%> (+0.474%) 33% <0%> (ø) ⬇️

@samuelklee
Copy link
Contributor Author

@LeeTL1220 @jonn-smith apologies for the blips in getting tests to pass on Travis, but I think this branch should be OK now.

Copy link
Collaborator

@jonn-smith jonn-smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor question, then if tests pass and @LeeTL1220 validates it (as requested), feel free to merge.

@@ -619,12 +630,14 @@ private AllelicCountCollection genotypeHets(final SampleLocatableMetadata metada

logger.info("Genotyping heterozygous sites from available allelic counts...");

AllelicCountCollection filteredAllelicCounts = allelicCounts;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason you moved the declaration/definition up here? You end up setting it a couple lines later, so it doesn't seem to make a difference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a very minor matter of style. Now all subsequent transformations after the initial declaration operate on the new variable, so any transformations added later will have an identical format. (Really what happened is that I experimented with adding filtering steps and changing their order, but got bit by not noticing I had inadvertently reverted to the original counts in a later step due to a careless copy and paste...)

Copy link
Collaborator

@jonn-smith jonn-smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hit wrong button.

One minor question, then if tests pass and @LeeTL1220 validates it (as requested), feel free to merge.

Copy link
Contributor

@LeeTL1220 LeeTL1220 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samuelklee I only have a curiosity question. If you can answer @jonn-smith 's question, I am fine with this being merged.

--number-of-burn-in-samples-copy-ratio ${default=50 num_burn_in_copy_ratio} \
--number-of-samples-allele-fraction ${default=100 num_samples_allele_fraction} \
--number-of-burn-in-samples-allele-fraction ${default=50 num_burn_in_allele_fraction} \
--number-of-samples-copy-ratio ${default="100" num_samples_copy_ratio} \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I mind, but curiosity: Were the quotes necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just another matter of style, I think these were the only numbers missing quotes in the CNV WDLs.

@samuelklee
Copy link
Contributor Author

Thanks @jonn-smith and @LeeTL1220. Will go ahead and merge, but I think we should still check the effect on MAF estimates, etc. in the evaluation/real runs at some point in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants