Using cigar complexity to break ties in uninformative reads' best haplotypes #5359

davidbenjamin · 2018-10-25T18:55:28Z

@ldgauthier This finishes what we started in #4858 and is necessary for the pileup-calls-on-bamouts MC3 validation. The cause is the same: the Pair-HMM has a tiny bias in favor of shorter haplotypes, so it prefers deletion haplotypes when reads end inside STRs. In #4858 we broke near-ties in favor of the reference; this PR handles the case where two alt haplotypes share an SNV and one of them also carries a spurious deletion, breaking the tie by cigar complexity.
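The PR text does not spell out how cigar complexity is measured, so the following is a minimal sketch under an assumed definition: a haplotype's complexity is the number of insertion or deletion elements in its CIGAR against the reference, and lower complexity means a higher tie-breaking priority. The class and method names (`CigarComplexitySketch`, `indelCount`, `priority`) are hypothetical and are not GATK's API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not the GATK formula): score a haplotype's CIGAR against the
 * reference by counting indel elements, so a spurious deletion makes an otherwise
 * identical alt haplotype "more complex" and therefore less preferred in a near-tie.
 */
final class CigarComplexitySketch {
    // One CIGAR element: a length followed by an operator character.
    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Number of insertion (I) or deletion (D) elements in a CIGAR string. */
    static int indelCount(final String cigar) {
        final Matcher m = ELEMENT.matcher(cigar);
        int count = 0;
        while (m.find()) {
            final char op = m.group(2).charAt(0);
            if (op == 'I' || op == 'D') {
                count++;
            }
        }
        return count;
    }

    /** Assumed tie-breaking priority: higher is preferred, so more indels means lower priority. */
    static int priority(final String cigar) {
        return -indelCount(cigar);
    }

    public static void main(final String[] args) {
        // Two alt haplotypes sharing the same SNV (an SNV aligns as M);
        // the second also carries a 1-bp deletion.
        final String snvOnly = "20M";
        final String snvPlusDeletion = "12M1D7M";
        System.out.println(priority(snvOnly));          // 0
        System.out.println(priority(snvPlusDeletion));  // -1
    }
}
```

Within GATK, CIGARs are normally handled through htsjdk's `Cigar` and `CigarElement` classes; a regex keeps this sketch dependency-free.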

One important sanity check: when I set cigarTerm to zero in AssemblyBasedCallerUtils.java, no tests broke. This means the refactoring needed to set up the change didn't affect behavior.
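A minimal sketch of why a zero weight should reproduce the old behavior, assuming (this is not confirmed by the PR) that the tie-breaking priority is a fixed reference bonus minus `cigarTerm` times the haplotype's cigar complexity. `REFERENCE_BONUS`, `priority`, and the class name are hypothetical. With `cigarTerm` at zero every alt haplotype gets the same priority, so only the #4858 preference for the reference remains.

```java
/**
 * Hypothetical sketch of a weighted tie-breaking priority. REFERENCE_BONUS and the
 * shape of the formula are assumptions for illustration, not taken from GATK.
 */
final class WeightedPrioritySketch {
    // Assumed value, large enough that the reference outranks any alt in a near-tie.
    static final double REFERENCE_BONUS = 1_000.0;

    /**
     * Higher is preferred. The reference gets a fixed bonus (the #4858 behavior);
     * every allele is then penalized by cigarTerm times its cigar complexity.
     */
    static double priority(final boolean isReference, final int cigarComplexity, final double cigarTerm) {
        final double referencePart = isReference ? REFERENCE_BONUS : 0.0;
        return referencePart - cigarTerm * cigarComplexity;
    }

    public static void main(final String[] args) {
        // Alt A: SNV only (complexity 0). Alt B: same SNV plus a spurious deletion (complexity 1).
        for (final double cigarTerm : new double[] {0.0, 1.0}) {
            final double a = priority(false, 0, cigarTerm);
            final double b = priority(false, 1, cigarTerm);
            System.out.printf("cigarTerm=%.1f  altA=%.1f  altB=%.1f%n", cigarTerm, a, b);
        }
    }
}
```

Running `main` prints equal priorities for the two alts at `cigarTerm=0.0` and a lower priority for the deletion haplotype at `cigarTerm=1.0`.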

I looked at most of the sites where PLs and/or DPs changed in the integration test VCFs, and in every case the difference came from a fake deletion that this PR fixed. I also went through the diff of the bamouts in IGV and found the same thing.

Finally, the changes to the test VCFs in GenotypeGVCFsIntegrationTest and GenomicsDBImporterIntegrationTest are a consequence of the changes to the HaplotypeCallerIntegrationTest VCFs.


davidbenjamin · 2018-11-27T07:08:51Z

@ldgauthier Reminder that this is needed for the M2 paper evaluation. I forgot about this PR amid all the other MC3 stuff and would have reminded you earlier had I remembered.

ldgauthier · 2018-11-28T15:11:33Z

src/main/java/org/broadinstitute/hellbender/utils/genotyper/ReadLikelihoods.java

```diff
@@ -421,11 +422,12 @@ private void normalizeLikelihoodsPerRead(final double maximumBestAltLikelihoodDi
      * @param sampleIndex including sample index.
      * @param readIndex  target read index.
      *
-     * @param useReferenceIfUninformative
+     * @param priorities An array of allele priorities (lower score is higher priority) to be used, if present, to break ties for
```


I may be having a dyslexic moment, but is lower score higher priority? If we prefer the reference and more complex haplotypes have more negative scores, then higher scores should have higher priorities, yes? Later in the code you update the best allele if `candidatePriority > bestPriority`.

Nice catch. The code is right and the comment is wrong.
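To make the convention settled in this exchange concrete, here is a minimal sketch of near-tie resolution in which the higher priority score wins, mirroring the `candidatePriority > bestPriority` comparison mentioned above. The method name, the two-pass structure, and the `tieThreshold` parameter are assumptions for illustration; they are not the actual signature in `ReadLikelihoods.java`.

```java
/**
 * Hypothetical sketch of choosing a read's best allele when likelihoods nearly tie.
 * Convention (as clarified in review): a HIGHER priority score wins a near-tie.
 * Names and structure are illustrative, not GATK's actual code.
 */
final class BestAlleleSketch {

    /**
     * @param logLikelihoods per-allele log-likelihoods for a single read
     * @param priorities     per-allele tie-break scores (higher is preferred)
     * @param tieThreshold   log-likelihood gap below which an allele counts as tied with the best
     * @return index of the chosen allele
     */
    static int bestAllele(final double[] logLikelihoods, final double[] priorities, final double tieThreshold) {
        if (logLikelihoods.length == 0 || logLikelihoods.length != priorities.length) {
            throw new IllegalArgumentException("need at least one allele and one priority per allele");
        }
        // First pass: the plain maximum log-likelihood.
        double maxLikelihood = Double.NEGATIVE_INFINITY;
        for (final double lk : logLikelihoods) {
            maxLikelihood = Math.max(maxLikelihood, lk);
        }
        // Second pass: among alleles within tieThreshold of the maximum, keep the highest priority.
        int best = -1;
        for (int i = 0; i < logLikelihoods.length; i++) {
            if (maxLikelihood - logLikelihoods[i] > tieThreshold) {
                continue; // clearly worse than the best allele, so not part of the tie
            }
            if (best == -1 || priorities[i] > priorities[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(final String[] args) {
        // Allele 0 = ref, 1 = SNV-only alt, 2 = SNV + spurious deletion (slightly favored by the HMM).
        final double[] logLks = {-20.0, -5.02, -5.00};
        final double[] priorities = {0.0, -1.0, -2.0}; // hypothetical: ref > simple alt > complex alt
        System.out.println(bestAllele(logLks, priorities, 0.1));
    }
}
```

In `main`, the deletion haplotype has a slightly better log-likelihood (-5.00 vs -5.02), but the two fall within the threshold, so the SNV-only haplotype with the higher priority is chosen and the program prints `1`.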

codecov-io · 2018-11-28T18:23:11Z

Codecov Report

Merging #5359 into master will increase coverage by 0.099%.
The diff coverage is 87.5%.

```diff
@@               Coverage Diff               @@
##              master     #5359       +/-   ##
===============================================
+ Coverage     86.903%   87.002%   +0.099%
- Complexity     30311     31188      +877
===============================================
  Files           1849      1908       +59
  Lines         140507    144073     +3566
  Branches       15475     15937      +462
===============================================
+ Hits          122105    125347     +3242
- Misses         12793     12965      +172
- Partials        5609      5761      +152
```
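The headline coverage numbers can be recovered from the table above, assuming Codecov's usual definition of coverage as hits divided by tracked lines, where tracked lines are hits plus misses plus partials (so partially covered lines count as uncovered):

```latex
\begin{aligned}
\text{master:}\quad & 122105 + 12793 + 5609 = 140507, & \tfrac{122105}{140507} &\approx 0.86903 = 86.903\% \\
\text{\#5359:}\quad & 125347 + 12965 + 5761 = 144073, & \tfrac{125347}{144073} &\approx 0.87002 = 87.002\% \\
\Delta\,\text{coverage} &= 87.002\% - 86.903\% = +0.099\%
\end{aligned}
```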

| Impacted Files | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| ...kers/haplotypecaller/AssemblyBasedCallerUtils.java | `76.563% <85.714%> (+0.14%)` | `35 <9> (+3)` | ⬆️ |
| ...te/hellbender/utils/genotyper/ReadLikelihoods.java | `89.981% <87.879%> (-0.16%)` | `150 <7> (+7)` | |
| ...ls/funcotator/metadata/VcfFuncotationMetadata.java | `71.429% <0%> (-28.571%)` | `8% <0%> (+3%)` | |
| ...bender/utils/GATKProtectedVariantContextUtils.java | `67.005% <0%> (-3.325%)` | `66% <0%> (+1%)` | |
| ...der/tools/HaplotypeCallerSparkIntegrationTest.java | `58.73% <0%> (-3.035%)` | `12% <0%> (-1%)` | |
| ...ols/walkers/contamination/ContaminationRecord.java | `88.235% <0%> (-2.876%)` | `5% <0%> (-1%)` | |
| ...lbender/utils/read/SAMRecordToGATKReadAdapter.java | `91.606% <0%> (-2.095%)` | `144% <0%> (+6%)` | |
| ...nder/tools/funcotator/TranscriptSelectionMode.java | `89.72% <0%> (-1.869%)` | `1% <0%> (ø)` | |
| ...tools/funcotator/DataSourceFuncotationFactory.java | `86.957% <0%> (-1.68%)` | `17% <0%> (ø)` | |
| ...hellbender/tools/walkers/mutect/Mutect2Engine.java | `89.881% <0%> (-1.258%)` | `63% <0%> (+3%)` | |

... and 126 more

ldgauthier

👍
Thanks for the test change explanation and the painstaking effort of looking at changed likelihoods.

Using cigar complexity to break ties in uninformative Reads' best haplotypes (4b3af5c)

davidbenjamin assigned ldgauthier Oct 25, 2018

davidbenjamin requested a review from ldgauthier October 25, 2018 18:55

davidbenjamin mentioned this pull request Oct 29, 2018

GATKProtectedVariantContextUtils.chooseAlleleForRead mishandles read ends #5325 (Closed)

ldgauthier reviewed Nov 28, 2018

fixed a javadoc (47c4001)

ldgauthier approved these changes Nov 28, 2018

davidbenjamin merged commit 7226ad9 into master Nov 28, 2018

davidbenjamin deleted the db_best_alleles branch November 28, 2018 18:54
