-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cap likelihoods symmetrically #6196
Conversation
@@ -206,7 +206,7 @@ public static CachingIndexedFastaSequenceFile createReferenceReader(final String | |||
* @return never {@code null}. | |||
*/ | |||
public static ReadLikelihoodCalculationEngine createLikelihoodCalculationEngine(final LikelihoodEngineArgumentCollection likelihoodArgs) { | |||
final double log10GlobalReadMismappingRate = likelihoodArgs.phredScaledGlobalReadMismappingRate < 0 ? - Double.MAX_VALUE | |||
final double log10GlobalReadMismappingRate = likelihoodArgs.phredScaledGlobalReadMismappingRate < 0 ? Double.NEGATIVE_INFINITY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs say you can turn off the cap by setting this value negative, but it wasn't working before because subsequent methods were looking for -Inf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Back to @ldgauthier
@@ -352,7 +352,8 @@ public void normalizeLikelihoods(final double maximumLikelihoodDifferenceCap) { | |||
private void normalizeLikelihoodsPerEvidence(final double maximumBestAltLikelihoodDifference, | |||
final double[][] sampleValues, final int sampleIndex, final int evidenceIndex) { | |||
|
|||
final BestAllele bestAlternativeAllele = searchBestAllele(sampleIndex,evidenceIndex,false); | |||
//allow the best "alternative" allele to be the reference because asymmetry leads to strange artifacts like het calls with >90% alt reads | |||
final BestAllele bestAlternativeAllele = searchBestAllele(sampleIndex,evidenceIndex,true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we then just call this bestAllele
?
}; | ||
runCommandLine(argsNoMaxAlternateAlleles); | ||
|
||
List<VariantContext> vcs = VariantContextTestUtils.readEntireVCFIntoMemory(outputContaminatedHomVarDeletions.getAbsolutePath()).getRight(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VariantContextTestUtils.getVariantContexts
is all you need.
|
||
// Run both with and without --max-alternate-alleles over our interval, so that we can | ||
// prove that the argument is working as intended. | ||
final String[] argsNoMaxAlternateAlleles = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think an ArgumentBuilder
would be cleaner:
final ArgumentBuilder args = new ArgumentBuilder().addInput(bam)
.addReference(hg38Reference)
.addInterval(new SimpleInterval(intervals))
.addOutput(outputContaminatedHomVarDeletions)
.addNumericArgument(IntervalArgumentCollection.INTERVAL_PADDING_LONG_NAME, 50);
runCommandLine(args);
|
||
final File outputContaminatedHomVarDeletions = createTempFile("testContaminatedHomVarDeletions", ".vcf"); | ||
|
||
// Run both with and without --max-alternate-alleles over our interval, so that we can |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it run "with"?
4b4d7ef
to
36caaa9
Compare
@davidbenjamin take another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
36caaa9
to
c5b0ba6
Compare
Big events or haplotypes with a lot of events can lead to very disparate likelihoods between the reference allele and the alt allele. We put a cap on how unlikely an alt read can be, but the behavior was not symmetric with respect to the reference. As a result, some sites with large indels or many SNPs could get genotyped as heterozygous with >90% alt alleles, where the one or two reference reads are likely derived from cross-sample contamination. This genotyping artifact is now fixed, though the cap is still set by the "global mismapping penalty", or the probability that a read is derived from another locus in the genome and not either of the haplotypes represented in this ActiveRegion.