-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix several issues with M2 and HC force-calling mode #5874
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor things and one unit test request. Feel free to merge after you add the test.
for (final Allele givenAllele : unassembledGivenAlleles) { | ||
for (final Haplotype baseHaplotype : baseHaplotypes) { | ||
// make sure this allele doesn't collide with a variant on the haplotype | ||
if (baseHaplotype.getEventMap()!= null && baseHaplotype.getEventMap().getVariantContexts().stream().anyMatch(vc -> vc.overlaps(givenVC))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if the GGA allele is a SNP that is spanned by a deletion in the discovered variants the it's only added to the reference haplotype, right? And it will still get output in the vcf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and yes.
EventMap.buildEventMapsForHaplotypes(haplotypeList, fullReferenceWithPadding, paddedReferenceLoc, debug, maxMnpDistance); | ||
variationEvents = EventMap.getAllVariantContexts(haplotypeList); | ||
|
||
final boolean sameMnpDistance = lastMaxMnpDistanceUsed.isPresent() && maxMnpDistance == lastMaxMnpDistanceUsed.getAsInt(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would the MNP distance change within the same tool execution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't ever change, but I lean toward keeping the logic self-consistent without assuming anything about the tools that invoke it. Of course, some of these classes are so entwined with HC and M2 that it's safe to assume things, but in this case the price of caution is small.
Also, the code for exactly when event maps are cached was quite brittle in that you could write reasonable code and encounter gotchas, so I felt like making it all more explicit as to what had and had not been computed previously.
@@ -542,9 +542,6 @@ public ActivityProfileState isActive( final AlignmentContext context, final Refe | |||
final AssemblyResultSet untrimmedAssemblyResult = AssemblyBasedCallerUtils.assembleReads(region, givenAlleles, hcArgs, readsHeader, samplesList, logger, referenceReader, assemblyEngine, aligner, !hcArgs.doNotCorrectOverlappingBaseQualities); | |||
|
|||
final SortedSet<VariantContext> allVariationEvents = untrimmedAssemblyResult.getVariationEvents(hcArgs.maxMnpDistance); | |||
// TODO - line bellow might be unnecessary : it might be that assemblyResult will always have those alleles anyway | |||
// TODO - so check and remove if that is the case: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels so good to finally take care of that TODO that's been around for longer than I have.
* @param readErrorCorrector a ReadErrorCorrector object, if read are to be corrected before assembly. Can be null if no error corrector is to be used. | ||
* @param aligner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're going to modify the java doc anyway, can you add a blurb for the aligner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
* @param reads the reads we're going to assemble | ||
* @param refHaplotype the reference haplotype | ||
* @param aligner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
java doc update please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
* @param allowLowComplexityGraphs if true, do not check for low-complexity graphs | ||
* @param allowNonUniqueKmersInRef if true, do not fail if the reference has non-unique kmers | ||
* @param aligner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -61,7 +61,7 @@ public EventMap(final Collection<VariantContext> stateForTesting) { | |||
/** | |||
* | |||
* @param maxMnpDistance Phased substitutions separated by this distance or less are merged into MNPs. More than | |||
* two substitutions occuring in the same alignment block (ie the same M/X/EQ CIGAR element) | |||
* two substitutions occurring in the same alignment block (ie the same M/X/EQ CIGAR element) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Boy Scout Rule
@@ -464,6 +464,33 @@ public void testGivenAllelesMode() throws Exception { | |||
} | |||
} | |||
|
|||
/** | |||
* Here we give Mutect2 ridiculous kmer settings in order to force assembly to fail. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was where the previous behavior was an infinite loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previous behavior was: assembly fails, therefore GGA alleles are injected in vain and don't get genotyped. Assembly still fails now, but we then inject the GGA alleles into the reference without using a graph.
Assert.assertEquals(assemblyResultSet.getHaplotypeCount(), 4); | ||
Assert.assertEquals(assemblyResultSet.getHaplotypeList().get(2).getBaseString(), "AAAAGCCCGGGGTTTT"); | ||
Assert.assertEquals(assemblyResultSet.getHaplotypeList().get(3).getBaseString(), "ACAAGCCCGGGGTTTT"); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like your code should handle multi-allelic input alleles, but can you add a test please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Codecov Report
@@ Coverage Diff @@
## master #5874 +/- ##
===============================================
+ Coverage 80.106% 86.841% +6.735%
- Complexity 30643 32319 +1676
===============================================
Files 1991 1991
Lines 149202 149308 +106
Branches 16480 16486 +6
===============================================
+ Hits 119520 129661 +10141
+ Misses 23875 13633 -10242
- Partials 5807 6014 +207
|
Closes #5857.
@ldgauthier Here it is, with the 5-haplotype compromise