-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unified HaplotypeCaller's force-calling mode with Mutect2's #6090
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced the test coverage is great, but if you can explain what's going on in the expected test VCF, everything else looks pretty good.
public FeatureInput<VariantContext> alleles; | ||
|
||
@Advanced | ||
@Argument(fullName = "genotype-filtered-alleles", doc = "Whether to force genotype even filtered alleles", optional = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say "Force genotyping of filtered alleles included in the resource specified by --alleles"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if ( assemblyArgs.ggaExtension < 0) { | ||
throw new CommandLineException.BadArgumentValue("maxGGAAREExtension", "" + assemblyArgs.ggaExtension + "< 0"); | ||
if ( assemblyArgs.extension < 0) { | ||
throw new CommandLineException.BadArgumentValue("maxDiscARExtension", "" + assemblyArgs.extension + "< 0"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The diff for this file is so beautiful. <3 I'm glad you took out all the GGA special casing.
@@ -1820,5 +1822,6 @@ public static boolean isUnmixedMnpIgnoringNonRef(final VariantContext vc) { | |||
|
|||
return true; | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not obvious why there are two new imports, but no new code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was diplomatic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
.collect(Collectors.toMap(vc -> vc.getStart(), vc -> vc.getAlternateAlleles())); | ||
for (final VariantContext vc : new FeatureDataSource<VariantContext>(forceCallingVcf)) { | ||
final List<Allele> altAllelesAtThisLocus = altAllelesByPosition.get(vc.getStart()); | ||
vc.getAlternateAlleles().forEach(a -> Assert.assertTrue(altAllelesAtThisLocus.contains(a))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this improved test.
@@ -32,7 +32,6 @@ | |||
20 10004094 . A T . . . GT 0|1 | |||
20 10004769 . TAAAACTATGC T . . . GT 0|1 | |||
20 10004771 . A T . . . GT 0|1 | |||
20 10006819 . AAAAC T . . . GT 0|1 | |||
20 10006819 . AAAAC A . . . GT 0|1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like these are fixes with respect to the proper reference base?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right. The old GGA mode didn't care if the -alleles
vcf was invalid.
// then we may need to trim the alleles (because the original VariantContext may have had to pad at the end). | ||
if ( outputAlleles.size() != vc.getAlleles().size() && !limitedContext ) // limitedContext callers need to handle allele trimming on their own to keep their alleles in sync | ||
{ | ||
vcCall = GATKVariantContextUtils.reverseTrimAlleles(vcCall); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did the trimming go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had above:
final boolean limitedContext = features == null || refContext == null || rawContext == null || stratifiedContexts == null;
In every invocation of this method (which I think I cleaned up in this PR) all of these things were null
and so limtiedContext
was inevitable true
. Hence the && !limitedContext
prevented this reverse trimming code from ever being reached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And don't worry, M2 and HC still properly trim alleles. In fact, M2 already has an integration test to make sure we output a parsimonious allele representation even after indels that miss the tumor lod threshold are invisibly dropped.
@@ -389,7 +338,9 @@ private OutputAlleleSubset calculateOutputAlleleSubset(final AFCalculationResult | |||
|
|||
//it's possible that the upstream deletion that spanned this site was not emitted, mooting the symbolic spanning deletion allele | |||
final boolean isSpuriousSpanningDeletion = GATKVCFConstants.isSpanningDeletion(allele) && !isVcCoveredByDeletion(vc); | |||
final boolean toOutput = (isPlausible || forceKeepAllele(allele) || isNonRefWhichIsLoneAltAllele) && !isSpuriousSpanningDeletion; | |||
|
|||
//TODO: force-clling logic goes here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, that had already been addressed. Fixed.
} | ||
// if the site was down-sampled, record that fact | ||
if ( !limitedContext && rawContext.hasPileupBeenDownsampled() ) { | ||
attributes.put(GATKVCFConstants.DOWNSAMPLED_KEY, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have never ever seen data that has this key. I'm glad you're removing any mention of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More unreachable code due to limitedContext
always being true
.
24e7f36
to
b48b007
Compare
Addressed comments and beefed-up test coverage. Back to @ldgauthier. |
b48b007
to
f7beca4
Compare
@ldgauthier Fresh rebase post-deletion of old qual. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Infinitesimally minor comment about variable names, otherwise good to go!
@@ -125,4 +130,11 @@ public ReadThreadingAssembler createReadThreadingAssembler() { | |||
@Argument(fullName = MAX_MNP_DISTANCE_LONG_NAME, shortName = MAX_MNP_DISTANCE_SHORT_NAME, | |||
doc = "Two or more phased substitutions separated by this distance or less are merged into MNPs.", optional = true) | |||
public int maxMnpDistance = getDefaultMaxMnpDistance(); | |||
|
|||
@Argument(fullName= FORCE_CALL_ALLELES_LONG_NAME, doc="The set of alleles for which to force genotyping regardless of evidence", optional=true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a bunch of places where the variables or test names or whatever say force call, but the arg doc here says force genotyping. Very minor nit to pick, but it would be good to be consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@ldgauthier as we discussed.