-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Genotyping code for the Gnarly Pipeline (gnomAD v3) #4947
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4947 +/- ##
==========================================
Coverage ? 13.338%
Complexity ? 6396
==========================================
Files ? 2016
Lines ? 151745
Branches ? 16269
==========================================
Hits ? 20240
Misses ? 129234
Partials ? 2271
|
TODO: refactor duplicated VC generation code from PosteriorProbabilitiesUtilsUnitTest in ReblockGVCFUnitTest by extracting to VariantContextTestUtils |
0d143f6
to
5e3bf34
Compare
@ldgauthier Should this branch be reviewed before the GVCF reblocking PR goes in? |
They may actually be more independent than two PRs both modifying SNP/indel data usually are. I think there will be VariantContextTestUtils conflicts either way, but the VCF data will probably be the same. |
@droazen can someone take a look soon? Eric tells me we need to start the CCDG 60K callset next week and I'd rather run that off a release. |
@ldgauthier Looks like this branch is in conflict -- can you rebase before we review? |
b909de3
to
6b7d045
Compare
@droazen After a gruesome rebase, conflicts are resolved. |
@ldgauthier Since I'm about to leave for ~2 weeks, I've recruited @lbergelson to review this PR in my place. |
src/main/java/org/broadinstitute/hellbender/engine/FeatureDataSource.java
Outdated
Show resolved
Hide resolved
337c33a
to
8da4f3d
Compare
Conflicts resolved and David's comments addressed. @lbergelson can you take a look? |
@lbergelson I fixed the PL wiggle in the failing test. Think you could review before 4.1? Or maybe @droazen will look again? (Getting it in would be nice, but not critical.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ldgauthier I'm not quite done looking through this. I just want to checkpoint what I've reviewed so far.
src/main/java/org/broadinstitute/hellbender/tools/genomicsdb/GenomicsDBImport.java
Outdated
Show resolved
Hide resolved
new GenomicsDBOptions()); | ||
} | ||
|
||
public FeatureDataSource(final FeatureInput<T> featureInput, final int queryLookaheadBases, final Class<? extends Feature> targetFeatureType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
javadoc
src/main/java/org/broadinstitute/hellbender/tools/genomicsdb/GenomicsDBOptions.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/genomicsdb/GenomicsDBOptions.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/genomicsdb/GenomicsDBOptions.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/walkers/GnarlyGenotyper.java
Outdated
Show resolved
Hide resolved
src/test/java/org/broadinstitute/hellbender/tools/walkers/GenotypeGVCFsIntegrationTest.java
Outdated
Show resolved
Hide resolved
...roadinstitute/hellbender/tools/walkers/variantutils/PosteriorProbabilitiesUtilsUnitTest.java
Outdated
Show resolved
Hide resolved
src/test/java/org/broadinstitute/hellbender/utils/GenotypeUtilsUnitTest.java
Show resolved
Hide resolved
src/testUtils/java/org/broadinstitute/hellbender/testutils/VariantContextTestUtils.java
Outdated
Show resolved
Hide resolved
94e68ef
to
d283f3e
Compare
That should work for both my cases. It could be nice for SelectVariants to
be able to specify whether genotypes should be called or not too. Other
tools might want the sites-only option.
…On Mon, Mar 4, 2019 at 12:40 PM droazen ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In
src/main/java/org/broadinstitute/hellbender/tools/genomicsdb/GenomicsDBUtils.java
<#4947 (comment)>:
> @@ -40,7 +40,7 @@
*/
public static GenomicsDBExportConfiguration.ExportConfiguration createExportConfiguration(final File reference, final String workspace,
final String callsetJson, final String vidmapJson,
- final String vcfHeader) {
+ final String vcfHeader, final boolean doGnarlyGenotyping) {
@lbergelson <https://github.com/lbergelson> @ldgauthier
<https://github.com/ldgauthier> If tools had a way to inject custom GDB
config (eg., via an overridable method in GATKTool), and the engine used
this config when creating the Feature Manager on startup, would that solve
the problem here?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#4947 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AGRhdOOjGpZBu39mqk7jekA7iOzWDTFrks5vTVqFgaJpZM4U4KK0>
.
--
Laura Doyle Gauthier, Ph.D.
Associate Director, Germline Methods
Data Sciences Platform
[email protected]
Broad Institute of MIT & Harvard
320 Charles St.
Cambridge MA 0214
|
What's the status of this one @ldgauthier ? Do you need a new reviewer with @lbergelson out? |
Yes please.
Where we left it I think Louis was happy, but we wanted to ask Nalini if
she had any suggestions to avoid threading the argument for genotypes all
the way through the engine.
…On Thu, Mar 28, 2019 at 11:49 AM droazen ***@***.***> wrote:
What's the status of this one @ldgauthier <https://github.com/ldgauthier>
? Do you need a new reviewer with @lbergelson
<https://github.com/lbergelson> out?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#4947 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AGRhdJBn0j6n9Dfp-sz763M3mP5b1oyDks5vbOSHgaJpZM4U4KK0>
.
--
Laura Doyle Gauthier, Ph.D.
Associate Director, Germline Methods
Data Sciences Platform
[email protected]
Broad Institute of MIT & Harvard
320 Charles St.
Cambridge MA 0214
|
src/main/java/org/broadinstitute/hellbender/tools/genomicsdb/GenomicsDBUtils.java
Outdated
Show resolved
Hide resolved
Not sure we can avoid threading the argument for genotypes, but would using GenomicsDBOptions instead work? |
37c7482
to
f098d6c
Compare
@ldgauthier Let me know once you've had the chance to do that |
…rge WGS cohort "Gnarly Pipeline" to finalize annotation values for filtering Super fast on Hail-derived sites-only input, needs to calculate ACs for GenomicsDB input Does NOT subset alternate alleles, just drops genotypes for sites with more than 6 alts Does NOT perform joint discovery, i.e. aggregating small amounts of evidence across samples Fix ReblockGVCF NPE on no-calls Some annotation cleanup New scheme to preserve de novo calling -- more blocks, but drop PLs, MIN_DP, floor GQs for each block Add AS combine operation and capability for GenomicsDB
789cc37
to
a5b245f
Compare
@droazen a few minor tests I missed, but this should be good to go otherwise. I did the refactor for Evoquer and I think everything else has been addressed. |
@ldgauthier Can you move |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Back to @ldgauthier with a final round of mostly-minor comments. We should be able to merge once they're addressed and tests pass.
@@ -205,7 +207,8 @@ private void initializeFeatureSources( final int featureQueryLookahead, final Co | |||
// Only create a data source for Feature arguments that were actually specified | |||
if ( featureInput != null ) { | |||
final Class<? extends Feature> featureType = getFeatureTypeForFeatureInputField(featureArgument.getKey()); | |||
addToFeatureSources(featureQueryLookahead, featureInput, featureType, cloudPrefetchBuffer, cloudIndexPrefetchBuffer, reference); | |||
addToFeatureSources(featureQueryLookahead, featureInput, featureType, cloudPrefetchBuffer, cloudIndexPrefetchBuffer, | |||
toolInstance instanceof VariantWalker ? ((VariantWalker) toolInstance).getGenomicsDBOptions() : null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could probably avoid this ugly instanceof
check by making GATKTool.getGenomicsDBOptions()
return null (or new GenomicsDBOptions(referenceArguments.getReferencePath())
) instead of throwing an exception, and then having GATKTool.initializeFeatures()
(and its overrides) pass the GenomicsDB options in to the FeatureManager
constructor, which can then propagate them down here.
Could be done in a separate PR if you don't want to do it here.
@@ -217,6 +220,13 @@ public void dumpAllFeatureCacheStats() { | |||
} | |||
} | |||
|
|||
void addToFeatureSources(final int featureQueryLookahead, final FeatureInput<? extends Feature> featureInput, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add javadoc for this overload (can copy from below)
* @return By default, not every GATK tool can read from a GenomicsDB -- child classes can override | ||
*/ | ||
protected GenomicsDBOptions getGenomicsDBOptions() { | ||
throw new IllegalArgumentException("This tool does not take a GenomicsDB as a feature input."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have this return either null
or a new GenomicsDBOptions(referenceArguments.getReferencePath())
instead of throwing an exception, you could likely get rid of the VariantWalker
special-casing in FeatureManager
.
src/main/java/org/broadinstitute/hellbender/engine/VariantWalker.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
@SuppressWarnings({"unchecked", "rawtypes"}) | ||
public VariantContext finalizeGenotype(final VariantContext variant, final VariantContextBuilder annotationDBBuilder) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in person, modify the method to tolerate a null annotationDBBuilder
, and add a one-arg overload of finalizeGenotype()
that takes just a VariantContext
arg.
@@ -78,7 +78,7 @@ | |||
boolean foundData = false; | |||
|
|||
for( final Genotype g : genotypes ) { | |||
if( g.isNoCall() || ! g.hasAnyAttribute(GATKVCFConstants.STRAND_BIAS_BY_SAMPLE_KEY) ) { | |||
if( ! g.hasAnyAttribute(GATKVCFConstants.STRAND_BIAS_BY_SAMPLE_KEY) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't want to skip no-calls here?
...ava/org/broadinstitute/hellbender/tools/walkers/annotator/allelespecific/AS_QualByDepth.java
Show resolved
Hide resolved
*/ | ||
@Advanced | ||
@Argument(fullName=HaplotypeCallerArgumentCollection.OUTPUT_BLOCK_LOWER_BOUNDS, doc = "Output the band lower bound for each GQ block regardless of the data it represents", optional = true) | ||
public boolean floorBlocks = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a test covering this new HC arg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just in ReblockGVCF. Do you want one in HaplotypeCaller?
src/main/java/org/broadinstitute/hellbender/tools/walkers/variantutils/ReblockGVCF.java
Show resolved
Hide resolved
matching local vs. Travis
* @param annotation the annotation to be tested | ||
* @return true if the annotation is expected to have values per-allele | ||
*/ | ||
public static boolean isAlleleSpecific(final InfoFieldAnnotation annotation) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a danger that if someone adds a new allele-specific annotation one day, they will not realize that they have to update this method as well. Perhaps we could introduce an empty AlleleSpecificAnnotation
marker interface, and have just these 4 classes implement it? That would be much less likely to be missed in the future...
* Output the band lower bound for each GQ block instead of the min GQ -- for better compression | ||
*/ | ||
@Advanced | ||
@Argument(fullName=HaplotypeCallerArgumentCollection.OUTPUT_BLOCK_LOWER_BOUNDS, doc = "Output the band lower bound for each GQ block regardless of the data it represents", optional = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to cover this new HC arg with a quick/simple direct test, since the ReblockGVCF arg is separate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest version looks good overall and tests pass -- I still have a lingering maintenance concern about AnnotationUtils.isAlleleSpecific()
that could be resolved by adding an empty marker interface for AS annotations, and I'd still like to see a direct test covering that new HC arg.
After discussing with @ldgauthier, I'm going to approve this PR as-is, and Laura will address the remaining TODOs in a separate PR. For the record, the three remaining issues that need addressing are:
|
This along with the GVCF reblocking branch constitute the new code I'm using for gnomAD v3 on the Gnarly Pipeline.
Some of the GDB hacks are gross, but I can't clean it up until after the protobuf update.