Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix edge case for Mutect2 germline resource #5563

Merged
merged 1 commit into from
Jan 7, 2019
Merged

Conversation

davidbenjamin
Copy link
Contributor

Closes #5553.

@takutosato This is a bug fix for the case --default-af 0.0.

@codecov-io
Copy link

Codecov Report

Merging #5563 into master will decrease coverage by 0.002%.
The diff coverage is 0%.

@@               Coverage Diff               @@
##              master     #5563       +/-   ##
===============================================
- Coverage     87.081%   87.079%   -0.002%     
  Complexity     31478     31478               
===============================================
  Files           1929      1929               
  Lines         145041    145043        +2     
  Branches       16074     16075        +1     
===============================================
- Hits          126303    126302        -1     
+ Misses         12891     12890        -1     
- Partials        5847      5851        +4
Impacted Files Coverage Δ Complexity Δ
.../walkers/mutect/GermlineProbabilityCalculator.java 82.857% <0%> (-11.082%) 13 <1> (-1)
...nder/utils/runtime/StreamingProcessController.java 67.299% <0%> (-0.474%) 33% <0%> (ø)
...lotypecaller/readthreading/ReadThreadingGraph.java 88.608% <0%> (-0.253%) 144% <0%> (-1%)
...ithwaterman/SmithWatermanIntelAlignerUnitTest.java 60% <0%> (ø) 2% <0%> (ø) ⬇️
...utils/smithwaterman/SmithWatermanIntelAligner.java 80% <0%> (+30%) 3% <0%> (+2%) ⬆️

@davidbenjamin davidbenjamin merged commit 0570670 into master Jan 7, 2019
@davidbenjamin davidbenjamin deleted the db_edge_case branch January 7, 2019 17:08
@MikeWLloyd
Copy link

Awesome, glad this is patched up so fast. Will this get picked up by the nightly docker build? I can test tomorrow if so.

@byoo
Copy link
Contributor

byoo commented Jan 7, 2019

Thank you! I also wonder if you could advise timeline for the new release with this fix.

@ldgauthier
Copy link
Contributor

The next release is going to be a big 4.1 minor version release, so there are a lot of new features we're trying to finish up. Hopefully it will be by the end of the month, but a nightly will be available with this change by tomorrow: https://hub.docker.com/r/broadinstitute/gatk-nightly

@pjongeneel
Copy link

pjongeneel commented Jan 7, 2019

Just wanted to mention, I'm able to get this error when I use --af-of-alleles-not-in-resource 0.00003125. Noticed the edge case just mentioned 0, but if this hasn't already been dealt with in this fix, I can provide more information.

Edit: I've noticed that this happens when I supply -I tumor.bam -I normal.bam -tumor SAMPLE1 and I don't specify a -normal SAMPLE2. If I do include the -normal SAMPLE2, it works fine without error. Not sure if intended or not.

@davidbenjamin
Copy link
Contributor Author

@pjongeneel Thank you for mentioning this. This PR will also fix the issue when the germline resource has a population allele frequency of 0 or 1 regardless of the default. I believe the problem is coming from using gnomAD vcfs lifted-over to hg38. We have only been using the hg37 gnomAD in development, but it seems that we were naive about the pitfalls of the lift-over.

@MikeWLloyd
Copy link

@ldgauthier the nightly docker build you pointed to seems behind 4.0.12.0. The Mutect2 error fixed in #5442 is present in the nightly docker build, so I can't test this fix.

@byoo
Copy link
Contributor

byoo commented Jan 8, 2019

@davidbenjamin Hmm, it doesn't seem to resolve the issue for me. I tested as below and same exception was thrown. Could you confirm if I tested correctly?

java -jar gatk-package-4.0.12.0-18-g0570670-SNAPSHOT-local.jar FilterMutectCalls -V test-unfiltered.vcf -O test-filtered.vcf.gz

$ cat test-unfiltered.vcf

##fileformat=VCFv4.2
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele fractions of alternate alleles in the tumor">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=F1R2,Number=R,Type=Integer,Description="Count of reads in F1R2 pair orientation supporting each allele">
##FORMAT=<ID=F2R1,Number=R,Type=Integer,Description="Count of reads in F2R1 pair orientation supporting each allele">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phasing set (typically the position of the first variant in the set)">
##FORMAT=<ID=P_PRIOR_RO,Number=1,Type=Float,Description="prior probability of read orientation-based artifacts under the present referene context">
##FORMAT=<ID=P_RO,Number=1,Type=Float,Description="posterior probability of read orientation-based artifacts">
##FORMAT=<ID=ROF_TYPE,Number=1,Type=String,Description="type of read orientation artifact (F1R2 or F2R1)">
##FORMAT=<ID=SAAF,Number=3,Type=Float,Description="MAP estimates of allele fraction given z">
##FORMAT=<ID=SAPP,Number=3,Type=Float,Description="posterior probabilities of the presence of strand artifact">
##INFO=<ID=CONTQ,Number=1,Type=Float,Description="Phred-scaled qualities that alt allele are not due to contamination">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=ECNT,Number=1,Type=Integer,Description="Number of events in this haplotype">
##INFO=<ID=GERMQ,Number=A,Type=Integer,Description="Phred-scaled qualities that alt allele are not germline variants">
##INFO=<ID=MBQ,Number=R,Type=Integer,Description="median base quality">
##INFO=<ID=MFRL,Number=R,Type=Integer,Description="median fragment length">
##INFO=<ID=MMQ,Number=R,Type=Integer,Description="median mapping quality">
##INFO=<ID=MPOS,Number=A,Type=Integer,Description="median distance from end of read">
##INFO=<ID=NALOD,Number=A,Type=Float,Description="log odds of artifact in normal with same allele fraction as tumor">
##INFO=<ID=NLOD,Number=A,Type=Float,Description="Normal LOD score">
##INFO=<ID=PON,Number=0,Type=Flag,Description="site found in panel of normals">
##INFO=<ID=POPAF,Number=A,Type=Float,Description="negative-log-10 population allele frequencies of alt alleles">
##INFO=<ID=REF_BASES,Number=1,Type=String,Description="local reference bases.">
##INFO=<ID=RPA,Number=.,Type=Integer,Description="Number of times tandem repeat unit is repeated, for each allele (including reference)">
##INFO=<ID=RU,Number=1,Type=String,Description="Tandem repeat unit (bases)">
##INFO=<ID=STR,Number=0,Type=Flag,Description="Variant is a short tandem repeat">
##INFO=<ID=TLOD,Number=A,Type=Float,Description="Log odds ratio score for variant">
##MutectVersion=2.1
##contig=<ID=1,length=249250621>
##contig=<ID=2,length=243199373>
##contig=<ID=3,length=198022430>
##contig=<ID=4,length=191154276>
##contig=<ID=5,length=180915260>
##contig=<ID=6,length=171115067>
##contig=<ID=7,length=159138663>
##contig=<ID=8,length=146364022>
##contig=<ID=9,length=141213431>
##contig=<ID=10,length=135534747>
##contig=<ID=11,length=135006516>
##contig=<ID=12,length=133851895>
##contig=<ID=13,length=115169878>
##contig=<ID=14,length=107349540>
##contig=<ID=15,length=102531392>
##contig=<ID=16,length=90354753>
##contig=<ID=17,length=81195210>
##contig=<ID=18,length=78077248>
##contig=<ID=19,length=59128983>
##contig=<ID=20,length=63025520>
##contig=<ID=21,length=48129895>
##contig=<ID=22,length=51304566>
##contig=<ID=X,length=155270560>
##contig=<ID=Y,length=59373566>
##contig=<ID=MT,length=16569>
##contig=<ID=GL000191.1,length=106433>
##contig=<ID=GL000192.1,length=547496>
##contig=<ID=GL000193.1,length=189789>
##contig=<ID=GL000194.1,length=191469>
##contig=<ID=GL000195.1,length=182896>
##contig=<ID=GL000196.1,length=38914>
##contig=<ID=GL000197.1,length=37175>
##contig=<ID=GL000198.1,length=90085>
##contig=<ID=GL000199.1,length=169874>
##contig=<ID=GL000200.1,length=187035>
##contig=<ID=GL000201.1,length=36148>
##contig=<ID=GL000202.1,length=40103>
##contig=<ID=GL000203.1,length=37498>
##contig=<ID=GL000204.1,length=81310>
##contig=<ID=GL000205.1,length=174588>
##contig=<ID=GL000206.1,length=41001>
##contig=<ID=GL000207.1,length=4262>
##contig=<ID=GL000208.1,length=92689>
##contig=<ID=GL000209.1,length=159169>
##contig=<ID=GL000210.1,length=27682>
##contig=<ID=GL000211.1,length=166566>
##contig=<ID=GL000212.1,length=186858>
##contig=<ID=GL000213.1,length=164239>
##contig=<ID=GL000214.1,length=137718>
##contig=<ID=GL000215.1,length=172545>
##contig=<ID=GL000216.1,length=172294>
##contig=<ID=GL000217.1,length=172149>
##contig=<ID=GL000218.1,length=161147>
##contig=<ID=GL000219.1,length=179198>
##contig=<ID=GL000220.1,length=161802>
##contig=<ID=GL000221.1,length=155397>
##contig=<ID=GL000222.1,length=186861>
##contig=<ID=GL000223.1,length=180455>
##contig=<ID=GL000224.1,length=179693>
##contig=<ID=GL000225.1,length=211173>
##contig=<ID=GL000226.1,length=15008>
##contig=<ID=GL000227.1,length=128374>
##contig=<ID=GL000228.1,length=129120>
##contig=<ID=GL000229.1,length=19913>
##contig=<ID=GL000230.1,length=43691>
##contig=<ID=GL000231.1,length=27386>
##contig=<ID=GL000232.1,length=40652>
##contig=<ID=GL000233.1,length=45941>
##contig=<ID=GL000234.1,length=40531>
##contig=<ID=GL000235.1,length=34474>
##contig=<ID=GL000236.1,length=41934>
##contig=<ID=GL000237.1,length=45867>
##contig=<ID=GL000238.1,length=39939>
##contig=<ID=GL000239.1,length=33824>
##contig=<ID=GL000240.1,length=41933>
##contig=<ID=GL000241.1,length=42152>
##contig=<ID=GL000242.1,length=43523>
##contig=<ID=GL000243.1,length=43341>
##contig=<ID=GL000244.1,length=39929>
##contig=<ID=GL000245.1,length=36651>
##contig=<ID=GL000246.1,length=38154>
##contig=<ID=GL000247.1,length=36422>
##contig=<ID=GL000248.1,length=39786>
##contig=<ID=GL000249.1,length=38502>
##contig=<ID=NC_007605,length=171823>
##filtering_status=Warning: unfiltered Mutect 2 calls.  Please run FilterMutectCalls to remove false positives.
##source=Mutect2
##tumor_sample=Ameloblastoma_FFPE_P5
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Ameloblastoma_FFPE_P5
22	20067929	.	G	GC	.	.	DP=6;ECNT=1;MBQ=0,37;MFRL=0,141;MMQ=0,60;MPOS=15;POPAF=0.00;REF_BASES=CGCGAGGGGCGCCCGCGGGCG;RPA=3,4;RU=C;STR;TLOD=16.00	GT:AD:AF:DP:F1R2:F2R1:SAAF:SAPP	0/1:0,5:0.863:5:0,0:0,5:0.990,0.990,1.00:0.026,0.028,0.946

@davidbenjamin
Copy link
Contributor Author

@byoo It turns out that the first fix assumed a numerical precision (for comparing the population AF with 1) that isn't guaranteed on every JVM. I just merged another PR (#5560) that checks for populationAlleleFrequency > 1 - 1.0e-10 instead of 1 - 1.0e-20, which ought to work on every JVM.

If it doesn't work for you, please let me know again and I will do my best to fix it, this time with a very red face!!

@ldgauthier
Copy link
Contributor

@MikeWLloyd I realized after I posted that link that the nightly Docker build has been broken for some time. We're going to dig into it.

@byoo
Copy link
Contributor

byoo commented Jan 9, 2019

@davidbenjamin Thank you for your kind explanations! It motivated me to read JVM spec.
BTW, it looks that at least another edge case exists, which is not fixed by the PRs.
I tested with gatk-4.0.12.0-20-gf9a2e5c-SNAPSHOT

Variant (also wonder how 0 values for DP and AD can be interpreted) :

19      53302899        .       G       C       .       .       DP=72;ECNT=2;MBQ=0,0;MFRL=0,0;MMQ=0,0;MPOS=0;POPAF=7.30;REF_BASES=GGTATACAAGGTTTGACATCT;SAAF=0.00,0.00,NaN;SAPP=0.025,0.025,0.950;TLOD=6.82     GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:P_PRIOR_RO:P_RO:ROF_TYPE       0|1:0,0:0.962:0:0,0:0,0:0|1:53302899_G_C:53302899:1.444e-05:2.931e-03:F2R1

Stacktrace:

java.lang.IllegalArgumentException: errorRateLog10 must be good probability but got NaN
        at org.broadinstitute.hellbender.utils.QualityUtils.phredScaleLog10ErrorRate(QualityUtils.java:321)
        at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2FilteringEngine.lambda$applyGermlineVariantFilter$16(Mutect2FilteringEngine.java:286)
        at java.util.stream.DoublePipeline$3$1.accept(DoublePipeline.java:231)
        at java.util.Spliterators$DoubleArraySpliterator.forEachRemaining(Spliterators.java:1198)
        at java.util.Spliterator$OfDouble.forEachRemaining(Spliterator.java:822)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:545)
        at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
        at java.util.stream.IntPipeline.toArray(IntPipeline.java:502)
        at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2FilteringEngine.applyGermlineVariantFilter(Mutect2FilteringEngine.java:286)
        at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2FilteringEngine.calculateFilters(Mutect2FilteringEngine.java:519)
        at org.broadinstitute.hellbender.tools.walkers.mutect.FilterMutectCalls.firstPassApply(FilterMutectCalls.java:130)
        at org.broadinstitute.hellbender.engine.TwoPassVariantWalker.lambda$traverseVariants$0(TwoPassVariantWalker.java:76)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
        at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
        at java.util.Iterator.forEachRemaining(Iterator.java:116)
        at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
        at org.broadinstitute.hellbender.engine.TwoPassVariantWalker.traverseVariants(TwoPassVariantWalker.java:74)
        at org.broadinstitute.hellbender.engine.TwoPassVariantWalker.traverse(TwoPassVariantWalker.java:27)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
        at org.broadinstitute.hellbender.Main.main(Main.java:291)``

@davidbenjamin
Copy link
Contributor Author

@byoo Thanks, we will fix that one ASAP. By the way, we are doing a big refactoring of Mutect2 filtering, which won't change the outputs but will improve the internal software engineering so that we can write better unit tests to catch these sorts of things.

@MikeWLloyd
Copy link

@ldgauthier thanks for looking into that as well. I see there is an open issue #4965 that has been revived. I will follow that moving forward.

@davidbenjamin
Copy link
Contributor Author

PR #5578 should resolve this.

@byoo
Copy link
Contributor

byoo commented Jan 15, 2019

Thank you for the quick help, @davidbenjamin! By the way, is the phased zero-depth variant called due to #5434? Or, is it an intended call?

@davidbenjamin
Copy link
Contributor Author

@byoo Given that the TLOD is 16, Mutect2's somatic genotyping is doing what it is supposed to do. However, it makes me wonder if something strange is happening in the assembly.

If you want to post a few IGV screenshots of the bamout for this call, sorted by base and including both variant and non-variant tumor reads, as well as the assembled haplotypes, I will try to give an educated guess.

@byoo
Copy link
Contributor

byoo commented Jan 15, 2019

@davidbenjamin, thank you. I'd like to better interpret those. I am not sure if this is enough. If helpful, I will share the bam file.

zerodepth_vars

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FilterMutectCalls (gatk 4.0.12.0) fails due to the probability value of NaN
7 participants