Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hard filters in M2 for mitochondria #5842

Merged
merged 2 commits into from
Mar 28, 2019
Merged

Hard filters in M2 for mitochondria #5842

merged 2 commits into from
Mar 28, 2019

Conversation

meganshand
Copy link
Contributor

This adds a hard filter for low variant allele fraction calls. We will not turn this on by default in any of our pipelines, but it will give users an easy option to filter everything below a certain VAF that they don't care about.

It also adds a hard filter for low alt depth calls based on a threshold from the median autosomal coverage (that must be supplied as an argument). It takes the cutoff from a Poisson with a mean of 1.5 * median coverage (to account for NuMTs with 3 copies in the autosome) and is tuned to catch 99% of the false positives (which we know will also catch lots of true positives).

It also removes the Polymorphic NUMT annotation (since that's basically what's going into the filter at this point).

@codecov-io
Copy link

codecov-io commented Mar 28, 2019

Codecov Report

Merging #5842 into master will increase coverage by 0.004%.
The diff coverage is 94.444%.

@@              Coverage Diff               @@
##              master    #5842       +/-   ##
==============================================
+ Coverage     87.056%   87.06%   +0.004%     
- Complexity     32157    32178       +21     
==============================================
  Files           1978     1979        +1     
  Lines         147433   147498       +65     
  Branches       16206    16215        +9     
==============================================
+ Hits          128350   128412       +62     
- Misses         13171    13172        +1     
- Partials        5912     5914        +2
Impacted Files Coverage Δ Complexity Δ
...itute/hellbender/tools/walkers/mutect/Mutect2.java 80.952% <ø> (-0.866%) 22 <0> (-1)
...der/tools/walkers/mutect/M2ArgumentCollection.java 90% <ø> (ø) 14 <0> (ø) ⬇️
...ute/hellbender/utils/variant/GATKVCFConstants.java 75% <ø> (ø) 4 <0> (ø) ⬇️
...r/tools/walkers/mutect/Mutect2IntegrationTest.java 87.776% <ø> (-0.021%) 91 <0> (ø)
...lkers/mutect/filtering/Mutect2FilteringEngine.java 97.115% <100%> (+0.057%) 43 <0> (ø) ⬇️
.../mutect/filtering/M2FiltersArgumentCollection.java 93.75% <100%> (+0.417%) 6 <0> (ø) ⬇️
...kers/mutect/filtering/MinAlleleFractionFilter.java 100% <100%> (ø) 7 <7> (?)
...e/hellbender/utils/variant/GATKVCFHeaderLines.java 94.886% <100%> (+0.029%) 11 <0> (ø) ⬇️
...alkers/mutect/filtering/PolymorphicNuMTFilter.java 88.235% <88.235%> (ø) 9 <9> (?)
...lbender/utils/variant/GATKVariantContextUtils.java 87.309% <0%> (-0.306%) 244% <0%> (-2%)
... and 10 more

@droazen
Copy link
Contributor

droazen commented Mar 28, 2019

@meganshand Is this something you are targeting for the current release? Our current plan is to release towards the end of the day today.

@meganshand
Copy link
Contributor Author

@droazen while that would be amazing, I don't expect @ldgauthier or @davidbenjamin to drop everything to review this (unless one of you happens to have some free time today?). Don't let this hold back a release.

@davidbenjamin
Copy link
Contributor

I've got some time to try.

Copy link
Contributor

@davidbenjamin davidbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with review, back to @meganshand.


@Override
public boolean isArtifact(final VariantContext vc, final Mutect2FilteringEngine filteringEngine) {
MutableBoolean b = new MutableBoolean(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be cast as an anyMatch:

return vc.getGenotypes().stream().filter(filteringEngine::isTumor)
                .filter(g -> g.hasExtendedAttribute(GATKVCFConstants.ALLELE_FRACTION_KEY))
                .anyMatch(g -> {
                    final double[] alleleFractions = GATKProtectedVariantContextUtils.getAttributeAsDoubleArray(g, GATKVCFConstants.ALLELE_FRACTION_KEY, () -> null, 1.0);
                    double max = alleleFractions[0];
                    for (int i=0; i<alleleFractions.length; i++) {
                        //Only include allele fractions of non symbolic alleles for GVCFs
                        if (!(vc.hasSymbolicAlleles() && i == alleleFractions.length - 1)) {
                            max = alleleFractions[i] > max ? alleleFractions[i] : max;
                        }
                    }
                    return max < minAf;
                });

.filter(g -> g.hasExtendedAttribute(GATKVCFConstants.ALLELE_FRACTION_KEY))
.forEach(g -> {
final double[] alleleFractions = GATKProtectedVariantContextUtils.getAttributeAsDoubleArray(g, GATKVCFConstants.ALLELE_FRACTION_KEY, () -> null, 1.0);
double max = alleleFractions[0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The maximization can be shortened, and max should be final:

final int numRealAlleles = vc.hasSymbolicAlleles() ? alleleFractions.length - 1 : alleleFractions.length;
final double max = IntStream.range(0, numRealAlleles).mapToDouble(a -> alleleFractions[a]).max();


import java.util.*;

public class MinAfFilter extends HardFilter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expand name to MinAlleleFractionFilter

public class PolymorphicNuMTFilter extends HardFilter {
private static final double LOWER_BOUND_PROB = .01;
private static final double MULTIPLE_COPIES_MULTIPLIER = 1.5;
private int maxAltDepthCutoff;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be final?

}

@Override
public ErrorType errorType() { return ErrorType.ARTIFACT; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's your judgment call, but I'm torn between ErrorType.ARTIFACT and ErrorType.SEQUENCING.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these are mostly mapping artifacts (from NuMTs) I was leaning towards ARTIFACT, but maybe let me know if you still think this should be SEQUENCING.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see your comment below, should these be NON_SOMATIC? Really low VAF things includes sequencing errors, mapping errors, etc.

}

@Override
public ErrorType errorType() { return ErrorType.ARTIFACT; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be ErrorType.NON_SOMATIC because it's a mapping error of real DNA.


@Override
public boolean isArtifact(final VariantContext vc, final Mutect2FilteringEngine filteringEngine) {
MutableBoolean b = new MutableBoolean(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comments as above above using anyMatch on the Stream and making the max code more concise.

public boolean isArtifact(final VariantContext vc, final Mutect2FilteringEngine filteringEngine) {
MutableBoolean b = new MutableBoolean(false);
vc.getGenotypes().stream().filter(filteringEngine::isTumor)
.filter(g -> g.hasAD())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a method reference Genotype::hasAD

@@ -173,6 +172,8 @@ their names (or descriptions) depend on some threshold. Those filters are not i
public final static String STRICT_STRAND_BIAS_FILTER_NAME = "strict_strand";
public final static String N_RATIO_FILTER_NAME = "n_ratio";
public final static String CHIMERIC_ORIGINAL_ALIGNMENT_FILTER_NAME = "numt_chimera"; //mitochondria
public final static String ALLELE_FRACTION_FILTER_NAME = "low_allele_fraction";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For beautiful formatting of FilterMutectCalls' filter analysis output, could you makes the filter names more than 8 and less than 16 characters?

@meganshand
Copy link
Contributor Author

Thanks @davidbenjamin! This looks much cleaner now. Still holding out a tiny bit of hope for merging before the release, if you have a chance to look at the new changes.

Copy link
Contributor

@davidbenjamin davidbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 and good luck merging in time!

@droazen
Copy link
Contributor

droazen commented Mar 28, 2019

@meganshand Latest build failed with:

org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorIntegrationTest.testWithAllAnnotations FAILED
    java.lang.AssertionError: Iterators differ at element [9]: FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification"> != FORMAT=<ID=NUMT,Number=1,Type=String,Description="Potentially a polymorphic NuMT false positive rather than a real mitochondrial variant."> expected [FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">] but found [FORMAT=<ID=NUMT,Number=1,Type=String,Description="Potentially a polymorphic NuMT false positive rather than a real mitochondrial variant.">]
        at org.testng.Assert.fail(Assert.java:93)
        at org.testng.Assert.failNotEquals(Assert.java:512)
        at org.testng.Assert.assertEqualsImpl(Assert.java:134)
        at org.testng.Assert.assertEquals(Assert.java:610)
        at org.testng.Assert.assertEquals(Assert.java:578)
        at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorIntegrationTest.assertHeadersMatch(VariantAnnotatorIntegrationTest.java:66)
        at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorIntegrationTest.runVariantAnnotatorAndAssertSomething(VariantAnnotatorIntegrationTest.java:92)
        at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorIntegrationTest.assertVariantContextsMatch(VariantAnnotatorIntegrationTest.java:55)
        at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorIntegrationTest.assertVariantContextsMatch(VariantAnnotatorIntegrationTest.java:49)
        at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorIntegrationTest.testWithAllAnnotations(VariantAnnotatorIntegrationTest.java:224)
Exception in thread "Thread-5" java.nio.file.ClosedFileSystemException
	at com.google.common.jimfs.FileSystemState.checkOpen(FileSystemState.java:64)
	at com.google.common.jimfs.JimfsFileStore.lookUp(JimfsFileStore.java:141)
	at com.google.common.jimfs.FileSystemView.lookUp(FileSystemView.java:123)
	at com.google.common.jimfs.FileSystemView.lookUpWithLock(FileSystemView.java:112)
	at com.google.common.jimfs.FileSystemView.readAttributes(FileSystemView.java:771)
	at com.google.common.jimfs.JimfsFileSystemProvider.readAttributes(JimfsFileSystemProvider.java:346)
	at java.nio.file.Files.readAttributes(Files.java:1737)
	at java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)
	at java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
	at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:322)
	at java.nio.file.Files.walkFileTree(Files.java:2662)
	at java.nio.file.Files.walkFileTree(Files.java:2742)
	at htsjdk.samtools.util.IOUtil.recursiveDelete(IOUtil.java:1344)
	at org.broadinstitute.hellbender.utils.io.IOUtils.deleteRecursively(IOUtils.java:1061)
	at org.broadinstitute.hellbender.utils.io.DeleteRecursivelyOnExitPathHook.runHooks(DeleteRecursivelyOnExitPathHook.java:56)
	at java.lang.Thread.run(Thread.java:748)
Results: FAILURE (1604 tests, 1603 successes, 1 failures, 0 skipped)

@meganshand meganshand merged commit ea3032d into master Mar 28, 2019
@meganshand meganshand deleted the ms_new_filters branch March 28, 2019 22:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants