-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hard filters in M2 for mitochondria #5842
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5842 +/- ##
==============================================
+ Coverage 87.056% 87.06% +0.004%
- Complexity 32157 32178 +21
==============================================
Files 1978 1979 +1
Lines 147433 147498 +65
Branches 16206 16215 +9
==============================================
+ Hits 128350 128412 +62
- Misses 13171 13172 +1
- Partials 5912 5914 +2
|
@meganshand Is this something you are targeting for the current release? Our current plan is to release towards the end of the day today. |
@droazen while that would be amazing, I don't expect @ldgauthier or @davidbenjamin to drop everything to review this (unless one of you happens to have some free time today?). Don't let this hold back a release. |
I've got some time to try. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with review, back to @meganshand.
|
||
@Override | ||
public boolean isArtifact(final VariantContext vc, final Mutect2FilteringEngine filteringEngine) { | ||
MutableBoolean b = new MutableBoolean(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be cast as an anyMatch
:
return vc.getGenotypes().stream().filter(filteringEngine::isTumor)
.filter(g -> g.hasExtendedAttribute(GATKVCFConstants.ALLELE_FRACTION_KEY))
.anyMatch(g -> {
final double[] alleleFractions = GATKProtectedVariantContextUtils.getAttributeAsDoubleArray(g, GATKVCFConstants.ALLELE_FRACTION_KEY, () -> null, 1.0);
double max = alleleFractions[0];
for (int i=0; i<alleleFractions.length; i++) {
//Only include allele fractions of non symbolic alleles for GVCFs
if (!(vc.hasSymbolicAlleles() && i == alleleFractions.length - 1)) {
max = alleleFractions[i] > max ? alleleFractions[i] : max;
}
}
return max < minAf;
});
.filter(g -> g.hasExtendedAttribute(GATKVCFConstants.ALLELE_FRACTION_KEY)) | ||
.forEach(g -> { | ||
final double[] alleleFractions = GATKProtectedVariantContextUtils.getAttributeAsDoubleArray(g, GATKVCFConstants.ALLELE_FRACTION_KEY, () -> null, 1.0); | ||
double max = alleleFractions[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The maximization can be shortened, and max
should be final:
final int numRealAlleles = vc.hasSymbolicAlleles() ? alleleFractions.length - 1 : alleleFractions.length;
final double max = IntStream.range(0, numRealAlleles).mapToDouble(a -> alleleFractions[a]).max();
|
||
import java.util.*; | ||
|
||
public class MinAfFilter extends HardFilter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expand name to MinAlleleFractionFilter
public class PolymorphicNuMTFilter extends HardFilter { | ||
private static final double LOWER_BOUND_PROB = .01; | ||
private static final double MULTIPLE_COPIES_MULTIPLIER = 1.5; | ||
private int maxAltDepthCutoff; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be final
?
} | ||
|
||
@Override | ||
public ErrorType errorType() { return ErrorType.ARTIFACT; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's your judgment call, but I'm torn between ErrorType.ARTIFACT
and ErrorType.SEQUENCING
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these are mostly mapping artifacts (from NuMTs) I was leaning towards ARTIFACT
, but maybe let me know if you still think this should be SEQUENCING
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see your comment below, should these be NON_SOMATIC
? Really low VAF things includes sequencing errors, mapping errors, etc.
} | ||
|
||
@Override | ||
public ErrorType errorType() { return ErrorType.ARTIFACT; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be ErrorType.NON_SOMATIC
because it's a mapping error of real DNA.
|
||
@Override | ||
public boolean isArtifact(final VariantContext vc, final Mutect2FilteringEngine filteringEngine) { | ||
MutableBoolean b = new MutableBoolean(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comments as above above using anyMatch
on the Stream
and making the max
code more concise.
public boolean isArtifact(final VariantContext vc, final Mutect2FilteringEngine filteringEngine) { | ||
MutableBoolean b = new MutableBoolean(false); | ||
vc.getGenotypes().stream().filter(filteringEngine::isTumor) | ||
.filter(g -> g.hasAD()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a method reference Genotype::hasAD
@@ -173,6 +172,8 @@ their names (or descriptions) depend on some threshold. Those filters are not i | |||
public final static String STRICT_STRAND_BIAS_FILTER_NAME = "strict_strand"; | |||
public final static String N_RATIO_FILTER_NAME = "n_ratio"; | |||
public final static String CHIMERIC_ORIGINAL_ALIGNMENT_FILTER_NAME = "numt_chimera"; //mitochondria | |||
public final static String ALLELE_FRACTION_FILTER_NAME = "low_allele_fraction"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For beautiful formatting of FilterMutectCalls
' filter analysis output, could you makes the filter names more than 8 and less than 16 characters?
Thanks @davidbenjamin! This looks much cleaner now. Still holding out a tiny bit of hope for merging before the release, if you have a chance to look at the new changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 and good luck merging in time!
2efdf6a
to
2738424
Compare
@meganshand Latest build failed with:
|
This adds a hard filter for low variant allele fraction calls. We will not turn this on by default in any of our pipelines, but it will give users an easy option to filter everything below a certain VAF that they don't care about.
It also adds a hard filter for low alt depth calls based on a threshold from the median autosomal coverage (that must be supplied as an argument). It takes the cutoff from a Poisson with a mean of 1.5 * median coverage (to account for NuMTs with 3 copies in the autosome) and is tuned to catch 99% of the false positives (which we know will also catch lots of true positives).
It also removes the Polymorphic NUMT annotation (since that's basically what's going into the filter at this point).