Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mutect2 fails when secondary alignments in same assembly region as primary #6230

Closed
bhanugandham opened this issue Oct 24, 2019 · 15 comments
Closed

Comments

@bhanugandham
Copy link
Contributor

I took my bam files that were giving this error and removed the secondary alignments (samtools view -F 256) and ran them again through Mutect2 4.1.4. I didn't get this error now. Does that mean that Mutect2 uses secondary alignments in its calling algorithm normally? BWA generates secondary alignments by default and doesn't have an option to turn them off.

This Issue was generated from your [forums]
[forums]: https://gatkforums.broadinstitute.org/gatk/discussion/comment/61300#Comment_61300

@bhanugandham
Copy link
Contributor Author

@davidbenjamin Mutect2 is failing with this error: Cannot construct fragment from more than two reads. As discussed, looks like this is an edge case we have not accounted for. If the secondary alignment is on the same assembly region as the primary, we see this issue.

@colinhercus
Copy link

Hi,
I'm sharing some files that have this problem. There's part.bam which is smallest extract from my BAM that recreates the problem and the fasta file I used.
https://www.dropbox.com/sh/euxxovf87ii3cnz/AADXwBesyXdEzWgKvGcc_Ohia?dl=0

I can't see any secondary alignments in the BAM but there is a chimeric read with two alignments, a primary and a supplementary.
You may also get supplementary alignments when alignments include alt-scaffolds.
Colin

@Rohit-Satyam
Copy link

Hi
I was facing the same issue. My alignment bam files initially contained supplementary alignments.

1346947830 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
7051016 + 0 supplementary
96636810 + 0 duplicates
1339854920 + 0 mapped (99.47% : N/A)
1339896814 + 0 paired in sequencing
669948407 + 0 read1
669948407 + 0 read2
1305680670 + 0 properly paired (97.45% : N/A)
1331545826 + 0 with itself and mate mapped
1258078 + 0 singletons (0.09% : N/A)
16885698 + 0 with mate mapped to a different chr
8931206 + 0 with mate mapped to a different chr (mapQ>=5)

However, I used the following command
sambamba view -h -f bam -F 'mapping_quality >= 1 and not (unmapped or secondary_alignment) and not ([XA] != null or [SA] != null)' .bam -o {}_mapq.bam 2> {}.stderr

And reran mutect2. It ran fine. In Chipseq it's a common practice to remove the supplementary, however, I thought the GATK variant caller took care of such alignment itself.

@danielecook
Copy link

Adding read-filter flags does not appear to fix this, as it should:

                 --read-filter NotSecondaryAlignmentReadFilter \
                 --read-filter NotSupplementaryAlignmentReadFilter \

@davidbenjamin
Copy link
Contributor

@colinhercus I was able to re-run your command successfully on the latest master branch (not in a release yet). I believe PR #6240 fixed the issue. @Rohit-Satyam @danielecook there's a good chance the errors you encountered are also fixed. If not, please let me know.

@danielecook
Copy link

danielecook commented Nov 21, 2019

@davidbenjamin I'll give it a try today next week.

@colinhercus
Copy link

colinhercus commented Nov 21, 2019 via email

@davidbenjamin
Copy link
Contributor

Between #6240 and #6327 (additional edge cases) this ought to be fixed.

@Rohit-Satyam
Copy link

@colinhercus I was able to re-run your command successfully on the latest master branch (not in a release yet). I believe PR #6240 fixed the issue. @Rohit-Satyam @danielecook there's a good chance the errors you encountered are also fixed. If not, please let me know.

In reference to your reply, I wish to inform you the problem still stands.

java.lang.IllegalArgumentException: Cannot construct fragment from more than two reads
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:725)
at org.broadinstitute.hellbender.utils.read.Fragment.create(Fragment.java:36)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1376)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.broadinstitute.hellbender.utils.genotyper.AlleleLikelihoods.groupEvidence(AlleleLikelihoods.java:595)
at org.broadinstitute.hellbender.tools.walkers.mutect.SomaticGenotypingEngine.callMutations(SomaticGenotypingEngine.java:93)
at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2Engine.callRegion(Mutect2Engine.java:251)
at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2.apply(Mutect2.java:320)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:308)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:281)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
Using GATK jar /home/parashar/anaconda3/share/gatk4-4.1.4.0-0/gatk-package-4.1.4.0-local.jar

I used the following command
gatk Mutect2 -R /archive/GRCh38_GATK/Homo_sapiens_assembly38.fasta -I /scratch/pixel_dis/BCC199N_S12_dedup.bam --max-mnp-distance 0 -O /scratch/pixel_dis/BCC199N_S12_dedup_normal.vcf.gz 2> /scratch/pixel_dis/BCC199N_S12_dedup_normal.stderr

to create PONs. But ending up with the same error. I also updated GATK. However, with --independent-mates option, it works fine. Why not make that as a default option?

@davidbenjamin
Copy link
Contributor

@Rohit-Satyam The bottom of your stack trace says you are running the 4.1.4.0 release, which came out before the fixes mentioned above. I expect the issue will be resolved if you run the most recent 4.1.5.0 release. Please let me know if if does not.

@Rohit-Satyam
Copy link

Oh!!
I thought GATK on conda was maintained by GATK. My bad. Just learned from this blog. I will install it separately.

@Rohit-Satyam
Copy link

Rohit-Satyam commented Mar 6, 2020

Seems someone updated it already in conda here https://anaconda.org/bioconda/gatk4. It's working fine now. Thanks

@davidbenjamin
Copy link
Contributor

I'm glad it's working now, but a PS since you asked about -independent-mates: several months ago we made Mutect2 force paired reads to share the latent random variable indicating which haplotype they are derived from in the somatic genotyping model. This is correct because paired reads come from the same molecule of DNA. -independent-mates disables this and tells Mutect2 to forget about pairing. We only created the option because some synthetic validation data is generated by spiking in variation without regard to pairing.

@Biojenifer
Copy link

Which is the consequence of using --independent-mates? Could we get a higher rate of false positives then?

@davidbenjamin
Copy link
Contributor

It's not biased toward more false positives or false negatives; it's just worse.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants