Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SplitNCigarReads - Exception when read is entirely an insertion #5293

Closed
mozack opened this issue Oct 8, 2018 · 2 comments
Closed

SplitNCigarReads - Exception when read is entirely an insertion #5293

mozack opened this issue Oct 8, 2018 · 2 comments
Assignees
Labels

Comments

@mozack
Copy link

mozack commented Oct 8, 2018

Bug Report

Hi @jamesemery and @sooheelee ,

Thanks very much for looking into: #5230

I am running into one additional error. Reads that contain an insert spanning the full length of the read are causing an exception in SplitNCigarReads.

Affected tool(s) or class(es)

SplitNCigarReads

Affected version(s)

  • Tested on 4.0.3.0 and also branch: je_splitNCigarReadsSplitError (gatk-4.0.10.0-4-gb0f0ab3)

Description

SplitNCigarReads gives an Exception when a read that is entirely an insertion is encountered. By contrast, HaplotypeCaller does not seem to have a problem with these reads.

Example read:

seq.1028598	163	chr20	3146413	60	100I	=	3146307	-106	CCAATAATTCGACCCTATAAATGATGACCTCCGTTATCGGAAGGGCACAGAACCGTCAGCCGCAACACCAGCAGCTGTAGGCCCTGCTGGGCGCGCTGGG	8;72442435768::8443224764768:84:7534457962;99:787;628:7557;::7:72878:7;:7;:8754;9:::87:8799:7:7:87::	YA:Z:chr20:3145675:600M138N208I599M	MC:Z:100M	PG:Z:MarkDuplicates	RG:Z:1	NH:i:1	HI:i:1	YM:i:0	nM:i:100	YO:Z:chr20:3146278:+:56S44M	MQ:i:60	AS:i:140	YX:i:49	mc:i:3146406	ms:i:2300

Steps to reproduce

Command line:

gatk \
   --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' \
   SplitNCigarReads \
   --reference $REF \
   --input 100I_rna.bam \
   --output gatk.split.bam \
    > split.log 2>&1

A tiny BAM file illustrating the problem is attached (it is gzipped to allow Github upload).
100I_rna.bam.gz

Actual behavior

Here is the stacktrace:

***********************************************************************

A USER ERROR has occurred: Badly formed genome unclippedLoc: Parameters to GenomeLocParser are incorrect:The stop position 3146412 is less than start 3146413 in contig chr20

***********************************************************************
org.broadinstitute.hellbender.exceptions.UserException$MalformedGenomeLoc: Badly formed genome unclippedLoc: Parameters to GenomeLocParser are incorrect:The stop position 3146412 is less than start 3146413 in contig chr20
        at org.broadinstitute.hellbender.utils.GenomeLocParser.vglHelper(GenomeLocParser.java:280)
        at org.broadinstitute.hellbender.utils.GenomeLocParser.validateGenomeLoc(GenomeLocParser.java:226)
        at org.broadinstitute.hellbender.utils.GenomeLocParser.createGenomeLoc(GenomeLocParser.java:185)
        at org.broadinstitute.hellbender.utils.GenomeLocParser.createGenomeLoc(GenomeLocParser.java:169)
        at org.broadinstitute.hellbender.utils.GenomeLocParser.createGenomeLoc(GenomeLocParser.java:150)
        at org.broadinstitute.hellbender.tools.walkers.rnaseq.OverhangFixingManager$SplitRead.setRead(OverhangFixingManager.java:402)
        at org.broadinstitute.hellbender.tools.walkers.rnaseq.OverhangFixingManager$SplitRead.<init>(OverhangFixingManager.java:396)
        at org.broadinstitute.hellbender.tools.walkers.rnaseq.OverhangFixingManager.getSplitRead(OverhangFixingManager.java:467)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.Collections$2.tryAdvance(Collections.java:4717)
        at java.util.Collections$2.forEachRemaining(Collections.java:4725)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
        at org.broadinstitute.hellbender.tools.walkers.rnaseq.OverhangFixingManager.addReadGroup(OverhangFixingManager.java:207)
        at org.broadinstitute.hellbender.tools.walkers.rnaseq.SplitNCigarReads.splitNCigarRead(SplitNCigarReads.java:259)
        at org.broadinstitute.hellbender.tools.walkers.rnaseq.SplitNCigarReads.firstPassApply(SplitNCigarReads.java:180)
        at org.broadinstitute.hellbender.engine.TwoPassReadWalker.lambda$traverseReads$0(TwoPassReadWalker.java:62)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.Iterator.forEachRemaining(Iterator.java:116)
        at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
        at org.broadinstitute.hellbender.engine.TwoPassReadWalker.traverseReads(TwoPassReadWalker.java:60)
        at org.broadinstitute.hellbender.engine.TwoPassReadWalker.traverse(TwoPassReadWalker.java:42)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)
@jamesemery
Copy link
Collaborator

@mozack I'm not sure what aligner you are using that generates reads that don't actually align to the reference like that. I have elected to make the tool pass them through (or any reads that don't consume any reference bases eg. all softclips for example) into the output without attempting to clip or remove them. Let me know if this fixed the crash and if this behavior is desirable.

@mozack
Copy link
Author

mozack commented Oct 12, 2018

Hi @jamesemery . Thanks for the quick fix. Initial tests are looking good.

These reads were aligned with STAR and realigned with ABRA2. An insertion longer than the read length was assembled during local realignment and the 100I cigar reflects that the read is entirely contained within the insertion with the read position reflecting the base immediately preceding the insertion locus. So, this is a (potentially) accurate representation of the read.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants