Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRAM: better testing of unmapped reads #1236

Open
jmthibault79 opened this issue Dec 4, 2018 · 0 comments
Open

CRAM: better testing of unmapped reads #1236

jmthibault79 opened this issue Dec 4, 2018 · 0 comments
Labels

Comments

@jmthibault79
Copy link
Contributor

Subject of the issue

PR #1233 fixes a bug in HTSJDK 2.18.0 which was observed when running GATK on an input CRAM with unmapped reads. We were able to narrow down the issue to the unmapped reads by observing that the problem still occurred by specifying -L UNMAPPED.

Add a test file to HTSJDK with these characteristics:

I have a large CRAM which demonstrates the problem but I have been unable to reduce it to its unmapped reads. This should be trivial, so we should also investigate why both of these attempts failed:

  1. Use a version of GATK (with the fix) to run PrintReads on a CRAM with -L UNMAPPED and produce CRAM output. Run GATK PrintReads on the result CRAM with -L UNMAPPED. Surprisingly, no reads are found. When re-running without -L UNMAPPED, all reads are seen. The Current Locus for these reads displays as "unmapped" in the ProgressMeter.

  2. Run samtools view -@4 -f 4 -C on a CRAM to produce CRAM output consisting of only those reads which have bit flag 0x04 set (unmapped). Run GATK PrintReads on the result CRAM. Unfortunately, this crashes with a stack trace.

java.lang.ArrayIndexOutOfBoundsException
        at sun.security.provider.DigestBase.engineUpdate(DigestBase.java:105)
        at java.security.MessageDigest$Delegate.engineUpdate(MessageDigest.java:584)
        at java.security.MessageDigest.update(MessageDigest.java:325)
        at htsjdk.samtools.util.SequenceUtil.calculateMD5(SequenceUtil.java:917)
        at htsjdk.samtools.cram.structure.Slice.setRefMD5(Slice.java:162)
        at htsjdk.samtools.CRAMContainerStreamWriter.flushContainer(CRAMContainerStreamWriter.java:455)
        at htsjdk.samtools.CRAMContainerStreamWriter.finish(CRAMContainerStreamWriter.java:128)
        at htsjdk.samtools.CRAMFileWriter.finish(CRAMFileWriter.java:129)
        at htsjdk.samtools.SAMFileWriterImpl.close(SAMFileWriterImpl.java:215)
        at org.broadinstitute.hellbender.utils.read.SAMFileGATKReadWriter.close(SAMFileGATKReadWriter.java:26)
        at org.broadinstitute.hellbender.tools.PrintReads.closeTool(PrintReads.java:107)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:970)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant