Inconsistency in samtools stats output - cram index issue? #1639
Thank you for the very clear bug report. I can reproduce it. That's disappointing as I thought we'd recently bug fixed that function! I wonder if the bug fix itself broke something else. Yes, this will most definitely point us in the right direction.
FWIW the bug I fixed before wasn't this, but a related issue elsewhere. The cause and fix are both trivial. Many thanks for the clear bug report and data to reproduce it.
Thanks for the quick fix (and shoutout on twitter!), I tested the fix on the large dataset where I first noticed the problem and can confirm I get the expected result now.
The index is loaded into a nested containment list, so the last entry in the index array is not necessarily the last slice, as the last slice may be entirely contained within a previous one. Fixes samtools#1639
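To illustrate the commit message above, here is a minimal Python sketch of a toy nested containment list. It is not htslib's code: the `Entry` class, `build_ncl`, `naive_last`, `true_last`, and all coordinates and offsets are hypothetical and chosen only to show why the last element of the top-level array can differ from the last slice in the file.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    start: int   # first reference position covered by the slice (made-up)
    end: int     # last reference position covered by the slice (made-up)
    offset: int  # byte offset of the slice in the file (made-up)
    children: List["Entry"] = field(default_factory=list)


def build_ncl(entries: List[Entry]) -> List[Entry]:
    """Nest each entry under the most recent entry that fully contains it."""
    top: List[Entry] = []
    stack: List[Entry] = []
    for e in sorted(entries, key=lambda x: (x.start, -x.end)):
        # Pop ancestors that do not fully contain the new entry.
        while stack and not (stack[-1].start <= e.start and e.end <= stack[-1].end):
            stack.pop()
        (stack[-1].children if stack else top).append(e)
        stack.append(e)
    return top


def naive_last(top: List[Entry]) -> Entry:
    """Assume the last top-level entry is the last slice (the flawed assumption)."""
    return top[-1]


def true_last(top: List[Entry]) -> Optional[Entry]:
    """Visit every entry, nested ones included, and keep the highest file offset."""
    best: Optional[Entry] = None
    todo = list(top)
    while todo:
        e = todo.pop()
        if best is None or e.offset > best.offset:
            best = e
        todo.extend(e.children)
    return best


# Two slices over nearly the same range: the second is contained in the first,
# so it ends up nested and never appears in the top-level array.
top = build_ncl([Entry(1, 19154, 1000), Entry(2, 19150, 5000)])
print(len(top), naive_last(top).offset, true_last(top).offset)
```

In this toy case the top-level array holds a single entry, so the naive lookup stops at the first slice while the nested walk finds the second one, which mirrors the symptom reported in the issue.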
Hi,
I recently ran into a strange problem where the output of samtools stats differed between two invocations that I expected to be equivalent. I was trying to generate the stats for a single chromosome and initially ran:
The results reported far fewer reads than expected, so I re-ran the command providing the full range for this chromosome and got the expected number of reads:
By definition these two range specifiers should be the same, so I spent a bit of time looking into this and I think I tracked it down to a difference in what `cram_index_last` and `cram_index_last_query` return. I've attached a minimal example (sim1.cram) to reproduce the problem. This is simply 500 full-length copies of a mtDNA sequence, with 1% errors introduced.
With this .cram file `samtools stats sim1.cram chrM` reports 257 total reads for this chromosome but `samtools stats sim1.cram chrM:1-19154` reports 486.
Here's a snippet of code to dump the `cram_index*` returned by the two methods:
On sim1.cram this code prints:
which correspond to the first and second (last) entry in the .crai file for this chromosome (tid 5):
I don't know the cram format well enough to debug this any further but I hope this points in the right direction.
Jared
sim1.zip
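For readers who want to inspect the .crai entries for one reference themselves, the following is a small Python sketch. It assumes the index is the usual gzip-compressed, tab-separated file with six integer columns per line, as described in the CRAM specification; the `CraiEntry` and `crai_entries` names and the file name in the usage comment are hypothetical.

```python
import gzip
from typing import Iterator, NamedTuple


class CraiEntry(NamedTuple):
    tid: int               # reference sequence id
    start: int             # alignment start
    span: int              # alignment span
    container_offset: int  # absolute byte offset of the container
    slice_offset: int      # byte offset of the slice within the container
    slice_size: int        # slice size in bytes


def crai_entries(path: str, tid: int) -> Iterator[CraiEntry]:
    """Yield the index entries for one reference id, in file order."""
    with gzip.open(path, "rt") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            entry = CraiEntry(*(int(x) for x in line.split("\t")[:6]))
            if entry.tid == tid:
                yield entry


# Usage (file name is an assumption about how the index was written):
# for e in crai_entries("sim1.cram.crai", 5):
#     print(e)
```

Listing the entries in file order makes it easy to see when a later slice covers a range that sits entirely inside an earlier one.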