BCFtools `--write-index=tbi` and `bcftools index --tbi` generate different index files #2267
Comments
Hello, I see the same issue. I also use GATK GenomicsDBImport, and I split the workflow with a BED file, running one workflow per BED interval. GATK GenomicsDBImport fails in these 4 regions, while the other regions succeed (regions at chrY 240000–710000). The failure log is the same.
I'm sure that when I ran the same command with bcftools 1.18 on 9000+ samples, everything worked with no errors. With bcftools 1.20, this error happens.
If I replace this .tbi file with one generated by the `tabix` command, GATK runs without error.
We're looking at it, but note that .tbi indices are BGZF-compressed, so you need to `zcat` them before hex-dumping to evaluate any differences.
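As a rough illustration of the zcat-then-compare advice, here is a minimal sketch (not part of bcftools or htslib) that decompresses two .tbi files with Python's `gzip` module and reports the first differing byte. It assumes only that BGZF is a series of gzip members, so `gzip.open` returns the same bytes `zcat` would. The function name `first_difference` and the file names in the usage note are hypothetical.

```python
import gzip


def first_difference(path_a: str, path_b: str) -> int:
    """Return the offset of the first differing byte between two BGZF files
    after decompression (like comparing `zcat` output), or -1 if identical."""
    # gzip.open reads every concatenated gzip member, which covers BGZF blocks
    # and the empty BGZF EOF marker block.
    with gzip.open(path_a, "rb") as fa, gzip.open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One decompressed stream is a prefix of the other.
    if len(a) != len(b):
        return min(len(a), len(b))
    return -1
```

For example, `first_difference("write_index.vcf.gz.tbi", "bcftools_index.vcf.gz.tbi")` (hypothetical copies of the two indices) gives an offset you can then look at in a hex dump of the decompressed files, instead of diffing the compressed bytes, which can differ even when the contents match.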
We think we found where this problem crept in (ironically, in a change that fixed a related multi-threading issue). For now, you can work around the problem by using …
Edit: note this chimes with your observations too, with 1.18 working. The threading index fix (which broke the non-threaded indices) was merged between 1.18 and 1.19. Thanks for the bug report. (Note that in our opinion the index is actually valid, but it triggers a bug in htsjdk, and in older htslib versions too. So the fix is definitely still a good one.)
While I was trying to run GATK HaplotypeCaller with the option `--alleles input.vcf.gz` on a file that I had generated with BCFtools using the option `--write-index=tbi`, I got the error: …
I read that this is usually caused by an out-of-date or corrupt index file. I then regenerated the index file with `bcftools index --tbi`, and GATK HaplotypeCaller worked without issues. Indeed, the two generated index files were different, with different md5sums.
I could not replicate an error with BCFtools, but I did notice that `--write-index=tbi` and `bcftools index --tbi` don't always create the same index file when applied to sufficiently large files. This is reproducible with BCFtools 1.20 (using htslib 1.20): …
I could not understand why they are different: …
I was not able to reproduce this discrepancy when creating `.csi` index files. I am not sure whether this is related to my issue with GATK HaplotypeCaller, and I could not replicate an error with BCFtools when using an index generated with `--write-index=tbi`, but hopefully understanding the source of this discrepancy will be of use.
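A byte offset alone does not say which part of the index changed. The sketch below is a hypothetical helper, not an htslib or bcftools API. It parses decompressed .tbi data following the layout described in the tabix section of the hts-specs (magic `TBI\1`, eight little-endian int32 header fields, concatenated NUL-terminated sequence names, then for each sequence its bins of chunk virtual offsets and its linear index) and lists which bins or linear indices differ between two files. It assumes both indices describe the same .vcf.gz and ignores the optional trailing `n_no_coor` field.

```python
import gzip
import struct
from typing import Dict, List, Tuple

# One reference sequence: {bin number: [(chunk_beg, chunk_end), ...]}, linear index offsets
RefIndex = Tuple[Dict[int, List[Tuple[int, int]]], List[int]]


def parse_tbi(data: bytes) -> Tuple[List[str], List[RefIndex]]:
    """Parse decompressed .tbi bytes (little-endian layout from the tabix spec)."""
    if data[:4] != b"TBI\x01":
        raise ValueError("not a tabix (.tbi) index")
    # Header: n_ref, format, col_seq, col_beg, col_end, meta, skip, l_nm
    n_ref, *_cols, l_nm = struct.unpack_from("<8i", data, 4)
    pos = 36
    names = [n.decode() for n in data[pos:pos + l_nm].split(b"\0") if n]
    pos += l_nm
    refs: List[RefIndex] = []
    for _ in range(n_ref):
        (n_bin,) = struct.unpack_from("<i", data, pos)
        pos += 4
        bins: Dict[int, List[Tuple[int, int]]] = {}
        for _ in range(n_bin):
            bin_no, n_chunk = struct.unpack_from("<Ii", data, pos)
            pos += 8
            # Each chunk is a pair of uint64 BGZF virtual offsets.
            bins[bin_no] = [struct.unpack_from("<QQ", data, pos + 16 * i) for i in range(n_chunk)]
            pos += 16 * n_chunk
        (n_intv,) = struct.unpack_from("<i", data, pos)
        pos += 4
        linear = list(struct.unpack_from(f"<{n_intv}Q", data, pos))
        pos += 8 * n_intv
        refs.append((bins, linear))
    # Remaining bytes (e.g. the optional n_no_coor field) are ignored.
    return names, refs


def diff_tbi(path_a: str, path_b: str) -> List[str]:
    """List the sequences/bins whose index entries differ between two .tbi files."""
    with gzip.open(path_a, "rb") as fa, gzip.open(path_b, "rb") as fb:
        names_a, refs_a = parse_tbi(fa.read())
        names_b, refs_b = parse_tbi(fb.read())
    if names_a != names_b:
        return [f"sequence names differ: {names_a} vs {names_b}"]
    report = []
    for name, (bins_a, lin_a), (bins_b, lin_b) in zip(names_a, refs_a, refs_b):
        for bin_no in sorted(set(bins_a) | set(bins_b)):
            if bins_a.get(bin_no) != bins_b.get(bin_no):
                report.append(f"{name}: bin {bin_no} chunks differ")
        if lin_a != lin_b:
            report.append(f"{name}: linear index differs")
    return report
```

Running `diff_tbi` on the two indices (for example hypothetical copies named `write_index.vcf.gz.tbi` and `bcftools_index.vcf.gz.tbi`) would narrow the discrepancy down to specific sequences and bins. It only shows *where* the files differ; whether a given difference is still a valid index is a separate question.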