Inquiry Regarding IOError Encountered During Multiple Rounds of Merging gVCF Files Using GLnexus #313

Open
zhanxiangzong opened this issue Jul 4, 2024 · 1 comment

@zhanxiangzong
Hi!

Thank you for providing this tool.

I encountered a minor issue while using GLnexus and would like to seek your guidance.

Initially, I used GLnexus to merge five datasets, each consisting of several hundred gVCFs, and this process was completed without any problems. However, after converting the five merged BCFs back to gVCFs and attempting to merge these five files again using GLnexus, I encountered a failure during the "discovering alleles" step.

Here are the details of the error message:

[GLnexus] [info] glnexus_cli release v1.4.1-0-g68e25e5 Aug 13 2021
[GLnexus] [info] detected jemalloc
[GLnexus] [info] Loading config preset gatk
[GLnexus] [info] config:
...
[GLnexus] [info] config CRC32C = 1926883223
[GLnexus] [info] init database, exemplar_vcf=./all_gvcf/500.gvcf.gz
[GLnexus] [info] Initialized GLnexus database in GLnexus.DB
[GLnexus] [info] bucket size: 30000
[GLnexus] [info] contigs:
...
[GLnexus] [info] db_get_contigs GLnexus.DB
[GLnexus] [info] Beginning bulk load with no range filter.
[GLnexus] [info] Loaded 5 datasets with 2776 samples; 105459735784 bytes in 8064671 BCF records (173 duplicate) in 6449 buckets. Bucket max 105752856 bytes, 5309 records. 0 BCF records skipped due to caller-specific exceptions
[GLnexus] [info] Created sample set *@5
[GLnexus] [info] Flushing database...
[GLnexus] [info] Bulk load complete!
[GLnexus] [warning] Processing full length of 443 contigs, as no --bed was provided. Providing a BED file with regions of interest, if applicable, can speed this up.
[GLnexus] [info] found sample set *@5
[GLnexus] [info] discovering alleles in 443 range(s) on 28 threads
[GLnexus] [error] Failed to discover alleles: IOError: exception deserializing BCF bucket (capnp/arena.c++:127: failed: Exceeded message traversal limit. See capnp::ReaderOptions.
stack: 558149f798d8 558149a1e8f7 558149a2a1e5 558149a2a349 5581499efa38 5581499e7508 55814996d248 2ab467bf6996 5581499e3baa 5581499e3d02 55814997716c 558149fef47e 2ab467beefa2 2ab467d014ce)
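
One way to read the load summary in the log above is to compare the reported bucket sizes with a message-size limit. The sketch below parses the "Loaded ..." line and checks the largest bucket against Cap'n Proto's documented default traversal limit (8 Mi words of 8 bytes, i.e. 64 MiB). Whether GLnexus actually uses that default is an assumption, not something the log confirms; the helper name `parse_load_summary` and the constant `ASSUMED_LIMIT_BYTES` are hypothetical and exist only for this illustration.

```python
import re

# Load summary line copied from the log above (trailing part omitted).
LOG_LINE = (
    "[GLnexus] [info] Loaded 5 datasets with 2776 samples; 105459735784 bytes "
    "in 8064671 BCF records (173 duplicate) in 6449 buckets. "
    "Bucket max 105752856 bytes, 5309 records."
)

# ASSUMPTION: Cap'n Proto's default ReaderOptions traversal limit
# (8 * 1024 * 1024 words, 8 bytes per word). GLnexus may configure
# a different value; the log does not say.
ASSUMED_LIMIT_BYTES = 8 * 1024 * 1024 * 8

SUMMARY_RE = re.compile(
    r"(\d+) bytes in (\d+) BCF records.*? in (\d+) buckets\. "
    r"Bucket max (\d+) bytes"
)


def parse_load_summary(line):
    """Extract byte and bucket counts from a GLnexus bulk-load summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("not a bulk-load summary line")
    total_bytes, records, buckets, bucket_max = map(int, match.groups())
    return {
        "total_bytes": total_bytes,
        "records": records,
        "buckets": buckets,
        "bucket_max_bytes": bucket_max,
        "bucket_mean_bytes": total_bytes / buckets,
    }


if __name__ == "__main__":
    stats = parse_load_summary(LOG_LINE)
    print(f"mean bucket: {stats['bucket_mean_bytes'] / 2**20:.1f} MiB")
    print(f"max bucket:  {stats['bucket_max_bytes'] / 2**20:.1f} MiB")
    print(f"max exceeds assumed limit: "
          f"{stats['bucket_max_bytes'] > ASSUMED_LIMIT_BYTES}")
```

If the assumption holds, the largest bucket reported here is bigger than the assumed limit while the average bucket is well below it, which would be consistent with the failure appearing only after merging wide multi-sample records. This is a reading of the log, not a confirmed diagnosis.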

Can you help me resolve this issue? Or is it the case that GLnexus cannot merge gVCF files that already contain multiple samples?

Thank you in advance for your help!

@ChaimMacTavish
I saw issues reporting this error a year ago, and they have not received a reply yet. Has this package been abandoned?