Process RIBAP:mmseqs2tsv terminated with an error exit status (1) #58

Open
GabrieleRigano99 opened this issue Dec 8, 2023 · 12 comments

@GabrieleRigano99

GabrieleRigano99 commented Dec 8, 2023

Hi, after running the command nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta "*.fasta" -profile local,docker (I have 3 genomes in my directory), I get this error:
ERROR ~ Error executing process > 'RIBAP:mmseqs2tsv'

Caused by:
Process RIBAP:mmseqs2tsv terminated with an error exit status (1)

Command executed:

#mkdir tsv
mmseq2tsv.py mmseq2_result.csv strain_ids.txt . 8 #tsv

Command exit status:
1

Command output:
(empty)

Command error:
Traceback (most recent call last):
  File "/home/gab/.nextflow/assets/hoelzer-lab/ribap/bin/mmseq2tsv.py", line 94, in <module>
    for idx, item in enumerate(chunks(blastTable, chunksize)):
  File "/home/gab/.nextflow/assets/hoelzer-lab/ribap/bin/mmseq2tsv.py", line 21, in chunks
    for i in range(0, len(data), size):
ValueError: range() arg 3 must not be zero

Could you help me out, please?

@hoelzer
Contributor

hoelzer commented Dec 11, 2023

Hey @GabrieleRigano99

Thx for your interest in RIBAP!

Can you please try the following command:

nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta '*.fasta' -profile local,docker

Please note the ' instead of ". The reasoning behind that is, that " will be directly expanded in your terminal so your input command will look like this when using ":

nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta genome1.fasta genome2.fasta genome3.fasta -profile local,docker

and probably that's causing the issue with mmseqs2.
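
For anyone checking how the different spellings of the pattern behave: in bash and other POSIX shells, pathname expansion applies only to unquoted patterns, so both '*.fasta' and "*.fasta" should reach Nextflow as the literal string. The sketch below is a simplified Python model of that rule, not the shell itself; expand_like_shell and the file list are made up for illustration.

```python
import fnmatch
import shlex


def expand_like_shell(command, files):
    """Simplified model of pathname expansion: only unquoted globs expand."""
    argv = []
    # posix=False makes shlex keep the surrounding quotes on each token,
    # which lets us tell quoted patterns from unquoted ones.
    for token in shlex.split(command, posix=False):
        quoted = len(token) >= 2 and token[0] in "'\"" and token[-1] == token[0]
        if quoted:
            argv.append(token[1:-1])  # quoted: passed through literally
        elif any(ch in token for ch in "*?["):
            matches = sorted(fnmatch.filter(files, token))
            argv.extend(matches or [token])  # no match: pattern stays as typed
        else:
            argv.append(token)
    return argv


# Hypothetical directory content, matching the file names used above.
files = ["genome1.fasta", "genome2.fasta", "genome3.fasta"]

single = expand_like_shell("--fasta '*.fasta'", files)
double = expand_like_shell('--fasta "*.fasta"', files)
bare = expand_like_shell("--fasta *.fasta", files)

print(single)  # literal pattern
print(double)  # literal pattern
print(bare)    # one argument per matching file
```

In this model only the unquoted form turns into three separate file arguments; either quote style hands the pattern to the pipeline unchanged.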

@GabrieleRigano99
Author

Hi @hoelzer,
thank you for the reply! Unfortunately it keeps giving me the same error...

my command:
nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta '*.fasta' -profile local,docker

ERROR ~ Error executing process > 'RIBAP:mmseqs2tsv'

Caused by:
Process RIBAP:mmseqs2tsv terminated with an error exit status (1)

Command executed:

#mkdir tsv
mmseq2tsv.py mmseq2_result.csv strain_ids.txt . 8 #tsv

Command exit status:
1

Command output:
(empty)

Command error:
Traceback (most recent call last):
  File "/home/gab/.nextflow/assets/hoelzer-lab/ribap/bin/mmseq2tsv.py", line 94, in <module>
    for idx, item in enumerate(chunks(blastTable, chunksize)):
  File "/home/gab/.nextflow/assets/hoelzer-lab/ribap/bin/mmseq2tsv.py", line 21, in chunks
    for i in range(0, len(data), size):
ValueError: range() arg 3 must not be zero

@klamkiew
Collaborator

Hi there and thanks a lot for the report :)

It looks like there is something fishy going on in the chunk size calculation to evaluate the MMSeqs2 results efficiently. The fact that size equals zero according to the command error seems very odd.
Can you have a look at the previous intermediate results (prokka, mmseqs2), to estimate the number of genes per genome?

We have a line of code that divides all MMSeqs results into chunks, and if I remember correctly, the default number of chunks is 8.
There could be something going on with the overall MMSeqs2 table being smaller than 8 (which seems very weird, given that you are using three genomes)
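
To make that hypothesis concrete: the traceback shows chunks() calling range(0, len(data), size) with a size of zero. How mmseq2tsv.py derives that size is not shown in this thread, so the sketch below simply assumes floor division of the row count by the number of chunks, which yields zero whenever the table has fewer rows than chunks. The helper names naive_chunk_size and safe_chunk_size are hypothetical.

```python
def chunks(data, size):
    """Yield consecutive slices of data, mirroring the loop in the traceback."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


def naive_chunk_size(n_rows, n_chunks):
    # Assumption: floor division, which is 0 whenever n_rows < n_chunks.
    return n_rows // n_chunks


def safe_chunk_size(n_rows, n_chunks):
    # Ceiling division, clamped so the step passed to range() is at least 1.
    return max(1, -(-n_rows // n_chunks))


rows = list(range(5))  # toy table with fewer rows than the default 8 chunks

try:
    list(chunks(rows, naive_chunk_size(len(rows), 8)))
    error_message = None
except ValueError as exc:
    error_message = str(exc)

safe_parts = list(chunks(rows, safe_chunk_size(len(rows), 8)))

print(error_message)
print(safe_parts)
```

With the clamped size the toy table is split into five one-row chunks instead of raising, at the cost of producing fewer chunks than requested.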

As a workaround, you could try setting the --chunks parameter to 1. This is, of course, not a satisfying solution: it will slow down the process and might not even work on your machine, since everything is loaded into memory as one chunk.

nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta '*.fasta' -profile local,docker --chunks 1

So: if this command works for the mmseqs2tsv process, please have a look into the prokka and mmseqs2 results, if possible, and double-check that these seem correct and valid.

If the same error occurs, even with setting --chunks 1, we'd have to dig deeper ;)

@GabrieleRigano99
Author

Hi @klamkiew !
Unfortunately it keeps giving me the same error even with the --chuncks 1 flag

nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta '*.fasta' -profile local,docker --chuncks 1

I checked the Prokka and MMSeqs results and I can't find any problem in those. I got respectively 5616, 5123 and 4813 genes in my genomes.
[7f/6e805a] process > RIBAP:rename (3) [100%] 3 of 3 ✔
[9b/e0e319] process > RIBAP:prokka (3) [100%] 3 of 3 ✔
[df/a2d352] process > RIBAP:strain_ids [100%] 1 of 1 ✔
[69/2e66aa] process > RIBAP:roary (3) [100%] 2 of 2
[68/7d6694] process > RIBAP:mmseqs2 [100%] 1 of 1 ✔
[20/b42de8] process > RIBAP:mmseqs2tsv [100%] 1 of 1, failed: 1 ✘

The process stops like this

@hoelzer
Contributor

hoelzer commented Dec 17, 2023

Hey @GabrieleRigano99 was this your command?

nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta '*.fasta' -profile local,docker --chuncks 1

?

because then --chuncks should be --chunks
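
The run with --chuncks apparently went through without any complaint about the unknown option, so a typo like this is easy to miss. A minimal sketch of a pre-flight check that could catch it is shown below; KNOWN_PARAMS lists only the two pipeline parameters mentioned in this thread (the real pipeline has more), and check_params is a hypothetical helper, not part of RIBAP or Nextflow.

```python
import difflib

# Only the pipeline parameters named in this thread; the real list is longer.
KNOWN_PARAMS = {"fasta", "chunks"}


def check_params(argv):
    """Return (unknown_param, suggestion) pairs for double-dash options."""
    problems = []
    for arg in argv:
        if not arg.startswith("--"):
            continue  # single-dash options such as -r and -profile are skipped
        name = arg[2:]
        if name not in KNOWN_PARAMS:
            close = difflib.get_close_matches(name, sorted(KNOWN_PARAMS), n=1)
            problems.append((name, close[0] if close else None))
    return problems


argv = ["--fasta", "*.fasta", "-profile", "local,docker", "--chuncks", "1"]
problems = check_params(argv)

for name, suggestion in problems:
    print(f"unknown parameter --{name}; did you mean --{suggestion}?")
```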

Can you provide the three genome FASTAs? For example, as a zip archive here? So we can try them out

Thanks!

@GabrieleRigano99
Author

oh my bad! I accidentally typed --chuncks instead of --chunks.
It worked with this command, but it took 3h 17m
nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta '*.fasta' -profile local,docker --chunks 1
thanks for the support!

@hoelzer
Contributor

hoelzer commented Dec 18, 2023

Alright, so @klamkiew was on the right path and actually the calculations work but there is something off with the chunking of the mmseqs2 results. And actually we do the chunking to reduce the runtime.

When you look at the results now, do the prokka and mmseqs2 results seem OK? Do you get as many predicted genes per genome as you would expect? Do you see anything odd?

@GabrieleRigano99
Author

I can't notice anything odd, it looks good to me in every step

@hoelzer
Contributor

hoelzer commented Dec 20, 2023

Ok, thanks for checking. Can you share the 3x FASTA files or are these confidential? I could also provide a secure exchange server if this is fine for you. Otherwise, just zip them in one archive and upload them here. Then we can do some troubleshooting

@GabrieleRigano99
Author

I'm sorry, I can't share these data unfortunately. Thank you for your work and for helping me out!

@hoelzer
Contributor

hoelzer commented Dec 20, 2023

Ok no problem and that's understandable when the data is confidential. Unfortunately, it's then difficult for us to debug. Maybe: when you are using three other input genomes is it working then? Just some random genomes from NCBI or so. With the default --chunks 8? Maybe there is something in general not working for small input sizes

@GabrieleRigano99
Author

Hi @hoelzer,
sorry for the late reply, I was working on other projects.
I tried to add 4 genomes to the previous 3 (different species, but all cyanobacteria) with --chunks 8.
Unfortunately it died with this error:

nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta '*.fasta' -profile local,docker --chunks 8

[b2/e7bfb7] process > RIBAP:rename (2) [100%] 7 of 7 ✔
[c6/626d00] process > RIBAP:prokka (7) [100%] 7 of 7 ✔
[6d/6d9494] process > RIBAP:strain_ids [100%] 1 of 1 ✔
[94/f9f534] process > RIBAP:roary (3) [100%] 3 of 3
[df/5f65fc] process > RIBAP:mmseqs2 [100%] 1 of 1 ✔
[12/9cf5c1] process > RIBAP:mmseqs2tsv [100%] 1 of 1 ✔
[c1/70b024] process > RIBAP:ilp_refinement (11) [ 14%] 1 of 7, failed: 1
[- ] process > RIBAP:combine_roary_ilp [ 0%] 0 of 1
[- ] process > RIBAP:prepare_msa -
[- ] process > RIBAP:mafft -
[- ] process > RIBAP:fasttree -
[- ] process > RIBAP:nw_display -
[- ] process > RIBAP:generate_html -
[- ] process > RIBAP:generate_upsetr_input -
[- ] process > RIBAP:upsetr -
ERROR ~ Error executing process > 'RIBAP:ilp_refinement (6)'

Caused by:
Process RIBAP:ilp_refinement (6) terminated with an error exit status (1)

Command executed:

derive_ilp_solutions.py --tmlim 240 --max --indel mmseqs_compressed_chunk4.pkl

Command exit status:
1

Command output:
(empty)

Command error:
Traceback (most recent call last):
  File "/home/gab/.nextflow/assets/hoelzer-lab/ribap/bin/derive_ilp_solutions.py", line 151, in <module>
    main()
  File "/home/gab/.nextflow/assets/hoelzer-lab/ribap/bin/derive_ilp_solutions.py", line 69, in main
    blastTable = read_blast_table(pickled_data)
  File "/home/gab/.nextflow/assets/hoelzer-lab/ribap/bin/derive_ilp_solutions.py", line 100, in read_blast_table
    blastTable = pickle.load(inputStream)
EOFError: Ran out of input

Work dir:
/home/gab/Chroococcidiopsis_project/new_space_pangenome/work/7c/b0fc30d81c7e2c632ccd01b4db7a12

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

-- Check '.nextflow.log' file for details
Could you help me out please?
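
A note on the error itself: "EOFError: Ran out of input" is what pickle.load raises when the stream it reads contains no data at all, so one possibility (not confirmed in this thread) is that mmseqs_compressed_chunk4.pkl was written as an empty file. The sketch below reproduces the message with an empty file and shows a simple size check before loading; load_chunk and the file names are hypothetical.

```python
import os
import pickle
import tempfile


def load_chunk(path):
    """Return the unpickled object, or None if the chunk file is empty."""
    if os.path.getsize(path) == 0:
        return None  # an empty file is what makes pickle.load raise EOFError
    with open(path, "rb") as input_stream:
        return pickle.load(input_stream)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names, standing in for the pipeline's chunk files.
    empty_chunk = os.path.join(tmp, "empty_chunk.pkl")
    filled_chunk = os.path.join(tmp, "filled_chunk.pkl")

    open(empty_chunk, "wb").close()  # zero-byte file
    with open(filled_chunk, "wb") as out:
        pickle.dump({"geneA": {"geneB": 0.9}}, out)

    try:
        with open(empty_chunk, "rb") as input_stream:
            pickle.load(input_stream)
        raised = None
    except EOFError as exc:
        raised = str(exc)

    guarded_empty = load_chunk(empty_chunk)
    guarded_filled = load_chunk(filled_chunk)

print(raised)
print(guarded_empty, guarded_filled)
```

Checking the size of the chunk file in the work directory listed above (for example with ls -l) would show whether this applies here.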
