FoldTree crashing in "Run FoldTree" step #11

Open
Nitayah opened this issue Dec 18, 2023 · 5 comments

Nitayah commented Dec 18, 2023

Hi! I am trying to run FoldTree with a zipped folder with ~60 pdb files. Here is my folder:

all_pdb_files_18.12.23.zip

The code crashes in the "Run FoldTree" step and returns the following error, which I attach both as an image and as text.

Could you please help me?

image

[Mon Dec 18 12:48:32 2023]
rule dl_ids_sequences:
input: ./test_40814_3/identifiers.txt
output: ./test_40814_3/sequence_dataset.csv
log: ./test_40814_3/logs/dlsequences.log
jobid: 3
reason: Missing output files: ./test_40814_3/sequence_dataset.csv
wildcards: folder=./test_40814_3
resources: tmpdir=/tmp

Activating conda environment: foldtree

EnvironmentNameNotFound: Could not find conda environment: foldtree
You can list all discoverable environments with conda info --envs.

[Mon Dec 18 12:48:37 2023]
Finished job 3.
1 of 15 steps (7%) done
Select jobs to execute...

[Mon Dec 18 12:48:37 2023]
rule dl_ids_structs:
input: ./test_40814_3/sequence_dataset.csv
output: ./test_40814_3/sequences.fst, ./test_40814_3/finalset.csv
log: ./test_40814_3/logs/dlstructs.log
jobid: 2
reason: Missing output files: ./test_40814_3/finalset.csv; Input files updated by another job: ./test_40814_3/sequence_dataset.csv
wildcards: folder=./test_40814_3
resources: tmpdir=/tmp

Activating conda environment: foldtree

EnvironmentNameNotFound: Could not find conda environment: foldtree
You can list all discoverable environments with conda info --envs.

[Mon Dec 18 12:48:39 2023]
Finished job 2.
2 of 15 steps (13%) done
Select jobs to execute...

[Mon Dec 18 12:48:40 2023]
rule plddt:
input: ./test_40814_3/finalset.csv
output: ./test_40814_3/plddt.json
log: ./test_40814_3/logs/plddt.log
jobid: 1
reason: Missing output files: ./test_40814_3/plddt.json; Input files updated by another job: ./test_40814_3/finalset.csv
wildcards: folder=./test_40814_3
resources: tmpdir=/tmp

Activating conda environment: foldtree

[Mon Dec 18 12:48:40 2023]
rule foldseek_allvall_1:
input: ./test_40814_3/finalset.csv
output: ./test_40814_3/allvall_1.csv
log: ./test_40814_3/logs/foldseekallvall.log
jobid: 8
reason: Missing output files: ./test_40814_3/allvall_1.csv; Input files updated by another job: ./test_40814_3/finalset.csv
wildcards: folder=./test_40814_3
resources: tmpdir=/tmp

Activating conda environment: foldtree

EnvironmentNameNotFound: Could not find conda environment: foldtree
You can list all discoverable environments with conda info --envs.

EnvironmentNameNotFound: Could not find conda environment: foldtree
You can list all discoverable environments with conda info --envs.

[Mon Dec 18 12:48:40 2023]
Finished job 8.
3 of 15 steps (20%) done
Select jobs to execute...

[Mon Dec 18 12:48:40 2023]
rule foldseek2distmat:
input: ./test_40814_3/allvall_1.csv
output: ./test_40814_3/foldtree_fastmemat.txt, ./test_40814_3/alntmscore_fastmemat.txt, ./test_40814_3/lddt_fastmemat.txt
log: ./test_40814_3/logs/foldseek2distmat.log
jobid: 7
reason: Missing output files: ./test_40814_3/lddt_fastmemat.txt, ./test_40814_3/foldtree_fastmemat.txt, ./test_40814_3/alntmscore_fastmemat.txt; Input files updated by another job: ./test_40814_3/allvall_1.csv
wildcards: folder=./test_40814_3
resources: tmpdir=/tmp

Activating conda environment: foldtree

EnvironmentNameNotFound: Could not find conda environment: foldtree
You can list all discoverable environments with conda info --envs.

[Mon Dec 18 12:48:42 2023]
Finished job 1.
4 of 15 steps (27%) done
Traceback (most recent call last):
File "/content/.snakemake/scripts/tmp0mskjfj9.foldseekres2distmat_simple.py", line 9, in <module>
res = pd.read_table(snakemake.input[0], header = None)
File "/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1282, in read_table
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 611, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1448, in __init__
self._engine = self._make_engine(f, self.engine)
File "/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1723, in _make_engine
return mapping[engine](f, **self.options)
File "/usr/local/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 93, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "parsers.pyx", line 586, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
[Mon Dec 18 12:48:43 2023]
Error in rule foldseek2distmat:
jobid: 7
input: ./test_40814_3/allvall_1.csv
output: ./test_40814_3/foldtree_fastmemat.txt, ./test_40814_3/alntmscore_fastmemat.txt, ./test_40814_3/lddt_fastmemat.txt
log: ./test_40814_3/logs/foldseek2distmat.log (check log file(s) for error details)
conda-env: foldtree

RuleException:
CalledProcessError in file /content/fold_tree/workflow/fold_tree, line 90:
Command 'source /usr/local/bin/activate 'foldtree'; set -euo pipefail; /usr/local/bin/python3.10 /content/.snakemake/scripts/tmp0mskjfj9.foldseekres2distmat_simple.py' returned non-zero exit status 1.
File "/content/fold_tree/workflow/fold_tree", line 90, in __rule_foldseek2distmat
File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-12-18T124832.269382.snakemake.log

CalledProcessError Traceback (most recent call last)
in <cell line: 1>()
----> 1 get_ipython().run_cell_magic('bash', '-s $jobname $input_type', 'JOBNAME=$1\nINPUT_TYPE=$2\nSUFFIX=""\nif [[ $INPUT_TYPE = "custom" ]]; then\n mkdir -p "${JOBNAME}/structs"\n mv "${JOBNAME}/"*.pdb "${JOBNAME}/"*.cif "${JOBNAME}/structs"\n SUFFIX="custom_structs=True"\nfi\nsnakemake --cores $(nproc --all) --use-conda -s fold_tree/workflow/fold_tree --config folder="./${JOBNAME}" filter=False $SUFFIX #> /dev/null 2>&1\n#snakemake --cores 4 --use-conda -s fold_tree/workflow/fold_tree --config folder=./${jobname} filter=False\n')

4 frames
in shebang(self, line, cell)

/usr/local/lib/python3.10/dist-packages/IPython/core/magics/script.py in shebang(self, line, cell)
243 sys.stderr.flush()
244 if args.raise_error and p.returncode!=0:
--> 245 raise CalledProcessError(p.returncode, cell, output=out, stderr=err)
246
247 def _run_script(self, p, cell, to_close):

CalledProcessError: Command 'b'JOBNAME=$1\nINPUT_TYPE=$2\nSUFFIX=""\nif [[ $INPUT_TYPE = "custom" ]]; then\n mkdir -p "${JOBNAME}/structs"\n mv "${JOBNAME}/"*.pdb "${JOBNAME}/"*.cif "${JOBNAME}/structs"\n SUFFIX="custom_structs=True"\nfi\nsnakemake --cores $(nproc --all) --use-conda -s fold_tree/workflow/fold_tree --config folder="./${JOBNAME}" filter=False $SUFFIX #> /dev/null 2>&1\n#snakemake --cores 4 --use-conda -s fold_tree/workflow/fold_tree --config folder=./${jobname} filter=False\n'' returned non-zero exit status 1.
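
Note for anyone reading this log: the pandas EmptyDataError is raised when the script reads allvall_1.csv, which suggests that file (or something upstream of it) was written with no content. A minimal sketch for finding the first missing or empty intermediate file is below. The file names and their order are taken from the log above; the helper name `first_bad_output` is hypothetical and not part of FoldTree.

```python
from pathlib import Path

# Intermediate files named in the Snakemake log above, in the order
# the rules produced them. This list is an assumption based on that log only.
PIPELINE_FILES = [
    "identifiers.txt",
    "sequence_dataset.csv",
    "finalset.csv",
    "allvall_1.csv",
]


def first_bad_output(folder):
    """Return (file name, status) for the first missing or empty file, else None."""
    for name in PIPELINE_FILES:
        path = Path(folder) / name
        if not path.exists():
            return name, "missing"
        if path.stat().st_size == 0:
            return name, "empty"
    return None


if __name__ == "__main__":
    # Job folder name copied from the log above; replace with your own.
    print(first_bad_output("./test_40814_3"))
```

If the first bad file is earlier than allvall_1.csv, the foldseek2distmat traceback is probably a downstream symptom rather than the root cause.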

@ruthalee

I am having the same problem with my custom run. There are similar closed issues (#3, #6), but I am not sure those solutions apply here, since they should already have been fixed. When I opened the Files tab on the left-hand side, my struct folders were empty and the identifiers.txt files just said "1". Thanks for your help!

Input:
Zipped folder containing >100 pdb files.
input

Error Message:
Screenshot 2024-01-23 at 6 03 24 PM
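
One way to check the empty structs folder described above is to list where the .pdb and .cif files actually ended up after unzipping. The sketch below assumes the layout used by the notebook command in the original report (structure files at the top level of the job folder, then moved into a `structs` subfolder). The function name `locate_structures` is hypothetical, and whether nested files are the cause here is not confirmed anywhere in this thread.

```python
from pathlib import Path

# Extensions the notebook command in the original report moves into structs/.
STRUCT_SUFFIXES = {".pdb", ".cif"}


def locate_structures(job_folder):
    """Group structure files by location relative to the job folder."""
    root = Path(job_folder)
    found = {"top_level": [], "structs": [], "nested": []}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in STRUCT_SUFFIXES:
            continue
        rel = path.relative_to(root)
        if len(rel.parts) == 1:
            found["top_level"].append(rel.as_posix())
        elif len(rel.parts) == 2 and rel.parts[0] == "structs":
            found["structs"].append(rel.as_posix())
        else:
            # Anything deeper, e.g. inside a folder created by unzipping.
            found["nested"].append(rel.as_posix())
    return found
```

If everything is reported under "nested", the files sit in a subfolder that a top-level glob would not match, which would leave structs empty.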

@cactuskid
Contributor

This may have been due to an outdated version of foldseek on the backend. I've changed this parameter in the config file, so it might be working now.
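
To see which foldseek the runtime actually picks up, something like the sketch below may help. It assumes the `foldseek` binary is on PATH and supports a `version` subcommand. The function name is hypothetical, and this thread does not say which version counts as outdated.

```python
import shutil
import subprocess


def foldseek_version(binary="foldseek"):
    """Return the version string printed by the binary, or None if unavailable."""
    exe = shutil.which(binary)
    if exe is None:
        # Binary not found on PATH in this environment.
        return None
    # Assumption: `foldseek version` prints the version to stdout.
    result = subprocess.run([exe, "version"], capture_output=True, text=True)
    return result.stdout.strip() or None


if __name__ == "__main__":
    print(foldseek_version())
```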

@msleutel

The problem persists for me (tested on 11/04/2024). I get the same error for an "identifier" run as well as a "custom" run.

image
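
Since the failure shows up for both input types, it may be worth ruling out the "EnvironmentNameNotFound: Could not find conda environment: foldtree" message that appears under every rule in the original log. The sketch below checks whether an environment with that name is discoverable. It relies on `conda env list --json`, the helper names are hypothetical, and whether the missing environment is the real cause is an assumption, not something confirmed in this thread.

```python
import json
import shutil
import subprocess
from pathlib import Path


def env_names_from_json(text):
    """Extract environment folder names from `conda env list --json` output."""
    envs = json.loads(text).get("envs", [])
    # The base environment appears as its install prefix (e.g. /usr/local),
    # so its folder name is not "base".
    return {Path(p).name for p in envs}


def conda_env_exists(name):
    """Return True or False, or None if conda is missing or the call fails."""
    if shutil.which("conda") is None:
        return None
    result = subprocess.run(
        ["conda", "env", "list", "--json"], capture_output=True, text=True
    )
    if result.returncode != 0:
        return None
    return name in env_names_from_json(result.stdout)


if __name__ == "__main__":
    # Environment name copied from the log in the original report.
    print(conda_env_exists("foldtree"))
```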

@trinicordero

I am having the same problem (17/05/24) :( pleaseeee help!

@KeaunAmani

The issue persists for me as well. It seems that the foldseek code is currently bugged for custom structures and has been for a while.
