mmseqs splitdb does not preserve annotation of originating fasta files #376

Closed
grst opened this issue Nov 26, 2020 · 4 comments

grst commented Nov 26, 2020

Expected Behavior

I use splitdb to run mmseqs search in parallel on an HPC cluster (SGE).
For each match, I would like to retrieve the name of the original fasta file with mmseqs convertalis --format-output "...,qset,tset,...".
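
Once a table with a qset column is available, the per-file information can be summarised with standard tools. The following is only a sketch: `hits_per_qset` is a hypothetical helper name, and it assumes a tab-separated file whose third column is qset, as requested with --format-output "query,target,qset".

```shell
#!/bin/sh
# Hypothetical helper: count alignments per originating query file.
# Assumption: the input is tab-separated and column 3 holds the qset value.
hits_per_qset() {
  # Tally column 3, then print "<file>\t<count>" sorted by file name.
  awk -F '\t' '{ n[$3]++ } END { for (f in n) print f "\t" n[f] }' "$1" | sort
}
```

It would be called as `hits_per_qset results.tsv`. For tset, the column index would need to be adjusted to wherever tset appears in the format string.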

Current Behavior

Specifying qset or tset leads to a segmentation fault. Running search and convertalis on the full db works without issues.

Steps to Reproduce (for bugs)

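# Build one database from both FASTA files, then split it into two chunks
mmseqs createdb test1.faa test2.faa db
mmseqs splitdb db db_split --split 2
# Create the matching header database for each chunk
for file in db_split_*_2; do
  mmseqs createsubdb "${file}.index" db_h "${file}_h"
done
# Search one chunk against the full database; requesting qset crashes convertalis
mmseqs search db_split_1_2 db resultdb tmp
mmseqs convertalis db_split_1_2 db resultdb results.tsv --format-output "query,target,qset"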

MMseqs Output (for bugs)

convertalis db_split_1_2 db resultdb results.tsv --format-output query,target,qset 

MMseqs Version:         45c4de7f1daefa06b45688195305eadedaea4d97
Substitution matrix     nucl:nucleotide.out,aa:blosum62.out
Alignment format        0
Format alignment output query,target,qset
Translation table       1
Gap open cost           nucl:5,aa:11
Gap extension cost      nucl:2,aa:1
Database output         false
Preload mode            0
Search type             0
Threads                 64
Compressed              0
Verbosity               3

repex.sh: line 7: 43190 Segmentation fault      (core dumped) mmseqs convertalis db_split_1_2 db resultdb results.tsv --format-output "query,target,qset"

Your Environment

Include as many relevant details about the environment you experienced the bug in.

  • Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters): 45c4de7f1daefa06b45688195305eadedaea4d97
  • Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.): statically compiled
  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory): Intel(R) Xeon(R) CPU E7-4850 v4 @ 2.10GHz, AVX2 support, 3TB RAM
  • Operating system and version: CentOS Linux 7 64bit / Linux 3.10.0-1127.13.1.el7.x86_64

milot-mirdita commented Nov 26, 2020

You should probably use the MPI support within MMseqs2 to do this:
https://github.com/soedinglab/mmseqs2#how-to-run-mmseqs2-on-multiple-servers-using-mpi
https://github.com/soedinglab/MMseqs2/wiki#how-to-run-mmseqs2-on-multiple-servers-using-mpi

MMseqs2 MPI will automatically split either the query or target database to fit within memory and will produce a single result database. You'll have to compile MMseqs2 with MPI support though (cmake -DHAVE_MPI=1 ...).
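
As a rough sketch of what an MPI run looks like: the MMseqs2 README describes passing the MPI launcher through the RUNNER environment variable. The wrapper below is hypothetical, as are the `MMSEQS` override variable and the placeholder database names; the process count depends on your cluster.

```shell
#!/bin/sh
# Hypothetical wrapper around an MPI-enabled mmseqs binary.
# RUNNER is the environment variable the MMseqs2 README uses for the MPI launcher.
# MMSEQS is an assumed override so a different binary can be selected.
run_mpi_search() {
  n_procs=$1
  # queryDB, targetDB, resultDB and tmp are placeholder names.
  RUNNER="mpirun -np ${n_procs}" "${MMSEQS:-mmseqs}" search queryDB targetDB resultDB tmp
}
```

Calling `run_mpi_search 4` would start the search with four MPI processes, provided the binary was built with -DHAVE_MPI=1.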

splitdb is probably not symlinking the databases right. I'll have to look when I have time.
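
One way to check for this by hand is to list which companion files are missing next to a split chunk. This is a sketch under an assumption: that qset and tset depend on the .lookup and .source files that createdb writes next to the database. `missing_companions` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical diagnostic: print companion files missing for a database prefix.
# Assumption: qset/tset need <db>.lookup and <db>.source to resolve file names.
missing_companions() {
  prefix=$1
  for suffix in lookup source; do
    # -e follows symlinks, so a dangling symlink is also reported as missing.
    [ -e "${prefix}.${suffix}" ] || echo "${prefix}.${suffix}"
  done
}
```

Running `missing_companions db_split_1_2` prints nothing when both files (or valid symlinks to them) are present.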


grst commented Nov 26, 2020

I have been trying to avoid MPI so far, mostly because I run mmseqs as part of a Nextflow pipeline. I'm not sure it is even possible to use MPI from there, since Nextflow usually takes care of the parallelization.
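
Without MPI, each chunk can be searched as an independent job, which is roughly what a scheduler array task or a workflow process would do. A minimal sketch, assuming chunks are named db_split_<i>_<n> with i starting at 0 (the db_split_1_2 name in the report suggests this for --split 2) and task IDs starting at 1, as with SGE's SGE_TASK_ID. `chunk_commands` and the per-chunk result, tmp and output names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the commands one task would run for its chunk.
# Assumptions: zero-based chunk numbering, one-based task IDs.
chunk_commands() {
  task_id=$1    # e.g. "$SGE_TASK_ID" inside an SGE array job
  n_chunks=$2   # the value passed to splitdb --split
  chunk="db_split_$((task_id - 1))_${n_chunks}"
  # A separate result database and tmp directory per chunk avoids clashes
  # between tasks running at the same time.
  echo "mmseqs search ${chunk} db result_${chunk} tmp_${chunk}"
  echo "mmseqs convertalis ${chunk} db result_${chunk} ${chunk}.tsv --format-output query,target,qset"
}
```

Inside a job script, the printed commands could be executed with `chunk_commands "$SGE_TASK_ID" 2 | sh`. Printing first makes it easy to inspect what each task will do before submitting.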

milot-mirdita added a commit that referenced this issue Nov 26, 2020

milot-mirdita commented Nov 26, 2020

I added the line to create all the necessary symlinks for convertalis to work. You can compile from source yourself or wait for the CI to upload new binaries in about an hour.


grst commented Nov 26, 2020

This works 🎉
Thanks for fixing this so quickly!

grst closed this as completed Nov 26, 2020