"Zero queries scanned" resulted from the hmmer annotation #287

Closed
Jiulong-Zhao opened this issue Mar 9, 2021 · 8 comments

Comments

@Jiulong-Zhao

Dear developers of eggnog-mapper v2.1.0,
First, thank you all for your effort on this popular and powerful software!
Today I installed the latest version of eggnog-mapper (v2.1.0) and tried to run it on a small subset of a large protein database using the hmmer module. But the resulting .annotations file is nearly empty, like this:

Tue Mar 9 21:15:20 2021

emapper-2.1.0

/home/mcs/soft/eggnog-mapper-master/emapper.py -i test_prot.fasta --itype proteins -o sgw_viruses --output_dir ../eggNOG_annotation --override --data_dir /home/mcs/database/eggnog-mapper-data/ -m hmmer --evalue 0.001 --score 30 -d Bacteria --qtype seq --dbtype hmmdb --cut_ga --seed_ortholog_evalue 0.001 --seed_ortholog_score 30 --cpu 30

#query seed_ortholog evalue score eggNOG_OGs narr_OG_name narr_OG_cat narr_OG_desc best_OG_name best_OG_cat best_OG_desc Preferred_name GOs EC KEGG_ko KEGG_Pathway KEGG_Module KEGG_Reaction KEGG_rclass BRITE KEGG_TC CAZy BiGG_Reaction PFAMs

0 queries scanned

Total time (seconds): 0.17600393295288086

Rate: 0.00 q/s

In fact, the .hits and .seed_orthologs files were also empty.
The input FASTA file contains 309 protein sequences predicted from metagenomic contigs.
The command line I typed is:
emapper.py -i test_prot.fasta --itype proteins -o sgw_viruses --output_dir ../eggNOG_annotation --override --data_dir /home/mcs/database/eggnog-mapper-data/ -m hmmer --evalue 0.001 --score 30 -d Bacteria --qtype seq --dbtype hmmdb --cut_ga --seed_ortholog_evalue 0.001 --seed_ortholog_score 30 --cpu 30
So I want to know why there are no results in this file. I am not sure whether the problem is that 1) the proteins are all extremely novel (which I think is nearly impossible), 2) the database I used is wrong, or 3) the command I typed is wrong. Or perhaps it is something else I didn't notice.
I would sincerely appreciate your help! The input FASTA file and the log are attached below. (By the way, I don't think I found the Bacteria.hmm file mentioned in log.txt in my database directory.)

test_prot.txt
log.txt

My second question:
I want to query my protein sequences against the Bacteria, Archaea, and Viruses hmmer databases together, but it seems only one of them can be given on the command line at a time. I want to select the best hit among bacteria, archaea, and viruses. How can I achieve this? Should I pool the three databases together? Looking forward to your answer! Thanks!

@Jiulong-Zhao
Author

I also tried the Viruses database, and the resulting files are also empty, with no hits.

@Cantalapiedra
Collaborator

Hi @Jiulong-Zhao ,

I am downloading the Bacteria database (which takes time, as you already know) to test your sequences. Meanwhile, could you try without using the --cut_ga option?

Regarding your question about using several databases, that would be a nice feature to implement, but I am afraid that currently it is only possible for Diamond and MMseqs2 databases through create_dbs.py. Options you have:

  • Download the HMMs and sequences for the DBs, then concatenate the HMMs into a single file and run hmmpress on it. You could run download_eggnog_db.py with -s (simulate) to get a guide to the commands you may need to set up your database. Copy the commands that download the HMMs and sequences, and download everything for the taxa you are interested in. Once you have the HMMs and FASTA sequences, mimic the rest of the commands for a single DB (concatenate the HMMs with cat, prepare the HMMs with hmmpress, clean the sequences, etc.).
  • Run emapper.py separately for the 3 databases and try to merge the annotation results.
  • Use Diamond or MMseqs2 instead (which I guess is not what you want if you made the effort to download the Bacteria databases for hmmer).
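
For the second option (separate runs, then merging), a minimal Python sketch could look like the one below. It is not part of eggnog-mapper. It assumes the annotations files are tab-separated, that comment and header lines start with `#`, and that the first and fourth columns are the query and the score, as in the header shown earlier in this thread. The function names and database labels are hypothetical, and treating scores from different HMM databases as directly comparable is itself an assumption.

```python
def read_annotations(handle, source):
    """Yield (query, score, source, fields) for each data row of an annotations file."""
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and the comment/header lines, which start with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Assumed column order: query, seed_ortholog, evalue, score, ...
        yield fields[0], float(fields[3]), source, fields


def best_hits(*labelled_handles):
    """Keep, for every query, the row with the highest score across all inputs.

    Each argument is a (label, open_file) pair, e.g. ("Viruses", handle).
    Returns a dict mapping query -> (score, label, fields).
    """
    best = {}
    for source, handle in labelled_handles:
        for query, score, src, fields in read_annotations(handle, source):
            # Ties keep the first database seen.
            if query not in best or score > best[query][0]:
                best[query] = (score, src, fields)
    return best
```

To use it, open the three annotations files produced by the separate runs and pass them as `("Bacteria", handle)`, `("Archaea", handle)`, and `("Viruses", handle)`; the returned dictionary records which database supplied the winning row for each query.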

I hope some of this helps.
I will write back once I have tested your sequences to figure out what could be happening.

Best,
Carlos

@Jiulong-Zhao
Author

@Cantalapiedra
Hi, thank you for your reply!
As you suggested, I tried the command without --cut_ga against the Viruses hmmer database, but it does not work:
"Error: annotation went wrong for hit ['sec_asb_k141_16372688_4', '747763.D7NW59_9CAUD', 3.1e-19, 60.5]. no such table: prots"
The log.txt is attached:
log.txt
(The command with --cut_ga runs, but the result files are still empty.)
Thank you!

@Cantalapiedra
Collaborator

Hi @Jiulong-Zhao ,

"no such table prots". Which DB version are you using? You can check the DB you are using and also the expected one with emapper.py --version.
Note that v2.1.0 should be using DB version 5.0.2, so you may need to download the eggnog.db file of that version.
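
The "no such table" message is an SQLite error, so one quick way to see whether a given eggnog.db has the table emapper is asking for is to list its tables directly. This is only a sketch: the helper names are hypothetical, and the only table name taken from this thread is `prots`, from the error message above.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in an SQLite file such as eggnog.db."""
    # Open read-only so that a mistyped path raises an error instead of
    # silently creating a new, empty database file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def has_table(db_path, table="prots"):
    """True if the database file contains a table with the given name."""
    return table in list_tables(db_path)
```

Calling `has_table` on the eggnog.db inside the directory passed to --data_dir would then show whether that file contains a `prots` table.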

Best,
Carlos

@Jiulong-Zhao
Author

Hi, @Cantalapiedra
As you suggested, I re-downloaded DB version 5.0.2 and reran the command line above, and it works!
So exciting!

However, something went wrong when I added the "--usemem" and "--dbmem" options to speed up the run.
The error message is as follows:

(eggnog-mapper) [mcs@mcs1 function_annotation]$ emapper.py -i prokka/PROKKA_03092021.faa.split/PROKKA_03092021.part_030.faa --itype proteins -o sgw_test --output_dir eggNOG_annotation --override --data_dir /home/mcs/soft/eggnog-mapper-master/data/ -m hmmer --evalue 0.001 --score 30 -d Viruses --usemem --dbmem --qtype seq --dbtype hmmdb --seed_ortholog_evalue 0.001 --seed_ortholog_score 30 --cpu 30
#  emapper-2.1.0
# emapper.py  -i prokka/PROKKA_03092021.faa.split/PROKKA_03092021.part_030.faa --itype proteins -o sgw_test --output_dir eggNOG_annotation --override --data_dir /home/mcs/soft/eggnog-mapper-master/data/ -m hmmer --evalue 0.001 --score 30 -d Viruses --usemem --dbmem --qtype seq --dbtype hmmdb --seed_ortholog_evalue 0.001 --seed_ortholog_score 30 --cpu 30
hmmer.py:search DB: Viruses
/home/mcs/soft/eggnog-mapper-master/data/hmmer/Viruses/Viruses.hmm
hmmer.py:search DB: Viruses, name Viruses, path /home/mcs/soft/eggnog-mapper-master/data/hmmer/Viruses/Viruses.hmm, host localhost, port 51700, endport 53200, idmap /home/mcs/soft/eggnog-mapper-master/data/hmmer/Viruses/Viruses.hmm.idmap
create_servers: hmmdb:/home/mcs/soft/eggnog-mapper-master/data/hmmer/Viruses/Viruses.hmm:localhost:51700-53200
Creating server number 1/1
Loading server at localhost, port 51700-51701
Creating hmmpgmd server at port 51700 ...
Could not create server number 1/1. Fails: 1
Created 0 out of 1
Traceback (most recent call last):
  File "/home/mcs/soft/eggnog-mapper-master/emapper.py", line 639, in <module>
    n, elapsed_time = emapper.run(args, args.input, args.annotate_hits_table, args.cache_file)
  File "/home/mcs/soft/eggnog-mapper-master/eggnogmapper/emapper.py", line 297, in run
    searcher, searcher_name, hits = self.search(args, infile, predictor)
  File "/home/mcs/soft/eggnog-mapper-master/eggnogmapper/emapper.py", line 153, in search
    pjoin(self._current_dir, self.search_out_file))
  File "/home/mcs/soft/eggnog-mapper-master/eggnogmapper/search/hmmer/hmmer.py", line 170, in search
    self.num_servers, self.num_workers, self.cpus_per_worker)
  File "/home/mcs/soft/eggnog-mapper-master/eggnogmapper/search/hmmer/hmmer_server.py", line 91, in create_servers
    raise Exception("Could not create hmmpgmd servers")
Exception: Could not create hmmpgmd servers

So, how can I solve this problem?

By the way, I am sorry for my stupid question, which has confused me a lot:
What is the difference between the two e-value options and between the two score options (i.e. '--evalue' vs. '--seed_ortholog_evalue', and '--score' vs. '--seed_ortholog_score')? I am a little confused about the basic principle of this software, so thank you for your patient explanation!
Looking forward to your reply!

Best,
Jiulong

@Cantalapiedra
Collaborator

Hi @Jiulong-Zhao ,

Glad that we are making progress.
I am not sure why --usemem is failing. There may not be enough RAM to hold the whole HMM database. You could try monitoring your memory usage as the DB loads, or just use --dbmem only.
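
As a rough way to check the memory hypothesis before launching, one could compare the on-disk size of the HMM file with the memory Linux reports as available. The sketch below is a heuristic only, not something emapper does: the function names and the `headroom` factor are assumptions, and the in-memory footprint of the loaded database may differ a lot from its file size.

```python
def mem_available_bytes(meminfo_text):
    """Parse the MemAvailable line from Linux /proc/meminfo text.

    Returns the value in bytes, or None if the line is missing.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line format: "MemAvailable:    8000000 kB"
            return int(line.split()[1]) * 1024
    return None


def fits_in_memory(db_size_bytes, available_bytes, headroom=2.0):
    """Heuristic: require `headroom` times the on-disk size to be available.

    The factor 2.0 is an arbitrary assumption, not a measured value.
    """
    if available_bytes is None:
        return False
    return db_size_bytes * headroom <= available_bytes
```

In practice one would read `/proc/meminfo` into a string, take `os.path.getsize()` of the `.hmm` file named in the log, and pass both to `fits_in_memory`.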

The difference between --evalue and --seed_ortholog_evalue is that the first is applied during the search step and the second during the annotation step. Note that in some cases you run only the annotation step from a pre-existing seed_orthologs file, and then you may want to use the --seed_ortholog_* thresholds. Of course, the parameters could be merged, but they are kept because those options were already present in older versions of emapper. Also, in some cases you may want to keep search hits passing a given --evalue in your output (emapper.seed_orthologs file) but filter them later in the annotation step with --seed_ortholog_evalue (emapper.annotations file). I hope this makes sense.
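
The two-stage idea can be pictured with a small Python sketch. It illustrates the concept and is not emapper's code: the `Hit` type, the function names, and the exact comparisons (`<=` for e-value, `>=` for score) are assumptions. The first example hit reuses the values from the error message earlier in this thread; the other two are made up.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    target: str
    evalue: float
    score: float


def passes(hit, max_evalue, min_score):
    """A hit passes if its e-value is low enough and its score is high enough."""
    return hit.evalue <= max_evalue and hit.score >= min_score


def search_step(raw_hits, evalue, score):
    """Stage 1 (--evalue / --score): hits that would reach the seed_orthologs file."""
    return [h for h in raw_hits if passes(h, evalue, score)]


def annotation_step(seed_orthologs, seed_ortholog_evalue, seed_ortholog_score):
    """Stage 2 (--seed_ortholog_evalue / --seed_ortholog_score): hits that get annotated."""
    return [
        h for h in seed_orthologs
        if passes(h, seed_ortholog_evalue, seed_ortholog_score)
    ]


raw = [
    Hit("sec_asb_k141_16372688_4", "747763.D7NW59_9CAUD", 3.1e-19, 60.5),
    Hit("hypothetical_query_2", "hypothetical_target_2", 1e-4, 35.0),
    Hit("hypothetical_query_3", "hypothetical_target_3", 0.5, 10.0),
]
seeds = search_step(raw, evalue=0.001, score=30)
annotated = annotation_step(seeds, seed_ortholog_evalue=1e-10, seed_ortholog_score=50)
```

With these thresholds, the first two hits survive the search step, and only the first one is kept for annotation. Running the annotation step alone on an existing seed_orthologs file corresponds to calling `annotation_step` without `search_step`.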

Besides that, general information about the parameters can be found here:

https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.0

Happy to answer any question though (there are no stupid questions).

Best,
Carlos

@Jiulong-Zhao
Author

Hi @Cantalapiedra ,

I can't thank you enough for your rapid and helpful reply!

Yes, the '--usemem' option does not seem to work on my computer, but it works using '--dbmem' only. That's enough for me!

Your explanation is also very helpful!

That's it! All my issues are solved for now with your help!
Thanks a lot again!

Best,
Jiulong

@Cantalapiedra
Collaborator

Glad to help!
I will close this for now. Feel free to reopen or reissue if needed.
Best,
Carlos
