"Zero queries scanned" resulted from the hmmer annotation #287
I also tried the Viruses database, and the resulting files are also empty, with no hits.
Hi @Jiulong-Zhao , I am downloading the Bacteria database (which takes time, as you already know) to test your sequences. Meanwhile, could you try without using the --cut_ga option? Regarding your question about using several databases, that would be a nice feature to implement, but I am afraid that currently it is only possible for Diamond and MMseqs2 databases through create_dbs.py. Options you have:
I hope some of this helps. Best,
@Cantalapiedra
Hi @Jiulong-Zhao , "no such table prots". Which DB version are you using? You can check the DB you are using and also the expected one with Best,
Hi, @Cantalapiedra , However, something went wrong when I added the "--usemem" and "--dbmem" options to speed up the run.
So, how should I solve this problem? By the way, I am sorry for my stupid question which confused me a lot: Best,
Hi @Jiulong-Zhao , Glad that we are making progress. The difference between --evalue and --seed_ortholog_evalue is that the first is applied during the search step and the second during the annotation step. Note that in some cases you are only running the annotation step from a pre-existing seed_orthologs file, and in that case you could be interested in using the --seed_ortholog_* thresholds. Of course, the parameters themselves could be merged, but they are kept because those options were already present in older versions of emapper. Also, in some cases you may be interested in keeping search hits with a given --evalue in your output (emapper.seed_orthologs file), but filtering them later in the annotation step with --seed_ortholog_evalue (emapper.annotations file). I hope this makes sense. Besides that, general information about parameters etc. can be found here: https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.0 Happy to answer any question though (there are no stupid questions). Best,
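To make the two-stage thresholds described above concrete, here is a minimal Python sketch. It is not emapper's code: the `Hit` type and the `search_filter` and `annotation_filter` functions are hypothetical names, and the exact comparison emapper applies (strict or inclusive) is an assumption, since the thread does not state it.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    # Hypothetical record for one search hit; not an emapper data structure.
    query: str
    target: str
    evalue: float
    score: float


def search_filter(hits: List[Hit], evalue: float, score: float) -> List[Hit]:
    """Search step: keep hits passing the --evalue / --score style thresholds."""
    return [h for h in hits if h.evalue <= evalue and h.score >= score]


def annotation_filter(seeds: List[Hit], seed_ortholog_evalue: float,
                      seed_ortholog_score: float) -> List[Hit]:
    """Annotation step: keep seeds passing the --seed_ortholog_* style thresholds."""
    return [h for h in seeds
            if h.evalue <= seed_ortholog_evalue and h.score >= seed_ortholog_score]


# Made-up hits, for illustration only.
hits = [
    Hit("q1", "t1", 1e-30, 250.0),
    Hit("q2", "t2", 1e-4, 45.0),
    Hit("q3", "t3", 0.5, 12.0),
]

# Permissive search thresholds: these hits would end up in the seed_orthologs output.
seed_orthologs = search_filter(hits, evalue=0.001, score=30.0)

# Stricter annotation thresholds: only these would be annotated.
annotated = annotation_filter(seed_orthologs,
                              seed_ortholog_evalue=1e-10,
                              seed_ortholog_score=60.0)
```

With these made-up values, q1 and q2 survive the search step but only q1 survives the annotation step, which is the situation described above: hits kept in the seed_orthologs file but filtered out of the annotations file.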
Hi @Cantalapiedra , I can't be more appreciative of your rapid and helpful reply! Yes, the '--usemem' option might not work on my computer, but it works when I use '--dbmem' only. That's enough for me! Your explanation is also very helpful! That's it! All my issues are solved for now, thanks to your assistance! Best,
Glad to help!
Dear Developers of the eggnog-mapper v2.1.0:
Firstly, thank you all for your effort on this software, which is popular and powerful!
Today I installed the latest version of eggnog-mapper (v2.1.0) and tried to run it on a small subset of a large protein database using the hmmer module. But the resulting .annotations file is nearly empty, like this:
Tue Mar 9 21:15:20 2021
emapper-2.1.0
/home/mcs/soft/eggnog-mapper-master/emapper.py -i test_prot.fasta --itype proteins -o sgw_viruses --output_dir ../eggNOG_annotation --override --data_dir /home/mcs/database/eggnog-mapper-data/ -m hmmer --evalue 0.001 --score 30 -d Bacteria --qtype seq --dbtype hmmdb --cut_ga --seed_ortholog_evalue 0.001 --seed_ortholog_score 30 --cpu 30
#query seed_ortholog evalue score eggNOG_OGs narr_OG_name narr_OG_cat narr_OG_desc best_OG_name best_OG_cat best_OG_desc Preferred_name GOs EC KEGG_ko KEGG_Pathway KEGG_Module KEGG_Reaction KEGG_rclass BRITE KEGG_TC CAZy BiGG_Reaction PFAMs
0 queries scanned
Total time (seconds): 0.17600393295288086
Rate: 0.00 q/s
Actually, the .hits and .seed_orthologs files were also empty.
The input fasta file contained 309 protein sequences predicted from the metagenomic contigs.
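One quick sanity check when a run reports "0 queries scanned" is to confirm that the input file really contains the expected number of records. The sketch below only counts header lines starting with `>`; the function name `count_fasta_records` is hypothetical and this is not part of emapper.

```python
from typing import Iterable


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


# Small in-memory example; for a real file, pass an open file handle instead,
# e.g. `with open("test_prot.fasta") as fh: count_fasta_records(fh)`.
example = [
    ">prot_1\n",
    "MKTAYIAKQR\n",
    ">prot_2\n",
    "MSLLTEVETY\n",
]
n_records = count_fasta_records(example)
```

If the count matches what is expected (309 here), the input itself is probably not the cause of the empty output.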
The command line I typed is:
emapper.py -i test_prot.fasta --itype proteins -o sgw_viruses --output_dir ../eggNOG_annotation --override --data_dir /home/mcs/database/eggnog-mapper-data/ -m hmmer --evalue 0.001 --score 30 -d Bacteria --qtype seq --dbtype hmmdb --cut_ga --seed_ortholog_evalue 0.001 --seed_ortholog_score 30 --cpu 30
So I want to know why there are no results in this file. I am not sure whether the problem is that 1) the proteins are all extremely novel (which I think is nearly impossible), 2) the database I used is wrong, or 3) the command I typed is wrong. Or it may be something else I didn't notice.
I sincerely need help from you guys! The input fasta file and log are attached below. (By the way, I could not find the Bacteria.hmm file mentioned in log.txt in my database directory.)
test_prot.txt
log.txt
The 2nd question is:
I want to query my protein sequences against the Bacteria, Archaea, and Viruses databases of the hmmer module together, but only one of them seems to be allowed on the command line at a time. I want to select the best hit for each protein among bacteria, archaea, and viruses. How can I achieve this? Should I pool these three databases together? Looking forward to your answer! Thanks!
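One possible workaround, sketched here as an assumption and not as a feature of emapper, is to run the search once per database and then keep, for each query, the row with the highest score across the outputs. The sketch assumes tab-separated rows whose first four columns are query, seed_ortholog, evalue and score, as in the annotation header shown above; `best_hits` is a hypothetical helper. Scores and e-values from different databases are not necessarily comparable, so the result should be treated with caution.

```python
from typing import Dict, Iterable, List, Tuple


def best_hits(tables: Dict[str, Iterable[str]]) -> Dict[str, Tuple[str, float, List[str]]]:
    """For each query, keep the row with the highest score across all tables.

    `tables` maps a database label to tab-separated lines whose first four
    columns are query, seed_ortholog, evalue, score. Comment or header lines
    (starting with '#') and malformed lines are skipped.
    """
    best: Dict[str, Tuple[str, float, List[str]]] = {}
    for label, lines in tables.items():
        for line in lines:
            if line.startswith("#"):
                continue
            row = line.rstrip("\n").split("\t")
            if len(row) < 4:
                continue
            try:
                score = float(row[3])
            except ValueError:
                continue  # not a data row
            query = row[0]
            if query not in best or score > best[query][1]:
                best[query] = (label, score, row)
    return best


# Made-up rows, for illustration only.
bacteria = [
    "#query\tseed_ortholog\tevalue\tscore\n",
    "p1\tb_og1\t1e-20\t80.5\n",
    "p2\tb_og2\t1e-5\t40.0\n",
]
viruses = [
    "#query\tseed_ortholog\tevalue\tscore\n",
    "p2\tv_og9\t1e-30\t120.0\n",
    "p3\tv_og3\t1e-8\t55.0\n",
]
merged = best_hits({"Bacteria": bacteria, "Viruses": viruses})
```

In this example p1 is assigned to Bacteria, while p2 and p3 are assigned to Viruses. For real output files, each value in `tables` could be an open file handle.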