advice for increasing speed of --annotate_hits_table #80
Comments
@colinbrislawn you can 1) move the whole emapper directory to /dev/shm or 2) move the …
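Option 1 (moving the database onto the RAM-backed tmpfs) could be sketched roughly as follows. The data path and output names here are hypothetical placeholders; `--data_dir` is the emapper.py option for pointing at an alternative database directory:

```shell
# Copy the annotation database onto the node-local tmpfs (RAM-backed).
mkdir -p /dev/shm/eggnog_data
cp /path/to/eggnog_data/eggnog.db /dev/shm/eggnog_data/

# Tell emapper.py to read its databases from the copy instead of the default dir.
emapper.py -m no_search \
    --annotate_hits_table myquery.emapper.seed_orthologs \
    --data_dir /dev/shm/eggnog_data \
    -o myquery_annot
```

Remember that /dev/shm is cleared on reboot and is local to each node, so the copy step has to run on every worker node that will do annotation.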
Thank you for the fast response! Here is how I solved this problem: …
Performance was fantastic: …
Awesome. I didn't notice you were in a conda environment! …
That's a fantastic approach! …
Hi guys! …
Hello @saras224 👋
Are you asking for some free advice or would you like to hire an expert? Either way, I would write to the folks on the current team listed here: …
Hey @colinbrislawn, I thought this platform was where we could ask about the issues we face while using eggNOG.
Dear @saras224, sorry for the delay in answering.

In general, it is important to know that eggNOG-mapper has 2 main stages (plus some additional ones, like the gene prediction stage, which you may be using for your contigs). The first is the "search" stage, and the second is the "annotation" stage. You can even run these stages separately.

If you are using large contigs as input, my advice is to use Prodigal for the gene prediction step, or to use proteins or CDS as input if you already have predictions from Prodigal by other means.

During the search stage, DIAMOND should be fine and rather fast (at least with the default parameters). Note also that you should tune the filter thresholds to your needs. For the search step, the more CPUs you can assign to a given job the better, and the more sequences you can feed into the same DIAMOND process, the faster it will be in the end.

During the annotation step there is one option which makes everything faster, --dbmem, but you need at least 44 GB of RAM to use it (since the whole eggNOG-mapper annotation database is loaded into memory). During this stage the number of CPUs is not so important; instead, split the input data into as many separate jobs as possible, if your hardware can fit the 44 GB of RAM multiple times.

For more details, please check the wiki within the GitHub project: https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.5-to-v2.1.12

I hope this is of help, but if you provide your specific details it may be easier to help. Best, …
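Run separately, the two stages described above might look like the sketch below. The input and output names are hypothetical, and the exact thread counts should be tuned to the hardware:

```shell
# Stage 1: DIAMOND search only (CPU-bound; give it as many cores as possible).
emapper.py -m diamond --no_annot --no_file_comments \
    -i proteins.faa -o myrun --cpu 16

# Stage 2: annotation only, loading the annotation DB into memory (needs ~44 GB RAM).
emapper.py -m no_search --dbmem \
    --annotate_hits_table myrun.emapper.seed_orthologs \
    -o myrun --cpu 4
```

Stage 1 scales with CPUs, while stage 2 is mostly limited by database access, which is why --dbmem helps so much there.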
I'm trying to annotate 800k proteins on a compute cluster, and the instructions for large scale analysis have been great!

However, I'm running into serious IO bounds with the final --annotate_hits_table step, especially when using worker nodes. You mention: … How do I set up local caching? I tried copying the database to /dev/shm, but this was not recognized and I can't find an option to set a database directory.

Thank you for supporting this excellent software.
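As a concrete sketch of the "split the input into different jobs" advice from the reply above, here is a small awk snippet that breaks a FASTA file into fixed-size chunks. The file names and the tiny toy input are hypothetical stand-ins for the real 800k-protein file:

```shell
# Toy stand-in for the real protein FASTA (hypothetical sequences).
printf '>s1\nMAAA\n>s2\nMCCC\n>s3\nMGGG\n>s4\nMTTT\n>s5\nMAAC\n' > proteins.faa

# Write at most CHUNK sequences per file: chunk_000.faa, chunk_001.faa, ...
# Assumes a well-formed FASTA, i.e. every sequence line follows a ">" header.
CHUNK=2
awk -v n="$CHUNK" '
    /^>/ { if (seq % n == 0) file = sprintf("chunk_%03d.faa", int(seq / n)); seq++ }
    { print > file }
' proteins.faa
```

Each chunk can then be submitted to the cluster as its own annotation job, so several 44 GB --dbmem processes can run side by side if the nodes have the memory for it.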