Multithread / multicore #4448

Closed
AxVE opened this issue Feb 23, 2018 · 5 comments

@AxVE
Contributor

AxVE commented Feb 23, 2018

Is there a way to specify the number of threads or cores to use, specifically for HaplotypeCaller?
In GATK3 we had -nt and -nct (more or less reliable, from what I've read), but they don't seem to exist anymore.

@magicDGS
Contributor

In the new GATK framework, multithreading support is provided through Spark (see #2345 for more details).

@AxVE
Contributor Author

AxVE commented Feb 23, 2018

OK, thanks for the info ^^

@AxVE
Contributor Author

AxVE commented Feb 23, 2018

So it seems we can't enable multithreading for non-Spark tools?

@lbergelson
Member

Yes, that's true in general. The -nt / -nct options were never very good in GATK3. They usually scaled very poorly with the number of cores and were the source of a lot of complexity and bugs. We decided not to roll our own map-reduce framework for GATK4 but to use an existing, much better one, i.e. Spark.

If you want parallelism in tools that aren't available in Spark yet, we recommend multiprocess parallelism with an external job runner such as Cromwell. This is more complicated to set up and run, but it makes much more efficient use of compute resources.
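The multiprocess approach is usually a scatter/gather: split the genome into intervals, run one independent HaplotypeCaller process per interval, then merge the per-interval VCFs. Below is a minimal local sketch in Python, assuming non-overlapping intervals and `gatk` on the PATH. It is not Cromwell's API; a real Cromwell setup would express the same steps in a WDL workflow and let the job runner place the shards on cluster nodes. The helper names (`shard_output`, `build_scatter_commands`, `build_gather_command`, `run_local`) and the output naming scheme are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def shard_output(out_prefix, interval):
    # Hypothetical naming scheme: "chr2:1-1000000" -> "<prefix>.chr2_1-1000000.vcf.gz"
    return f"{out_prefix}.{interval.replace(':', '_')}.vcf.gz"


def build_scatter_commands(reference, bam, intervals, out_prefix):
    """Scatter: one independent HaplotypeCaller invocation per genomic interval (-L)."""
    return [
        ["gatk", "HaplotypeCaller",
         "-R", reference, "-I", bam,
         "-L", interval,
         "-O", shard_output(out_prefix, interval)]
        for interval in intervals
    ]


def build_gather_command(intervals, out_prefix, merged_output):
    """Gather: merge the per-interval VCFs into one file with MergeVcfs."""
    cmd = ["gatk", "MergeVcfs"]
    for interval in intervals:
        cmd += ["-I", shard_output(out_prefix, interval)]
    cmd += ["-O", merged_output]
    return cmd


def run_local(commands, max_parallel):
    """Run each shard as a separate OS process, at most max_parallel at once.

    The pool threads only wait on child processes; the actual work happens
    in each separate gatk JVM. check=True raises if any shard fails.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))


if __name__ == "__main__":
    intervals = ["chr1", "chr2", "chr3"]
    # Dry run: print the commands instead of launching gatk.
    for cmd in build_scatter_commands("ref.fasta", "sample.bam", intervals, "sample"):
        print(" ".join(cmd))
    print(" ".join(build_gather_command(intervals, "sample", "sample.vcf.gz")))
```

Because every shard is a separate process, each one can be scheduled on a different machine (SGE, Cromwell backends, etc.), which is where this approach gets its efficiency over in-process threading.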

A few limited multithreaded options remain in GATK4 outside of Spark. One of them is the option to use multiple threads in HaplotypeCaller's PairHMM. This is only available on Linux systems and defaults to 4 threads.
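In GATK4 the PairHMM thread count is exposed on HaplotypeCaller as `--native-pair-hmm-threads`; confirm with `gatk HaplotypeCaller --help` for your version. The sketch below only builds the command line; `haplotype_caller_command` is a hypothetical helper, and the file names are placeholders. Only the PairHMM computation is threaded, so this does not make the whole tool multithreaded.

```python
import shlex


def haplotype_caller_command(reference, bam, output, pair_hmm_threads=4):
    """Build a HaplotypeCaller command with an explicit PairHMM thread count.

    pair_hmm_threads maps to --native-pair-hmm-threads; the default of 4
    mirrors the default described above.
    """
    if pair_hmm_threads < 1:
        raise ValueError("pair_hmm_threads must be >= 1")
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-O", output,
        "--native-pair-hmm-threads", str(pair_hmm_threads),
    ]


if __name__ == "__main__":
    cmd = haplotype_caller_command("ref.fasta", "sample.bam", "sample.vcf.gz", pair_hmm_threads=8)
    print(shlex.join(cmd))
    # prints: gatk HaplotypeCaller -R ref.fasta -I sample.bam -O sample.vcf.gz --native-pair-hmm-threads 8
```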

@AxVE
Contributor Author

AxVE commented Feb 23, 2018

Thanks for the information.
For context, I asked because I'm currently benchmarking an SGE cluster against Spark.

AxVE closed this as completed on Feb 23, 2018.