--max_memory and --max_cpus in latest version of sarek #1765

Closed
Ons14 opened this issue Dec 31, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

Ons14 commented Dec 31, 2024

Description of feature

Hello,
I am working with 30X human WGS data and have a large cohort. I tried to adjust the memory and CPU usage in the latest version of Sarek, but I couldn't: when I used --max_memory and --max_cpus, I got a warning. Could you please tell me if there's another way to increase the resources used?

Ons14 added the enhancement label on Dec 31, 2024
Xhepone commented Jan 2, 2025

I have a similar issue: it says that I require 24 threads, but I only have 12. I have gone through multiple config files etc., but I couldn't resolve the issue.

asp8200 (Contributor) commented Jan 2, 2025

Please have a look at this post. Cheers

asp8200 closed this as completed on Jan 2, 2025
Ons14 (Author) commented Jan 2, 2025

I am trying to allocate more CPUs and memory to the nf-core/sarek pipeline because I noticed that it currently uses at most 500 GB of memory, leaving resources unused. My goal is to increase the resource allocation so that the pipeline runs faster.

I added this to the nextflow.config file that I found in this directory (/home/Ons14/.nextflow/assets/nf-core/sarek):

    process {
        resourceLimits = [
            memory: '800.GB',
            cpus: 80,
            time: '60.h'
        ]
    }

I then ran the pipeline again, but I didn't see any change in htop; memory usage doesn't go above 500 GB.

I'm new to the nf-core/sarek pipeline and would greatly appreciate your help. Thank you!

asp8200 (Contributor) commented Jan 2, 2025

(Quoting Ons14's comment above.)

I think you guys should join the sarek Slack channel to get answers for these kinds of questions.

As far as I understand, process { resourceLimits = [ memory: '800.GB', cpus: 80, time: '60.h' ] } only sets max values. Individual tasks/processes may (and often will) be allocated fewer resources. I believe the resource allocations are defined here. You could make your own custom config based on base.config and include it with, say, -c custom.config in your nextflow command.
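
For example, a minimal custom.config might look like the sketch below. The label and process names are assumptions following common nf-core conventions (base.config typically defines withLabel selectors such as process_high); check sarek's own base.config for the actual selectors before relying on this.

    // custom.config -- a sketch, not sarek's shipped settings
    process {
        // Raise the defaults for the most demanding task class
        // ('process_high' is the usual nf-core label; verify in base.config).
        withLabel:process_high {
            cpus   = 32
            memory = 200.GB
            time   = 48.h
        }

        // Or target a single process by name (pattern is illustrative).
        withName: '.*GATK4_APPLYBQSR' {
            memory = 64.GB
        }
    }

Including it with nextflow run nf-core/sarek -c custom.config ... lets these settings override the pipeline's base.config defaults.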

FriederikeHanssen (Contributor) commented

To add, resourceLimits should be used to define the maximum any given job could ever request on your cluster, for example, the memory/CPU of the largest node you have available (minus some overhead). Resource limits only apply to an individual task that is submitted and not to the whole pipeline.

Nextflow will cap resource requests at these max values. This is important to avoid a job requesting more resources than are available and therefore never being submitted. It will not increase the resource request of an individual job.
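
To sketch the capping behaviour (the 800.GB limit comes from the config posted above; the process name is hypothetical):

    process {
        resourceLimits = [ memory: 800.GB, cpus: 80, time: 60.h ]

        // Hypothetical task declaring more than the limit:
        withName: 'BIG_TASK' {
            memory = 1000.GB   // capped: Nextflow submits this task with 800.GB
        }
        // A task declaring, say, 64.GB is left unchanged --
        // resourceLimits never raises a request.
    }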

Most jobs do not need 500 GB of memory, and therefore we typically set less in base.config. If you wish to increase resources for individual tasks, you will need to create a custom config where you define this for the corresponding processes.

On that note: we have set the sarek default requests using 80X WGS samples, so they should work for the majority of use cases. If you are seeing issues related to job packing and would like to optimize this, you could try sizing the medium/high tasks to half a node and a full node, respectively, so that several jobs can be submitted at once. Nextflow will try its best to submit jobs for as long as it finds suitable space and you are not exceeding any usage limits set on your cluster.

To illustrate: let's say all your nodes have 16 CPUs and 32 GB of memory. If you set the memory request for all your jobs to 24 GB, only one job can ever run per node: 32 - 24 = 8 GB remain, and no job fits into that. In this case it could be more efficient to request 15 GB per process so that two jobs always run in parallel.
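
In config terms, that packing choice might look like the following sketch (node size taken from the example above; 'process_medium' is the usual nf-core label and an assumption here):

    process {
        // Never let a single task request more than one node provides.
        resourceLimits = [ cpus: 16, memory: 32.GB ]

        // Size medium tasks at just under half a node so two fit side by side.
        withLabel:process_medium {
            cpus   = 8
            memory = 15.GB
        }
    }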

Ons14 (Author) commented Jan 7, 2025

So the Sarek default base.config should work reasonably well with 30X WGS samples, even if I have more CPU and memory to allocate? I am trying to maximise the usage of the server and parallelize more jobs, since I have a large cohort.

FriederikeHanssen (Contributor) commented

To parallelize more jobs, you want to request just the right amount of resources for each task and then work on putting as many jobs as possible onto the cluster. Increasing the resources for a task will give that task more CPU/memory, but fewer jobs will be able to run in parallel. Configuration options like queueSize (https://www.nextflow.io/docs/latest/reference/config.html) should allow you to increase the number of jobs you can submit in parallel; see the sketch below.
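
A minimal sketch (the value 200 is arbitrary; a sensible ceiling depends on your executor and any submission limits your cluster enforces):

    // queueSize belongs to the executor scope of the Nextflow config.
    executor {
        // Maximum number of tasks Nextflow keeps queued or running at once.
        queueSize = 200
    }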
