srun --cpus-per-task=1 causing job to be run twice. #6482

Closed
gwolski opened this issue Oct 19, 2024 · 2 comments
gwolski commented Oct 19, 2024

Seen on ParallelCluster 3.9.1 and 3.11.0.

My examples below are from my 3.11.0 test cluster.

My cluster has multiple instance types for a partition called od-32-gb:

$ sinfo | grep "^od-32-gb"
od-32-gb up infinite 40 idle~ od-c7a-4xl-dy-od-c7a-4xl-[1-10],od-c7i-4xl-dy-od-c7i-4xl-[1-10],od-m7a-2xl-dy-od-m7a-2xl-[1-10],od-r7a-xl-dy-od-r7a-xl-[1-10]

You can see the instance types in the compute resource names. All the compute resources have multi-threading disabled in the configuration file, and I verified this with scontrol show node (all nodes show the same output for ThreadsPerCore):

State=IDLE+CLOUD+POWERED_DOWN ThreadsPerCore=1 TmpDisk=0 Weight=4105 Owner=N/A MCS_label=N/A
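
To spot-check this across the whole partition in one shot, a sinfo one-liner along these lines should work (just a sketch; %c is CPUs per node and %Z is threads per core in sinfo's format options):

$ sinfo -N -p od-32-gb -o "%N %c %Z"

Every node in the partition should report 1 thread per core if SMT is really off.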

In fact, all the ?7a* nodes have hyperthreading disabled by AWS, so I'm suspicious of the c7i instance.

When a user starts a job with srun --cpus-per-task=1, Slurm sometimes starts the job twice.
I cannot reproduce this with a simple command such as hostname or echo hello.

If I replace --cpus-per-task=1 with --ntasks=1, srun does the right thing, i.e. it starts only one job.
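
One way to see what each invocation actually launched is to inspect the allocation while the job is running (a sketch; <jobid> is a placeholder for whatever squeue reports):

$ squeue -u $USER -o "%i %j %C %D"
$ scontrol show job <jobid> | grep -E "NumNodes|NumCPUs|NumTasks|CPUs/Task"

With --cpus-per-task=1 alone I'd expect NumTasks to end up larger than 1 on a multi-core node; with --ntasks=1 it should stay at 1.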

It's not unlike what is seen here: https://groups.google.com/g/slurm-users/c/L4nCXtZLlTo
except that that thread deals with hyperthreading being enabled, which I don't have. I have also logged onto the nodes where the jobs run in duplicate, and lscpu confirms that hyperthreading is disabled.

I am using --cpus-per-task to ensure I get a machine with the right core/CPU count when I use partitions where I'm selecting instance types based on their memory configuration. I don't want to use --mem=XXX: I use --exclusive so the user gets the entire machine, and I don't want them to have to know how much memory is actually available on a 32 GB instance type.

Before I dig further in narrowing this problem down, I'm posting here in case someone has some insights or guidance on this issue.

describe-cluster output:
$ pcluster describe-cluster -n tsi4
{
  "creationTime": "2024-10-17T21:03:33.056Z",
  "headNode": {
    "launchTime": "2024-10-17T21:08:39.000Z",
    "instanceId": "i-0fdbb2d8be1e83a9d",
    "instanceType": "m7a.medium",
    "state": "running",
    "privateIpAddress": "10.6.3.120"
  },
  "version": "3.11.0",
  "clusterConfiguration": {
    "url": "https://parallelcluster-93a06c12efe5c398-v1-do-not-delete.s3.us-west-2.amazonaws.com/parallelcluster/3.11.0/clusters/tsi4-mfnuhjkosy8ub3ol/configs/cluster-config.yaml?versionId=flMbPNVjuSxFoK5Yh6Jo3y_VBSqq0Gyi&AWSAccessKeyId=AKIAUYAYZG3JPXZ2AMIC&Signature=8AHJdhyk7K2cob3jUrbTRlQSTio%3D&Expires=1729362107"
  },
  "tags": [
    {
      "value": "3.11.0",
      "key": "parallelcluster:version"
    },
    {
      "value": "tsi4",
      "key": "parallelcluster:cluster-name"
    },
    {
      "value": "true",
      "key": "parallelcluster-ui"
    }
  ],
  "cloudFormationStackStatus": "CREATE_COMPLETE",
  "clusterName": "tsi4",
  "computeFleetStatus": "RUNNING",
  "cloudformationStackArn": "arn:aws:cloudformation:us-west-2:326469498578:stack/tsi4/44996cd0-8ccb-11ef-ac1e-069a9ba2d89d",
  "lastUpdatedTime": "2024-10-17T21:03:33.056Z",
  "region": "us-west-2",
  "clusterStatus": "CREATE_COMPLETE",
  "scheduler": {
    "type": "slurm"
  }
}
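
For completeness, the multi-threading setting referenced above lives in the cluster config, which can be pulled via the presigned clusterConfiguration url in the output above and grepped (a sketch; the URL placeholder is whatever describe-cluster returned, and DisableSimultaneousMultithreading is the ComputeResources setting I set to true):

$ curl -s -o cluster-config.yaml "<clusterConfiguration.url from above>"
$ grep -n -B 2 "DisableSimultaneousMultithreading" cluster-config.yaml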

gwolski added the 3.x label Oct 19, 2024
gwolski changed the title from "--cpus-per-task=1 causing job to be run twice." to "srun --cpus-per-task=1 causing job to be run twice." Oct 19, 2024
gwolski commented Oct 21, 2024

I've done more studying on this matter, which gets me closer to a resolution. Most things are now making sense, except for the simple hostname command I talk about below.

I've been able to reproduce my problem of multiple jobs starting on a multi-core machine when I specify --cpus-per-task=1, using a simple invocation of xterm.
To reproduce "the problem" that my user sees with their complex makefile, I run:

srun --time=11:00:00 --job-name=srun_test --cpus-per-task=1 --mem=0 --partition=sp-32-gb --exclusive --pty --x11 xterm

The partition sp-32-gb can choose from three compute resources (CRs):

sp-32-gb-dy-sp-32gb-4-cores-[2-10],sp-32-gb-dy-sp-32gb-8-cores-[1-10],sp-32-gb-dy-sp-32gb-16-cores-[1-10]

That is, 4, 8, or 16 cores are available.
When I run the above srun command, I get a 4-core machine and 4 xterms pop up.

If I change the command in the above srun from 'xterm' to 'hostname', I would expect four instances of the hostname to be printed. Instead, only one comes out:

$ srun --job-name=srun_test --cpus-per-task=1 --mem=0 --partition=sp-32-gb --exclusive --pty hostname
sp-32-gb-dy-sp-32gb-4-cores-1
$

I don't understand why hostname is not printed four times. ChatGPT tells me that since hostname is a simple command, srun is being smart and squashing the additional output. Odd.

I went back to my user's command and I see it actually did start up four identical jobs, one on each CPU. So that is consistent. Great.

That said, I had assumed that --ntasks=1 is a default value. The Slurm srun documentation proves me wrong:
"If -c is specified without -n, as many tasks will be allocated per node as possible while satisfying the -c restriction."
This explains why I get 4 xterms. It doesn't explain why I don't get 4x hostname, nor why my user's make command only runs twice. I have to keep digging.
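
One possible explanation for the single hostname line (my assumption; worth checking against the srun man page for the installed Slurm version): --pty attaches only task zero to the pseudo-terminal and sends the other tasks' output to /dev/null, so the extra tasks still run but their output is discarded. Dropping --pty should make the difference visible:

$ srun --cpus-per-task=1 --mem=0 --partition=sp-32-gb --exclusive hostname
$ srun --cpus-per-task=1 --mem=0 --partition=sp-32-gb --exclusive --pty hostname

The first form should print one hostname per task; the second should print only task zero's.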

My user is now happy running his makefile with:
srun --job-name=srun_test --cpus-per-task=1 --ntasks=1 --mem=0 --partition=sp-32-gb --exclusive --pty make
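
To confirm that only one task was launched, the accounting record can be checked after the job finishes (a sketch; <jobid> is a placeholder):

$ sacct -j <jobid> --format=JobID,JobName,NTasks,AllocCPUS,State

The step's NTasks should show 1 even though AllocCPUS covers the whole node because of --exclusive.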

gwolski commented Oct 21, 2024

I'm comfortable with the srun behavior now and understand that you really need to use --ntasks=1 if you only want the job to run once and you are not using all the CPUs on a node.

gwolski closed this as completed Oct 21, 2024