srun --cpus-per-task=1 causing job to be run twice. #6482

Closed
gwolski opened this issue Oct 19, 2024 · 2 comments
gwolski commented Oct 19, 2024

Seen on ParallelCluster 3.9.1 and 3.11.0.

My examples below are from my 3.11.0 test cluster.

My cluster has multiple instance types for a partition called od-32-gb:

$ sinfo | grep "^od-32-gb"
od-32-gb up infinite 40 idle~ od-c7a-4xl-dy-od-c7a-4xl-[1-10],od-c7i-4xl-dy-od-c7i-4xl-[1-10],od-m7a-2xl-dy-od-m7a-2xl-[1-10],od-r7a-xl-dy-od-r7a-xl-[1-10]

You can see the instance types in the compute resource names. All the compute resources have multi-threading disabled in the configuration file, and I verified this with scontrol show node (all nodes show the same output for ThreadsPerCore):

State=IDLE+CLOUD+POWERED_DOWN ThreadsPerCore=1 TmpDisk=0 Weight=4105 Owner=N/A MCS_label=N/A
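
To spot-check this across the whole partition in one shot, a sinfo one-liner along these lines should work (just a sketch; %c is CPUs per node and %Z is threads per core in sinfo's format options):

$ sinfo -N -p od-32-gb -o "%N %c %Z"

Every node in the partition should report 1 thread per core if SMT is really off.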

In fact, all the ?7a* nodes have hyperthreading disabled by AWS, so I'm suspicious of the c7i instance.

When a user starts a job with srun --cpus-per-task=1, Slurm sometimes starts the job twice.
I cannot reproduce this with a simple command such as hostname or echo hello.

If I replace --cpus-per-task=1 with --ntasks=1, srun does the right thing, i.e. it starts only one job.
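
One way to see what each invocation actually launched is to inspect the allocation while the job is running (a sketch; <jobid> is a placeholder for whatever squeue reports):

$ squeue -u $USER -o "%i %j %C %D"
$ scontrol show job <jobid> | grep -E "NumNodes|NumCPUs|NumTasks|CPUs/Task"

With --cpus-per-task=1 alone I'd expect NumTasks to end up larger than 1 on a multi-core node; with --ntasks=1 it should stay at 1.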

It's not unlike what is seen here: https://groups.google.com/g/slurm-users/c/L4nCXtZLlTo
except that that thread deals with hyperthreading being enabled, which I don't have. I have also logged onto the nodes where the jobs run in duplicate, and lscpu confirms that hyperthreading is disabled.

I am using --cpus-per-task to ensure I get a machine with the right core/CPU count when I use partitions where I'm selecting instance types based on their memory configuration. I don't want to use --mem=XXX: I use --exclusive so the user gets the entire machine, and I don't want them to have to know how much memory is actually available on a 32 GB instance type.

Before I dig further in narrowing this problem down, I'm posting here in case someone has some insights or guidance on this issue.

describe-cluster output:
$ pcluster describe-cluster -n tsi4
{
  "creationTime": "2024-10-17T21:03:33.056Z",
  "headNode": {
    "launchTime": "2024-10-17T21:08:39.000Z",
    "instanceId": "i-0fdbb2d8be1e83a9d",
    "instanceType": "m7a.medium",
    "state": "running",
    "privateIpAddress": "10.6.3.120"
  },
  "version": "3.11.0",
  "clusterConfiguration": {
    "url": "https://parallelcluster-93a06c12efe5c398-v1-do-not-delete.s3.us-west-2.amazonaws.com/parallelcluster/3.11.0/clusters/tsi4-mfnuhjkosy8ub3ol/configs/cluster-config.yaml?versionId=flMbPNVjuSxFoK5Yh6Jo3y_VBSqq0Gyi&AWSAccessKeyId=AKIAUYAYZG3JPXZ2AMIC&Signature=8AHJdhyk7K2cob3jUrbTRlQSTio%3D&Expires=1729362107"
  },
  "tags": [
    {
      "value": "3.11.0",
      "key": "parallelcluster:version"
    },
    {
      "value": "tsi4",
      "key": "parallelcluster:cluster-name"
    },
    {
      "value": "true",
      "key": "parallelcluster-ui"
    }
  ],
  "cloudFormationStackStatus": "CREATE_COMPLETE",
  "clusterName": "tsi4",
  "computeFleetStatus": "RUNNING",
  "cloudformationStackArn": "arn:aws:cloudformation:us-west-2:326469498578:stack/tsi4/44996cd0-8ccb-11ef-ac1e-069a9ba2d89d",
  "lastUpdatedTime": "2024-10-17T21:03:33.056Z",
  "region": "us-west-2",
  "clusterStatus": "CREATE_COMPLETE",
  "scheduler": {
    "type": "slurm"
  }
}
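
For completeness, the multi-threading setting referenced above lives in the cluster config, which can be pulled via the presigned clusterConfiguration url in the output above and grepped (a sketch; the URL placeholder is whatever describe-cluster returned, and DisableSimultaneousMultithreading is the ComputeResources setting I set to true):

$ curl -s -o cluster-config.yaml "<clusterConfiguration.url from above>"
$ grep -n -B 2 "DisableSimultaneousMultithreading" cluster-config.yaml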

gwolski added the 3.x label Oct 19, 2024
gwolski changed the title from "--cpus-per-task=1 causing job to be run twice." to "srun --cpus-per-task=1 causing job to be run twice." Oct 19, 2024
gwolski commented Oct 21, 2024

I've done more studying on this matter, which gets me closer to a resolution. Most things are now making sense, except for the simple hostname command I talk about below.

I've been able to reproduce my problem of multiple jobs starting on a multi-core machine when I specify --cpus-per-task=1, using a simple invocation of xterm.
To reproduce "the problem" that my user sees with their complex makefile, I run:

srun --time=11:00:00 --job-name=srun_test --cpus-per-task=1 --mem=0 --partition=sp-32-gb --exclusive --pty --x11 xterm

The partition sp-32-gb can choose from three compute resources (CRs):

sp-32-gb-dy-sp-32gb-4-cores-[2-10],sp-32-gb-dy-sp-32gb-8-cores-[1-10],sp-32-gb-dy-sp-32gb-16-cores-[1-10]

That is, 4, 8, or 16 cores are available.
When I run the above srun command, I get a 4-core machine and 4 xterms pop up.

If I change the command in the above srun from 'xterm' to 'hostname', I would expect four instances of the hostname to be printed. Instead, only one comes out:

$ srun --job-name=srun_test --cpus-per-task=1 --mem=0 --partition=sp-32-gb --exclusive --pty hostname
sp-32-gb-dy-sp-32gb-4-cores-1
$

I don't understand why hostname is not printed four times. ChatGPT tells me that since hostname is a simple command, srun is being smart and squashing the additional output. Odd.

I went back to my user's command and I see it actually did start up four identical jobs, one on each CPU. So that is consistent. Great.

That said, I had assumed that --ntasks=1 is a default value. The Slurm srun documentation proves me wrong:
"If -c is specified without -n, as many tasks will be allocated per node as possible while satisfying the -c restriction."
This explains why I get 4 xterms. It doesn't explain why I don't get 4x hostname, nor why my user's make command only runs twice. I have to keep digging.
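
One possible explanation for the single hostname line (my assumption; worth checking against the srun man page for the installed Slurm version): --pty attaches only task zero to the pseudo-terminal and sends the other tasks' output to /dev/null, so the extra tasks still run but their output is discarded. Dropping --pty should make the difference visible:

$ srun --cpus-per-task=1 --mem=0 --partition=sp-32-gb --exclusive hostname
$ srun --cpus-per-task=1 --mem=0 --partition=sp-32-gb --exclusive --pty hostname

The first form should print one hostname per task; the second should print only task zero's.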

My user is now happy running his makefile with:
srun --job-name=srun_test --cpus-per-task=1 --ntasks=1 --mem=0 --partition=sp-32-gb --exclusive --pty make
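
To confirm that only one task was launched, the accounting record can be checked after the job finishes (a sketch; <jobid> is a placeholder):

$ sacct -j <jobid> --format=JobID,JobName,NTasks,AllocCPUS,State

The step's NTasks should show 1 even though AllocCPUS covers the whole node because of --exclusive.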

gwolski commented Oct 21, 2024

I'm comfortable with the srun behavior now and understand that you really need to use --ntasks=1 if you only want the job to run once and you are not using all the CPUs on a node.

gwolski closed this as completed Oct 21, 2024