"cores" parameter in resource stanza not behaving as expected #14676
Comments
Hi @bjornbyte 👋 Nomad detected 16 cores in the machine, but none of them are set as reservable: "NodeResources": {
"Cpu": {
"CpuShares": 36848,
"TotalCpuCores": 16,
"ReservableCpuCores": null
}, This may be due to the issue described in microsoft/WSL#4189, where WSL doesn't mount cgroups by default, which is what Nomad uses to detect reservable cores. Could you try the fix mentioned in this comment and restart the Nomad agent to see if it's able to detect the reservable cores? |
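To make the mismatch in the quoted node output easier to spot, here is a minimal sketch that parses that fragment and compares the total core count with the reservable one. The outer braces and the `reservable_core_count` helper are added for illustration, and the sketch assumes that `ReservableCpuCores` is a list of core IDs when it is populated.

```python
import json

# Fragment copied from the node output quoted above, wrapped in braces so it parses.
NODE_JSON = """
{
  "NodeResources": {
    "Cpu": {
      "CpuShares": 36848,
      "TotalCpuCores": 16,
      "ReservableCpuCores": null
    }
  }
}
"""


def reservable_core_count(node: dict) -> int:
    """Return how many cores the node reports as reservable (0 when the field is null)."""
    cpu = node.get("NodeResources", {}).get("Cpu", {})
    # Assumption: when populated, the field is a list of core IDs.
    cores = cpu.get("ReservableCpuCores") or []
    return len(cores)


node = json.loads(NODE_JSON)
total = node["NodeResources"]["Cpu"]["TotalCpuCores"]
print(f"total cores: {total}, reservable cores: {reservable_core_count(node)}")
```

With the output above, the node has 16 cores in total but none that can be reserved, which is consistent with the scheduler reporting the cores dimension as exhausted.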
That appears not to have helped, unfortunately. |
Hi @bjornbyte, without cgroups (and specifically the cpuset controller) you won't be able to make use of If cgroups are enabled you should get output like
And if the cpuset controller is enabled you should see the reservable cores via
|
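The exact commands from the comment above did not survive in this copy of the thread. As a rough stand-in, the sketch below checks whether the kernel reports the cpuset controller as enabled by parsing `/proc/cgroups`. The `cpuset_enabled` helper is hypothetical, and this is not necessarily the check the maintainer had in mind; a controller can also be enabled in the kernel without being mounted.

```python
from pathlib import Path


def cpuset_enabled(proc_cgroups_text: str) -> bool:
    """Return True if the cpuset row in /proc/cgroups has its 'enabled' column set to 1.

    Expected columns: subsys_name, hierarchy, num_cgroups, enabled.
    """
    for line in proc_cgroups_text.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip the header and blank lines
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "cpuset":
            return fields[3] == "1"
    return False  # no cpuset row at all


if __name__ == "__main__":
    path = Path("/proc/cgroups")
    if path.exists():
        print("cpuset enabled:", cpuset_enabled(path.read_text()))
    else:
        print("/proc/cgroups not found; this does not look like a Linux host")
```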
* fingerprint: add node attr for reserverable cores
  Add an attribute for the number of reservable CPU cores as they may differ from the existing `cpu.numcores` due to client configuration or OS support. Hopefully clarifies some confusion in #14676
* add changelog
* num_reservable_cores -> reservablecores
Thanks. Could it be that WSL (or the Ubuntu version I'm running there) is not putting these where Nomad expects them?
and
I took a quick look at the code you linked to around detecting reservable cores. There seems to be a conditional branch around using v1 or v2, and some of the comments there seem to indicate that I've got a "hybrid system" with cgroup2 mounted under /sys/fs/unified, so it should be trying to use v1? I'm not sure what that implies about how Nomad will try to determine reservable cores. This Stack Overflow question suggests I should be able to have WSL not use v1, but when I tried that .wslconfig, WSL would not start. Even so, it seems like the unified/cgroups2 does not have a
|
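For anyone trying to work out which cgroup layout their own WSL distribution ended up with, the following sketch classifies the mounts listed in `/proc/mounts` by filesystem type (`cgroup` for v1, `cgroup2` for v2). It is only an illustration of the v1 / v2 / hybrid distinction discussed above, not Nomad's detection logic, and the helper names `cgroup_mode` and `cpuset_v1_mountpoint` are made up for this example.

```python
from pathlib import Path
from typing import Optional


def cgroup_mode(mounts_text: str) -> str:
    """Classify cgroup mounts as 'v1', 'v2', 'hybrid' (both present), or 'none'."""
    has_v1 = has_v2 = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # /proc/mounts columns: device, mount point, filesystem type, options, ...
        if fields[2] == "cgroup":
            has_v1 = True
        elif fields[2] == "cgroup2":
            has_v2 = True
    if has_v1 and has_v2:
        return "hybrid"
    if has_v2:
        return "v2"
    if has_v1:
        return "v1"
    return "none"


def cpuset_v1_mountpoint(mounts_text: str) -> Optional[str]:
    """Return the mount point of the v1 cpuset hierarchy, if one is mounted."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "cgroup":
            # v1 hierarchies list their controllers among the mount options.
            if "cpuset" in fields[3].split(","):
                return fields[1]
    return None


if __name__ == "__main__":
    mounts = Path("/proc/mounts")
    if mounts.exists():
        text = mounts.read_text()
        print("cgroup mode:", cgroup_mode(text))
        print("v1 cpuset mount:", cpuset_v1_mountpoint(text))
```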
Oh interesting, so it's set up with cgroups v1, but we should be able to support that. I'm not sure why you're getting an empty cpuset. When you first start up the Nomad Client agent, do you see any of the WARN log messages from |
I don't see any of those, but I do see this one, which looks suspiciously like it might be important:
|
so, yea, if I run nomad with sudo then:
|
after that, since the /sys/fs/cgroup/cpuset/nomad directory exists, if I start it without sudo I get a warning |
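The behaviour described here comes down to whether the user running the agent can create or write to the `nomad` directory under the cpuset hierarchy. A minimal sketch of that check follows, assuming the v1 path `/sys/fs/cgroup/cpuset` mentioned above. The `cgroup_dir_status` helper is hypothetical, and it only looks at filesystem permissions, which may not cover everything Nomad needs.

```python
import os


def cgroup_dir_status(parent: str, name: str = "nomad") -> str:
    """Report whether the current user could create or write to parent/name."""
    target = os.path.join(parent, name)
    if os.path.isdir(target):
        # The directory already exists (for example, created by an earlier run as root).
        return "writable" if os.access(target, os.W_OK) else "exists-not-writable"
    if os.path.isdir(parent) and os.access(parent, os.W_OK):
        return "can-create"
    return "cannot-create"


if __name__ == "__main__":
    # Assumption: cgroups v1 with the cpuset hierarchy at this path, as in the thread.
    print(cgroup_dir_status("/sys/fs/cgroup/cpuset"))
```

Run as a normal user and then with sudo, the result should show the same difference the thread describes: root can create the directory, while an unprivileged user typically cannot.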
Ah, that'll do it! Just FYI, running Nomad as a non-root user isn't supported yet, but it's something we're exploring in #13669 |
Roger that, but it is a handy mechanism to create the right filesystem nodes so that it works when run as normal. What I was really after was getting cores working in my local Docker Compose Nomad cluster with multiple client agents running in containers, but what I learned here should help with that. I will open a new issue if I encounter something else that is a Nomad problem and not a problem with my particular setup. Thanks! |
Oops, I misread. It's not supported as non-root, although that usually seems to work just fine in dev mode. |
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. |
Nomad version
Output from `nomad version`
Nomad v1.3.5 (1359c25)
Operating system and Environment details
Windows 11 WSL
Issue
When attempting to schedule a job with
in the `task` stanza, plan is returning `Dimension cores exhausted on 1 node` even though there are no other jobs running and the client attributes in the UI show `cpu.numcores | 16`
Reproduction steps
In a Windows WSL Ubuntu prompt, run
nomad agent -dev
Try to submit the simple job below
Expected Result
it should get scheduled
Actual Result
Job file (if appropriate)
Client Node details
I do notice that in the output below, although `Attributes.cpu.numcores` is 16, `Resources.Cores` is zero, so I expect that's the problem here, but I'm not sure why that would be.