add a flag for cpu_hard_limit #3825
Conversation
client/driver/docker.go
Outdated
@@ -120,6 +120,11 @@ const (
// https://docs.docker.com/engine/reference/run/#block-io-bandwidth-blkio-constraint
dockerBasicCaps = "CHOWN,DAC_OVERRIDE,FSETID,FOWNER,MKNOD,NET_RAW,SETGID," +
"SETUID,SETFCAP,SETPCAP,NET_BIND_SERVICE,SYS_CHROOT,KILL,AUDIT_WRITE"

// This is cpu.cfs_period_us: the length of a period. The default values is 100 microsecnds represented in nano Seconds, below is the documnentation
Wrap the line to 80 chars
Once I made the changes to the code, I built the nomad binary and started the agent in dev mode. After that I started a Nomad Docker job running stress (docker run --rm -it progrium/stress --cpu 4). In my job, I passed 1500 CPU (out of 24800 MHz), which equates to about 6.04 percent of the CPU. With the flag cpu_hard_limit = true set, the CPU usage did not go over ~6%. Attached are 3 screenshots:
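For context, a rough sketch of the arithmetic behind that test (the 1500 MHz and 24800 MHz figures are taken from the comment above; the 100000us period is the Docker/kernel default discussed later in this thread, so treat the numbers as illustrative rather than taken from the final code):

package main

import "fmt"

func main() {
	// Values from the test above: the task asks for 1500 MHz out of
	// 24800 MHz available on the node.
	taskCPU := 1500.0
	totalCPU := 24800.0

	// With cpu_hard_limit, the CFS quota is the task's share of one CFS period.
	const cfsPeriodUS = 100000.0 // 100ms, the kernel/Docker default
	share := taskCPU / totalCPU
	quotaUS := share * cfsPeriodUS

	fmt.Printf("share: %.3f%% -> cfs_quota_us: %.0f of %.0f\n", share*100, quotaUS, cfsPeriodUS)
	// Output: share: 6.048% -> cfs_quota_us: 6048 of 100000
}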
I'm sorry for not responding to this PR sooner. We should support this in a cross-driver way, which means adding it to the resources stanza. I'll leave more details on the issue. Sorry again for not letting you know this requirement earlier!
@schmichael: Sure, I can add that in the resources stanza. I will get started on that, but if you have other specific instructions then let me know whenever you can.
@jaininshah9 That's quite a bit more work, so on second thought let's get this work in and we can deprecate it once there's a cross-driver solution. Sorry for the confusion!
client/driver/docker.go
Outdated
// Below is the documnentation:
// https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt
// https://docs.docker.com/engine/admin/resource_constraints/#cpu
defaultCFSPeriod = 100000
So for a task allocated 10% of the CPU it could pause for 90ms at a time? That seems awfully long; perhaps 10000 (10ms) or 1000 (1ms) increments would be a better level of granularity to ensure responsiveness for low-priority interactive services?
Yes, if a task is allocated 10% of CPU and we keep the default 100 microseconds cfs_period then it will be paused for 90 microseconds. We can change that to 10 ms; for that we will have to pass CPUPeriod as well (which would always be 10 ms).
Side note: Sorry, I meant "ms" as in milliseconds. I know that kernel doc uses "ms" to mean microseconds, so I'm sorry for confusing the matter. Within Nomad code/comments/docs let's always use "ms" for millis and "us" or "μs" for micros.
I'd be in favor of lowering this to 10000μs (10ms) to minimize pause duration unless somebody has a strong preference.
Since it's only used for an internal calculation tweaking it in the future should be ok (and it should probably even be configurable once we make this cross-driver).
@schmichael The default for the CFS period in Docker is 100000; changing this would mean people's experience of using the quota flag would differ between the normal Docker CLI and Nomad. If you want a lower value, maybe introduce another flag to tweak it but keep the default as is?
@schmichael I am testing the changes that you recommended and running into an issue. When I specify cpu.cfs_period_us=10000 (10 ms) and want to use 6% of CPU, I have to specify cpu.cfs_quota_us=600 (0.6 ms). When I do that, it gives me an error: CPU cfs quota cannot be less than 1ms (i.e. 1000).
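A small sketch of why that error appears, assuming Docker's documented 1000us minimum quota and reusing the 6% share from the earlier example: with a 10000us period the quota works out to 600us, which falls below the floor, while the default 100000us period does not.

package main

import "fmt"

func main() {
	const minQuotaUS = 1000.0 // Docker rejects cfs_quota_us below 1ms
	share := 0.06             // the 6% CPU share from the example above

	for _, periodUS := range []float64{100000, 10000} {
		quotaUS := share * periodUS
		fmt.Printf("period %6.0fus -> quota %4.0fus (accepted: %v)\n",
			periodUS, quotaUS, quotaUS >= minQuotaUS)
	}
	// period 100000us -> quota 6000us (accepted: true)
	// period  10000us -> quota  600us (accepted: false)
}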
@diptanu has a good point. Matching Docker's setting is the right thing to do. Sorry for the noise!
Although now I'm confused by the error @jaininshah9 mentioned. According to Docker's API docs this setting should be in microseconds (like the kernel itself expects), not nanoseconds as the code comment implies.
The Docker docs lead me to believe this should be the value?
defaultCFSPeriod = 100
@schmichael You may be right; that got me confused as well. I am digging into the error and will let you know.
Okay, I think initially I made a mistake mentioning that it's in nanoseconds. The reason I mentioned nanoseconds is because the Docker docs say (see screenshot below) the default value is 100 microseconds, which is wrong; it should say 100 milliseconds (which is what the kernel doc says).
I also tested, and the Docker API does not do any translation. Whatever value we pass in CPUPeriod is what we find under /sys/fs/cgroups...
So the value in the code is correct; I need to update the comment on it though.
Sorry for the confusion I created. I will add a change with a clear comment on what defaultCFSPeriod refers to.
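A rough sketch of how such a comment might read (the name defaultCFSPeriodUS anticipates the rename suggested further down in this review; this is not the final committed wording):

// defaultCFSPeriodUS is the cpu.cfs_period_us value passed to Docker's
// CPUPeriod field. It is expressed in microseconds; 100000us (100ms) matches
// both the kernel default and Docker's default, and Docker hands the value
// to the cgroup unchanged.
// https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt
// https://docs.docker.com/engine/admin/resource_constraints/#cpu
const defaultCFSPeriodUS = 100000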
client/driver/docker.go
Outdated
@@ -1119,6 +1130,12 @@ func (d *DockerDriver) createContainerConfig(ctx *ExecContext, task *structs.Task
VolumeDriver: driverConfig.VolumeDriver,
}

// Calculate CPU Quota
if driverConfig.CPUHardLimit {
percentTicks := float64(task.Resources.CPU) / shelpers.TotalTicksAvailable()
You should use d.node.Resources.CPU instead as we don't always properly detect the available CPU, so some users have to override it.
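A minimal sketch of the suggested calculation as a standalone helper; the parameter names stand in for task.Resources.CPU and d.node.Resources.CPU from the diff above, so this is an illustration of the approach rather than the final diff:

package main

import "fmt"

// cpuQuota converts a task's MHz allocation into a CFS quota for the given
// period, using the node's total CPU in MHz (fingerprinted or operator-overridden).
func cpuQuota(taskCPUMHz, nodeCPUMHz int, periodUS int64) int64 {
	percentTicks := float64(taskCPUMHz) / float64(nodeCPUMHz)
	return int64(percentTicks * float64(periodUS))
}

func main() {
	// With the numbers from the test earlier in this thread:
	fmt.Println(cpuQuota(1500, 24800, 100000)) // 6048
}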
client/driver/docker.go
Outdated
// https://docs.docker.com/engine/admin/resource_constraints/#cpu
defaultCFSPeriod = 100000
// https://docs.docker.com/engine/api/v1.35/#
defaultCFSPeriod_us = 100000
No `_`s in Go. Rename this to `defaultCFSPeriodUS`.
LGTM! Just need to fix the merge conflicts before I can pull it.
Ok, is that something I would do or you would? Happy to do it if I have to.
I went and resolved the merge conflicts and updated the documentation. Thanks for sticking with this one @jaininshah9!
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
Fixes #3810