Make CGroups CPU period configurable #1632
Hi @boynux, thanks for sending this! I wanted to give some context on why the CPU period is hard-coded to 100ms today, and what needs to be considered in changing it.
CPU period and CPU quota are the mechanisms we use to power the task-size CPU hard limits. When you specify vCPU in your task definition, we translate that into the CPU period and CPU quota settings that apply to the task's cgroup. The CPU quota controls the amount of CPU time granted to a cgroup during a given CPU period. Both settings are expressed in microseconds; a CPU quota equal to the CPU period means that a cgroup can execute up to 100% on one vCPU (or 50% on each of two vCPUs, or any other split that totals 100% of one).
The CPU quota has a maximum of 1000000us and the CPU period has a minimum of 1ms, so it becomes a math problem to express limits that make sense for a given CPU count and the limits you want to set. Changing the CPU period without changing the CPU quota will give you different effective limits than what you've specified in your task definition; these values need to co-vary with the vCPU task size.
The hard-coded 100ms period lets us express values from 0.125 to 10 vCPUs, and this is reflected in the ECS task definition API validation (i.e., if you specify more than 10 or less than 0.125 in your task definition, you'll get an error back). Changing the period changes the values that would be valid (from the Linux kernel's perspective) for controlling task size, and we would potentially need to adjust the ECS API to reflect that. This is recorded in the comments of
I don't think we'll be able to take this pull request in its current form because of how it interacts with the task definition's task-size feature. To allow the CPU period to change, we'll need to both adjust the CPU quota in accordance with the task size and adjust the task definition validation logic.
@samuelkarp Thanks for your reply. As mentioned in the description, we are experiencing some issues with the current 100ms default period, which seem to be related to the way CFS calculates the used quota. This makes task-level quotas very inefficient and, to some extent, unusable for us and most likely for other users.
@samuelkarp Now it's limited to the range 1ms to 100ms only 👼
@boynux: Do you mind updating your branch with the latest changes from dev and resolving the conflicts? Though I'll double-check with our team, the bounded range via the config seems like a reasonable compromise to me.
Signed-off-by: Mohamad Arab
@adnxn In case you didn't get the notification, I rebased my PR onto master. Should be good to go!
Playing around with these changes some more, it seems the 1ms lower bound will keep taskCPUPeriod AT the lower bound considered valid by Linux. As per the CFS docs, "The minimum quota allowed for the quota or period is 1ms."
However, with a 1ms taskCPUPeriod, the smallest task size (0.125 vCPU) ends up with a 125us taskCPUQuota, which is below the minimum quota allowed by Linux. So it seems we'll need a lower bound of at least 8ms for the CPU period to stay within the range Linux considers valid, so that taskCPUQuota is at least 1000us (0.125 × 8000us = 1000us).
@boynux: I'm closing this PR for now, but feel free to address the comments and reopen it if you'd like us to take another look. Sorry about the delay on our end! 😳
Summary
Fixes #1627
Make CFS CPU Period configurable
CFS throttles tasks excessively with the default 100ms period. To reduce the impact, we should be able to set a lower CPU period when defining cgroup quotas.
Implementation details
Adds a new environment variable to adjust the CFS period.
Testing
See here: https://gist.github.com/bobrik/2030ff040fad360327a5fab7a09c4ff1
- make release
- go build -out amazon-ecs-agent.exe ./agent
- make test: pass
- go test -timeout=25s ./agent/...: pass
- make run-integ-tests: pass
- .\scripts\run-integ-tests.ps1: pass
- make run-functional-tests: pass
- .\scripts\run-functional-tests.ps1: pass
New tests cover the changes: yes
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.