Support Unified Cgroups (cgroups v2) #3127
I couldn't quite figure out why we decided to check `len(dockerStats.CPUStats.CPUUsage.PercpuUsage) == 0` in the first place, but it looks like we don't reference this array anywhere else, and using `runtime.NumCPU` is good enough for me.
So, we think that `len(dockerStats.CPUStats.CPUUsage.PercpuUsage) == 0` was redundant or a mistake? I'm hesitant to remove something when we don't have clarity on why it was introduced.
`PercpuUsage` doesn't appear to be used anywhere in the agent, and it's not available anymore with cgroupv2. I'm not sure how else we can check that it's safe to remove. FWIW, there isn't really any alternative to just removing the check, from what I can tell, since cgroupv2 stats simply don't provide this level of granularity.
If you would prefer, we could just have an if-statement here and continue checking this array when we're using cgroupv1?
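For illustration, here is a minimal Go sketch of the fallback being discussed, assuming the stats arrive as a Docker API `types.StatsJSON`. The `numCores` helper and the `cgroups.Mode()` guard from `containerd/cgroups` are illustrative stand-ins, not the agent's actual code: keep the `PercpuUsage` length check where cgroup v1 populates it, and fall back to `runtime.NumCPU()` otherwise.

```go
// Hypothetical sketch, not the agent's implementation: prefer the
// percpu_usage length on cgroup v1 and fall back to the host CPU count.
package stats

import (
	"runtime"

	"github.com/containerd/cgroups"
	"github.com/docker/docker/api/types"
)

func numCores(dockerStats *types.StatsJSON) uint64 {
	if cgroups.Mode() != cgroups.Unified {
		// cgroup v1: percpu_usage is populated, so its length is the CPU count.
		if n := len(dockerStats.CPUStats.CPUUsage.PercpuUsage); n > 0 {
			return uint64(n)
		}
	}
	// cgroup v2 (or an empty array): percpu_usage is unavailable,
	// so use the host CPU count instead.
	return uint64(runtime.NumCPU())
}
```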
I'm guessing the intention was to do a simple sanity check of the stats API.
There is an alternative to the `PercpuUsage` field, `online_cpus`, which, according to the Docker API docs, should be available whether v1 or v2 cgroups are in use. But then I also saw this comment: the earliest Docker API reference I could find is for 1.18, and it does not contain the `online_cpus` field. Since the ECS-agent-supported Docker API version can go back to 1.17, which can still work with the 20.10 Docker engine (based on this doc), which is when Docker introduced cgroupv2 support, it's not 100% safe for us to check `online_cpus` when cgroupv2 is used.
We can still keep the v1 check, but I'm not sure of a safe way to validate the Docker stats CPU count when using cgroupv2.
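As a sketch of the `online_cpus` idea, assuming the Docker client's `types.CPUStats.OnlineCPUs` field is populated by the daemon (the API-version caveat above still applies, so this is not a definitive fix):

```go
// Illustrative only: prefer online_cpus when the daemon reports it,
// then percpu_usage (cgroup v1), then the host CPU count as a last resort.
package stats

import (
	"runtime"

	"github.com/docker/docker/api/types"
)

func onlineCPUs(dockerStats *types.StatsJSON) uint64 {
	if n := dockerStats.CPUStats.OnlineCPUs; n > 0 {
		return uint64(n) // reported by newer Docker API versions on both cgroup v1 and v2
	}
	if n := len(dockerStats.CPUStats.CPUUsage.PercpuUsage); n > 0 {
		return uint64(n) // cgroup v1 only
	}
	return uint64(runtime.NumCPU())
}
```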
I was looking into this `memory.use_hierarchy` flag; some information I found is in the kernel doc (see section 6, Hierarchy support). It basically enables recursive accounting and reclaiming, but I couldn't find the cgroupv2 equivalent (if any). Were we able to verify that container memory consumption gets charged to the task cgroup as well?
My understanding was that since cgroups v2 is by default a "unified hierarchy", this flag no longer applies. From my testing this seems to be the case because limits set on a task slice with two containers are applied to both containers in aggregate.
I think applying this flag in v1 will charge sub-cgroup memory usage to its parent (bottom-up). The allocation (top-down) does not get affected with or without the flag.
This is my understanding: say a task cgroup has two containers, the task has a hard limit of 100MB, and each container gets 50MB. Now one of the two containers attempts to allocate 200MB.
Can we verify what the behavior is in that case?
And tbh, I'm not entirely sure why we want to enable this hierarchical charging. If a non-essential container gets killed, we should still let the task run...
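For reference, a minimal sketch of how the v1 knob could be inspected or toggled through the cgroup filesystem; the path and helper names are hypothetical examples, not the agent's code, and on cgroup v2 the file simply doesn't exist because the unified hierarchy always charges child usage against ancestor limits.

```go
// Hypothetical helpers for the cgroup v1 memory.use_hierarchy knob.
// Example path: /sys/fs/cgroup/memory/ecs/<task-id> (illustrative).
package cgroupdemo

import (
	"os"
	"path/filepath"
	"strings"
)

// useHierarchyEnabled reports whether hierarchical accounting is enabled
// for the given cgroup v1 memory cgroup.
func useHierarchyEnabled(cgroupPath string) (bool, error) {
	data, err := os.ReadFile(filepath.Join(cgroupPath, "memory.use_hierarchy"))
	if err != nil {
		return false, err
	}
	return strings.TrimSpace(string(data)) == "1", nil
}

// setUseHierarchy writes the flag; the kernel may reject the write in some
// configurations (for example, once child cgroups already exist).
func setUseHierarchy(cgroupPath string, enable bool) error {
	val := "0"
	if enable {
		val = "1"
	}
	return os.WriteFile(filepath.Join(cgroupPath, "memory.use_hierarchy"), []byte(val), 0644)
}
```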
I can test this, but I'm fairly certain it works such that both containers share a limit of 100MB: if one container is using only 20MB, the other can use up to 80MB.
I tested the scenario of a task with 512MB of total memory, with one container allocating 100MB and another allocating 450MB. This does seem to be correct, i.e...
On AL2022, only the container allocating 450MB is being OOM-killed. I'm not sure exactly how it decides which container to kill; it might just be that it kills whichever container breaches the memory limit first?
Yes, I confirmed that on AL2, when one container hits the limit, BOTH containers are killed by the OOM killer.
Will investigate how we can turn on use_hierarchy on cgroupv2...
I would actually suggest that we turn off `use_hierarchy` on cgroupv1.
If I understand it correctly, the only time a single container failure should bring down the whole task is when that container is the essential container. On the other hand, if a container is non-essential, its being OOM-killed (or terminated in any other way, for that matter) should not affect the other containers in the task.
To follow up on this, `use_hierarchy` functionality is not completely available on cgroup v2. The `memory.oom.group` flag is close (see https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html), but it causes the entire container to be killed, rather than just the offending process within the container. We will go forward with the kernel default behavior of only killing the "bulkiest" process when the task-level memory limit is reached.
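For completeness, a sketch of the `memory.oom.group` knob mentioned above; the path is a hypothetical example and this is not what the agent ships. Writing `1` makes the OOM killer take out every process in that cgroup as a unit, whereas the default `0` keeps the kernel behavior of killing only the bulkiest process.

```go
// Illustrative only: enable group OOM-killing on a cgroup v2 cgroup.
package cgroupdemo

import (
	"os"
	"path/filepath"
)

func enableOOMGroup(cgroupV2Path string) error {
	// Hypothetical example path:
	// /sys/fs/cgroup/ecstasks.slice/<task-slice>/<container-scope>
	return os.WriteFile(filepath.Join(cgroupV2Path, "memory.oom.group"), []byte("1"), 0644)
}
```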