Support Unified Cgroups (cgroups v2) #3127
Conversation
@@ -39,9 +39,8 @@ func dockerStatsToContainerStats(dockerStats *types.StatsJSON) (*ContainerStats,
}

func validateDockerStats(dockerStats *types.StatsJSON) error {
	// The length of PercpuUsage represents the number of cores in an instance.
	if len(dockerStats.CPUStats.CPUUsage.PercpuUsage) == 0 || numCores == uint64(0) {
I couldn't quite figure out why we decided to check len(dockerStats.CPUStats.CPUUsage.PercpuUsage) == 0 in the first place, but it looks like we don't reference this array anywhere else, and using runtime.NumCPU is good enough for me.
So, we think that len(dockerStats.CPUStats.CPUUsage.PercpuUsage) == 0 was redundant or a mistake? I'm hesitant to remove something when we don't have clarity on why it was introduced.
PercpuUsage doesn't appear to be used anywhere in the agent, and it's not available anymore with cgroupv2. I'm not sure how else we can check that it's safe to remove? FWIW there isn't really any alternative to just removing the check from what I can tell, since cgroupv2 stats simply don't provide this level of granularity.
If you would prefer, we could just have an if-statement here and continue checking this array when we're using cgroupv1?
I'm guessing the intention was to do a simple sanity check of the stats API.
There is an alternative to the PercpuUsage field, online_cpus, which according to the Docker API doc should be available whether v1 or v2 cgroups is used:
number_cpus = length(cpu_stats.cpu_usage.percpu_usage) or cpu_stats.online_cpus
But then I also saw this comment:
If either precpu_stats.online_cpus or cpu_stats.online_cpus is nil then for compatibility with older daemons the length of the corresponding cpu_usage.percpu_usage array should be used.
The earliest Docker API reference I could find is for 1.18, and it does not contain the online_cpus field.
Since the Docker API versions supported by the ECS agent go back to 1.17, which can still work with the 20.10 Docker engine (based on this doc), which is when Docker introduced cgroupv2 support, it's not 100% safe for us to check online_cpus when cgroupv2 is used.
We can still keep the v1 check, but I'm not sure of a safe way to validate the docker stats numCPU when using cgroupv2.
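For illustration only, a minimal sketch of the fallback described in the Docker API note above; this is not proposing an implementation for the agent, and the helper name is made up:

```go
package stats

import "github.com/docker/docker/api/types"

// numberOfCores is a hypothetical helper illustrating the Docker API guidance
// quoted above: prefer cpu_stats.online_cpus when the daemon reports it, and
// fall back to the length of percpu_usage for older daemons / cgroups v1.
func numberOfCores(dockerStats *types.StatsJSON) uint64 {
	if dockerStats.CPUStats.OnlineCPUs > 0 {
		return uint64(dockerStats.CPUStats.OnlineCPUs)
	}
	return uint64(len(dockerStats.CPUStats.CPUUsage.PercpuUsage))
}
```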
closes aws/containers-roadmap#1535 closes aws#3117 This adds support for task-level resource limits when running on unified cgroups (aka cgroups v2) with the systemd cgroup driver. Cgroups v2 has introduced a cgroups format that is not backward compatible with cgroups v1. In order to support both v1 and v2, we have added a config variable to detect which cgroup version the ecs agent is running with. The containerd/cgroups library is used to determine which mode it is using on agent startup. Cgroups v2 no longer can provide per-cpu usage stats, so this validation was removed since we never used it either.
if err != nil {
	return fmt.Errorf("cgroup resource [%s]: setup cgroup: unable to set use hierarchy flag: %w", cgroup.taskARN, err)
if !config.CgroupV2 {
	// enabling cgroup memory hierarchy by doing 'echo 1 > memory.use_hierarchy'
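For readers following along, a hedged reconstruction of roughly what this now-gated v1 step does; the cgroup mount path and function name here are assumptions, not the agent's actual code:

```go
package cgroup

import (
	"fmt"
	"os"
	"path/filepath"
)

// setUseHierarchy is an illustrative stand-in for the 'echo 1 >
// memory.use_hierarchy' step referenced in the comment above. On cgroups v1 it
// makes child cgroup memory usage get charged to (and reclaimed from) the task
// cgroup; on cgroups v2 the hierarchy is already unified, the file does not
// exist, and the agent skips this when config.CgroupV2 is set.
func setUseHierarchy(taskCgroupPath string) error {
	useHierarchyFile := filepath.Join("/sys/fs/cgroup/memory", taskCgroupPath, "memory.use_hierarchy")
	if err := os.WriteFile(useHierarchyFile, []byte("1"), 0644); err != nil {
		return fmt.Errorf("setup cgroup: unable to set use hierarchy flag: %w", err)
	}
	return nil
}
```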
I was looking into this memory.use_hierarchy flag; some information I found in the kernel doc (see 6. Hierarchy support). So this basically enables recursive accounting and reclaiming, but I couldn't find the cgroupv2 equivalent (if any). Were we able to verify that container memory consumption gets charged to the task cgroup as well?
My understanding was that since cgroups v2 is by default a "unified hierarchy", this flag no longer applies. From my testing this seems to be the case because limits set on a task slice with two containers are applied to both containers in aggregate.
I think applying this flag in v1 will charge subcgroup memory usage to its parent (bottom-up). The allocation (top-down) does not get affected with or without the flag.
This is my understanding: if a task cgroup has two containers, task has hard limit 100MB with each container getting 50MB. Now one of the two containers attempts to allocate 200MB
- without the flag, only the container will be killed
- with the flag, because the memory allocation got charged to the task, the task will also be oom-killed.
Can we verify if that's the behavior?
And tbh, I'm not entirely sure why we want to enable this hierarchical charging. If a non-essential container gets killed, we should still let the task run...
if a task cgroup has two containers, task has hard limit 100MB with each container getting 50MB
I can test this, but I'm fairly certain that both containers have a shared limit of 100MB. So if one container is using only 20MB, the other can use up to 80MB.
I tested the scenario of a task with 512MB of total memory, with one container allocating 100MB and another 450MB. This does seem to be correct, i.e.:
without the flag, only the container will be killed
On AL2022, only the container allocating 450MB is being OOM-killed. I'm not sure exactly how it decides which container to kill; it might just be that it kills whichever container breaches the memory limit first?
with the flag, because the memory allocation got charged to the task, the task will also be oom-killed.
Yes I confirmed that on AL2, when one container hits the limit, BOTH containers are killed by the OOM-killer.
Will investigate how we can turn on use_hierarchy on cgroupv2...
Will investigate how we can turn on use_hierarchy on cgroupv2...
I would actually suggest that we turn off use_hierarchy on cgroupv1.
If I understand it correctly, the only time that one container failure should bring down the whole task is when that container is the essential container. On the other hand, if a container is non-essential, it being OOM-killed (or terminated in any other way, for that matter) should not affect the other containers in the task.
To follow up on this, use_hierarchy functionality is not completely available on cgroup v2. The memory.oom.group flag is close (see https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html), but it causes the entire container to be killed, rather than just the process within the container. We will go forward with the kernel default behavior of only killing the "bulkiest" process when the task-level memory limit is reached.
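For reference, a hedged sketch of what enabling the memory.oom.group knob discussed above would look like; the PR deliberately does not do this, and the slice paths and function name below are assumptions:

```go
package cgroup

import (
	"fmt"
	"os"
	"path/filepath"
)

// enableOOMGroup writes "1" to the task slice's memory.oom.group file so the
// OOM killer treats the whole cgroup as a single unit and kills every process
// in it together. Shown only to illustrate the closest cgroups v2 analogue to
// the v1 use_hierarchy behavior discussed in this thread.
func enableOOMGroup(taskSliceName string) error {
	oomGroupFile := filepath.Join("/sys/fs/cgroup/ecstasks.slice", taskSliceName, "memory.oom.group")
	if err := os.WriteFile(oomGroupFile, []byte("1"), 0644); err != nil {
		return fmt.Errorf("unable to enable memory.oom.group for %s: %w", taskSliceName, err)
	}
	return nil
}
```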
* Support Unified Cgroups (cgroups v2)

closes aws/containers-roadmap#1535
closes #3117

This adds support for task-level resource limits when running on unified cgroups (aka cgroups v2) with the systemd cgroup driver. Cgroups v2 has introduced a cgroups format that is not backward compatible with cgroups v1. In order to support both v1 and v2, we have added a config variable to detect which cgroup version the ecs agent is running with. The containerd/cgroups library is used to determine which mode it is using on agent startup. Cgroups v2 no longer can provide per-cpu usage stats, so this validation was removed since we never used it either.

* wip

* update cgroups library with nil panic bugfix

* Initialize and toggle cgroup controllers
Summary
Implements part of aws/containers-roadmap#1535
This adds support for task-level resource limits when running on unified cgroups (aka cgroups v2) with the systemd cgroup driver.
Cgroups v2 has introduced a cgroups format that is not backward compatible with cgroups v1. In order to support both v1 and v2, we have added a config variable to detect which cgroup version the ecs agent is running with. The containerd/cgroups library is used to determine which mode it is using on agent startup.
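As a rough sketch of that startup detection, assuming the containerd/cgroups API and illustrative function/variable names (only the config.CgroupV2 flag name appears in the diff itself):

```go
package config

import "github.com/containerd/cgroups"

// CgroupV2 is assumed here to be the config flag referenced elsewhere in this PR.
var CgroupV2 bool

// detectCgroupMode asks the containerd/cgroups library which cgroup layout the
// host is using; cgroups.Unified means pure cgroups v2.
func detectCgroupMode() {
	CgroupV2 = cgroups.Mode() == cgroups.Unified
}
```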
When running with cgroups v2, docker containers are normally created under the path /sys/fs/cgroup/system.slice/docker-${CONTAINER_ID}.scope. When using task resource limits, the task receives its own cgroup slice, and the containers are created under the path /sys/fs/cgroup/ecstasks.slice/ecstasks-${TASK_ID}.slice/docker-${CONTAINER_ID}.scope. The ecstasks-${TASK_ID}.slice slice is created with the resource limits specified in the task definition, and every docker container under it is subject to these limits. See the sketch below for the two layouts side by side.
For a good doc and overview of how these cgroup hierarchies work with the systemd driver, see https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/resource_management_guide/sec-default_cgroup_hierarchies
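To make the two layouts above concrete, an illustrative helper (placeholder names, not part of the PR) that yields the container scope path in each case:

```go
package cgroup

import "fmt"

// containerScopePath shows where a container's cgroup lands under the systemd
// driver on cgroups v2: directly under system.slice normally, or under the
// task's ecstasks-<TASK_ID>.slice when task-level limits are configured.
func containerScopePath(taskID, containerID string, taskLimitsEnabled bool) string {
	if taskLimitsEnabled {
		return fmt.Sprintf("/sys/fs/cgroup/ecstasks.slice/ecstasks-%s.slice/docker-%s.scope", taskID, containerID)
	}
	return fmt.Sprintf("/sys/fs/cgroup/system.slice/docker-%s.scope", containerID)
}
```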
Because docker uses the systemd cgroup driver by default with cgroups v2, resource limit actions are no longer logged by the agent or docker. OOM-kill messages can now be found in the system journal (journalctl --system) or in /var/log/messages.
Testing
New tests cover the changes: yes
Functional tests run to verify that v1 behavior is unchanged.
Manually tested running on a cgroup v2 system and verified that task-level mem/cpu limits are applied.
Ran the full functional testing suite on a cgroupv2 system (AL2022) and verified all tests pass.
Description for the changelog
Enhancement - Support for unified cgroups and the systemd cgroup driver.
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.