Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cgroupv2: fix fs2 driver initialization #2299

Merged
merged 13 commits into from
Apr 21, 2020

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Apr 7, 2020

The main goal of this patchset is to fix fs2 driver initialization. With these patches, the tests that I added in #2295 are now passing with fs2 driver (i.e. when RUNC_USE_SYSTEMD is not set), except for one case (#2298).

I also did some major cleanups, mostly for systemd driver. A high-level overview:

  • remove legacy paths map from v2 driver
  • untangle systemd v2 initialization;
  • reorganized systemd code into v1, v2, common, and unsupported.

Please see individual commits for details.

@AkihiroSuda PTAL (this was discussed before in #2281 (comment))

Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--- FAIL: TestUsernsCheckpoint (0.26s)
    checkpoint_test.go:130: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
=== RUN   TestCheckpoint
time="2020-04-07T21:54:51Z" level=warning msg="signal: killed"
--- FAIL: TestCheckpoint (0.03s)
    checkpoint_test.go:130: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
=== RUN   TestExecPS
time="2020-04-07T21:54:51Z" level=warning msg="signal: killed"
--- FAIL: TestExecPS (0.03s)
    exec_test.go:52: |: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
=== RUN   TestUsernsExecPS
time="2020-04-07T21:54:51Z" level=warning msg="signal: killed"
--- FAIL: TestUsernsExecPS (0.03s)
    exec_test.go:52: |: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
=== RUN   TestIPCPrivate
time="2020-04-07T21:54:51Z" level=warning msg="signal: killed"
--- FAIL: TestIPCPrivate (0.03s)
    utils_test.go:55: exec_test.go:82: unexpected error: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
        
=== RUN   TestIPCHost
time="2020-04-07T21:54:51Z" level=warning msg="signal: killed"
--- FAIL: TestIPCHost (0.02s)
    utils_test.go:55: exec_test.go:108: unexpected error: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
        
=== RUN   TestIPCJoinPath
time="2020-04-07T21:54:51Z" level=warning msg="signal: killed"
--- FAIL: TestIPCJoinPath (0.03s)
    utils_test.go:55: exec_test.go:135: unexpected error: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
        
=== RUN   TestIPCBadPath
--- PASS: TestIPCBadPath (0.03s)
=== RUN   TestRlimit
time="2020-04-07T21:54:51Z" level=warning msg="signal: killed"
--- FAIL: TestRlimit (0.03s)
    utils_test.go:55: exec_test.go:200: unexpected error: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
        
=== RUN   TestUsernsRlimit
time="2020-04-07T21:54:51Z" level=warning msg="signal: killed"
--- FAIL: TestUsernsRlimit (0.03s)
    utils_test.go:55: exec_test.go:200: unexpected error: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
        
=== RUN   TestEnter
--- SKIP: TestEnter (0.00s)
    exec_test.go:211: cgroup v2 is not supported
=== RUN   TestProcessEnv
time="2020-04-07T21:54:51Z" level=warning msg="signal: killed"
--- FAIL: TestProcessEnv (0.03s)
    utils_test.go:55: exec_test.go:326: unexpected error: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
        
=== RUN   TestProcessEmptyCaps
time="2020-04-07T21:54:51Z" level=warning msg="signal: killed"
--- FAIL: TestProcessEmptyCaps (0.03s)
    utils_test.go:55: exec_test.go:370: unexpected error: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
        
=== RUN   TestProcessCaps
time="2020-04-07T21:54:51Z" level=warning msg="signal: killed"
--- FAIL: TestProcessCaps (0.03s)
    utils_test.go:55: exec_test.go:423: unexpected error: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
        
=== RUN   TestAdditionalGroups
time="2020-04-07T21:54:52Z" level=warning msg="signal: killed"
--- FAIL: TestAdditionalGroups (0.03s)
    utils_test.go:55: exec_test.go:487: unexpected error: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""

@kolyshkin
Copy link
Contributor Author

write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy

  • if this is in test with --systemd-cgroup, then I don't understand why is this happening as I haven't changed anything;
  • if this is in test without --systemd-cgroup but systemd is present, then --systemd-cgroup should be used;
  • if this is in test without --systemd-cgroup and systemd is not present, then I don't have an idea about what is going on :(

@AkihiroSuda
Copy link
Member

the errors are happening on Travis
https://travis-ci.org/github/opencontainers/runc/jobs/672264460

@kolyshkin
Copy link
Contributor Author

The problem is we're enabling all the controllers no matter what limits are set. Two ways to fix this are:

  1. enable all controllers on a "best effort" basis (i.e. ignore the errors during init time, return errors during set time)
  2. only enable needed controllers.

I like 2 much better. Let's see if this can be done (and it will fix the above tests since looks like they do not set any limits)

@kolyshkin
Copy link
Contributor Author

Added a commit to enable controllers only when required and on a best-effort basis.

cgroup-v2 is passing now. The tests I add/enable in #2295 are passing, too, on my test box.

@kolyshkin
Copy link
Contributor Author

@crosbymichael @mrunalp PTAL

@kolyshkin
Copy link
Contributor Author

@cyphar PTAL (at least for those two patches that add/remove securejoin usage)

@kolyshkin
Copy link
Contributor Author

Rebased to resolve conflict after #2288 is merged

@kolyshkin
Copy link
Contributor Author

kolyshkin commented Apr 8, 2020

Let me rework the last patch (enable controllers on demand) -- while it works fine, I don't like that neededControllers() has the intimate knowledge of all subsystems in one place.

Update: done

@kolyshkin
Copy link
Contributor Author

@AkihiroSuda please let me know if you want me to split into two PRs, as this one might be too big to review. I tried hard to split it into small logical commits but there are too many.

@kolyshkin
Copy link
Contributor Author

fixed a bug in last commit

func setCpu(dirPath string, cgroup *configs.Cgroup) error {
if !isCpuSet(cgroup) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure whether this validation is useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I added it is peculiar -- if someone will be modifying setCpu(), adding more fields to set, they will be reminded to also modify isCpuSet(). Having this check here might also help to uncover bugs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E.g. someone adds cpuFoobar setting to setCpu(), but forgets to modify isCpuSet(). They add a unit test to check if cpuFoobar is applied, and the test fails! Hope it makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be just code comment lines?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be just code comment lines?

What do you mean? Add a comment like "if you're modifying this, don't forget to modify isSetCpu"? In my experience, this never works. The added runtime check might actually help to uncover such a bug.

Or do you mean adding a comment stating the reason why this check is added?

Copy link
Contributor Author

@kolyshkin kolyshkin Apr 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or do you mean adding a comment stating the reason why this check is added?

I have amended the commit telling why is*Set is added to set*:

  1. Amend all setFoo() functions to call isFooSet(). While this might
    seem unnecessary, it might actually help to uncover a bug.
    Imagine someone:

    • adds a cgroup.Resources.CpuFoo setting;
    • modifies setCpu() to apply the new setting;
    • but forgets to amend isCpuSet() accordingly <-- BUG

    In this case, a test case modifying CpuFoo will help
    to uncover the BUG. This is the reason why it's added.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the former, but not a blocker

// systemd represents slice hierarchy using `-`, so we need to follow suit when
// generating the path of slice. Essentially, test-a-b.slice becomes
// /test.slice/test-a.slice/test-a-b.slice.
func ExpandSlice(slice string) (string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not new in this PR, but can we have UT for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do (maybe in a separate PR?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, can be a separate PR

@kolyshkin kolyshkin force-pushed the fs2-init-ctrl branch 2 times, most recently from 4edf797 to 95e0b4e Compare April 8, 2020 21:17
@AkihiroSuda
Copy link
Member

Looks good, but I'd like to merge the integration tests (#2295) before this.

kolyshkin added a commit to kolyshkin/runc that referenced this pull request Apr 12, 2020
Those needs to be run on the (Vagrant Fedora 31) host
(since we need real systemd running), and so we have
to have all the tools needed to compile runc and run
the tests.

The good news is Fedora packages a decent and recent release
of bats-core (1.1.0), which we can use (Debian does not),
and we can also use golang (currently 1.13.9) from Fedora.

The bad news are

 1. Currently cgroups tests are only working with
    RUNC_USE_SYSTEMD=yes (addressed by opencontainers#2299, opencontainers#2305)

 2. Some tests that are not yet working:

    - ps.bats with RUNC_USE_SYSTEMD=yes (investigating)
    - events.bats (need cgroupv2 memory.events support)

[v2: add -t to ssh to enforce pty]

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Apr 13, 2020
Those needs to be run on the (Vagrant Fedora 31) host
(since we need real systemd running), and so we have
to have all the tools needed to compile runc and run
the tests.

The good news is Fedora packages a decent and recent release
of bats-core (1.1.0), which we can use (Debian does not),
and we can also use golang (currently 1.13.9) from Fedora.

The bad news are

 1. Currently cgroups tests are only working with
    RUNC_USE_SYSTEMD=yes (addressed by opencontainers#2299, opencontainers#2305)

 2. Tests in events.bats do not work (need cgroupv2
    memory.events support)

[v2: add -t to ssh to enforce pty]
[v3: disable events tests for cgroupv2]
[v4: update container-selinux to fix ps tests]

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Apr 14, 2020
Those needs to be run on the (Vagrant Fedora 31) host
(since we need real systemd running), and so we have
to have all the tools needed to compile runc and run
the tests.

The good news is Fedora packages a decent and recent release
of bats-core (1.1.0), which we can use (Debian does not),
and we can also use golang (currently 1.13.9) from Fedora.

The bad news are

 1. Currently cgroups tests are only working with
    RUNC_USE_SYSTEMD=yes (addressed by opencontainers#2299, opencontainers#2305)

 2. Tests in events.bats do not work (need cgroupv2
    memory.events support)

 3. Fedora 31 image is 6 months old (and has broken
    container-selinux policy) so we need `dnf update`.

[v2: add -t to ssh to enforce pty]
[v3: disable events tests for cgroupv2]

Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin
Copy link
Contributor Author

Disabled checkpoint tests (runc currently does not support cgroup v2 c/r). All others should pass

@kolyshkin
Copy link
Contributor Author

Something ugly with CI -- logs for failed jobs are empty. Restarting

@kolyshkin kolyshkin force-pushed the fs2-init-ctrl branch 3 times, most recently from a040eda to 4fe6a28 Compare April 19, 2020 14:58
@kolyshkin kolyshkin marked this pull request as ready for review April 19, 2020 15:35
@kolyshkin
Copy link
Contributor Author

This one is ready for merge.

@crosbymichael @mrunalp @filbranden PTAL

@AkihiroSuda
Copy link
Member

AkihiroSuda commented Apr 19, 2020

LGTM

Approved with PullApprove

Having map of per-subsystem paths in systemd unified cgroups
driver does not make sense and makes the code less readable.

To get rid of it, move the systemd v1-or-v2 init code to
libcontainer/factory_linux.go which already has a function
to deduce unified path out of paths map.

End result is much cleaner code. Besides, we no longer write pid
to the same cgroup file 7 times in Apply() like we did before.

While at it
 - add `rootless` flag which is passed on to fs2 manager
 - merge getv2Path() into GetUnifiedPath(), don't overwrite
   path if it is set during initialization (on Load).

Signed-off-by: Kir Kolyshkin <[email protected]>
It seems that some paths are coming from user and are therefore
untrusted.

Signed-off-by: Kir Kolyshkin <[email protected]>
... and its Cgroup field. There is no sense to keep it public.

This was generated by gorename.

Signed-off-by: Kir Kolyshkin <[email protected]>
The fs2 cgroup driver has a sanity check for path.
Since systemd driver is relying on the same path,
it makes sense to move this check to the common code.

Signed-off-by: Kir Kolyshkin <[email protected]>
fs2 cgroup driver was not working because it did not enable controllers
while creating cgroup directory; instead it was merely doing MkdirAll()
and gathered the list of available controllers in NewManager().

Also, cgroup should be created in Apply(), not while creating a new
manager instance.

To fix:

1. Move the createCgroupsv2Path function from systemd driver to fs2 driver,
   renaming it to CreateCgroupPath. Use in Apply() from both fs2 and
   systemd drivers.

2. Delay available controllers map initialization to until it is needed.

With this patch:
 - NewManager() only performs minimal initialization (initializin
   m.dirPath, if not provided);
 - Apply() properly creates cgroup path, enabling the controllers;
 - m.controllers is initialized lazily on demand.

Signed-off-by: Kir Kolyshkin <[email protected]>
This check was removed in commit 5406833. Now, when this
function is called from a few places, it is no longer obvious
that the path always starts with /sys/fs/cgroup/, so reinstate
the check just to be on the safe side.

This check also ensures that elements[3:] can be used safely.

Signed-off-by: Kir Kolyshkin <[email protected]>
This slightly improves code readability.

Signed-off-by: Kir Kolyshkin <[email protected]>
1. Rename the files
  - v1.go: cgroupv1 aka legacy;
  - v2.go: cgroupv2 aka unified hierarchy;
  - unsupported.go: when systemd is not available.

2. Move the code that is common between v1 and v2 to common.go

Signed-off-by: Kir Kolyshkin <[email protected]>
1. Instead of enabling all available controllers, figure out which
   ones are required, and only enable those.

2. Amend all setFoo() functions to call isFooSet(). While this might
   seem unnecessary, it might actually help to uncover a bug.
   Imagine someone:
    - adds a cgroup.Resources.CpuFoo setting;
    - modifies setCpu() to apply the new setting;
    - but forgets to amend isCpuSet() accordingly <-- BUG

   In this case, a test case modifying CpuFoo will help
   to uncover the BUG. This is the reason why it's added.

This patch *could be* amended by enabling controllers on a best-effort
basis, i.e. :

 - do not return an error early if we can't enable some controllers;
 - if we fail to enable all controllers at once (usually because one
   of them can't be enabled), try enabling them one by one.

Currently this is not implemented, and it's not clear whether this
would be a good way to go or not.

[v2: add/use is${Controller}Set() functions]
[v3: document neededControllers()]
[v4: drop "best-effort" part]

Signed-off-by: Kir Kolyshkin <[email protected]>
Run in the same environment as systemd tests.

Disable CRIU tests because:

 - they all fail with cgroup v2;

 - CRIU v3.14 is required and it's not yet released, and
   rebuilding it from sources with patches applied (like
   it is currently done in Dockerfile) is too much work.

Signed-off-by: Kir Kolyshkin <[email protected]>
1. There is no need to try removing it recursively.

2. Do not treat ENOENT as an error (similar to fs
   and systemd v1 drivers).

Signed-off-by: Kir Kolyshkin <[email protected]>
@AkihiroSuda
Copy link
Member

AkihiroSuda commented Apr 20, 2020

LGTM (still)

Approved with PullApprove

@mrunalp
Copy link
Contributor

mrunalp commented Apr 21, 2020

LGTM

Approved with PullApprove

@AkihiroSuda
Copy link
Member

This commit caused regression #2339

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants