-
Notifications
You must be signed in to change notification settings - Fork 239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dont ignore failure to create cgroup after timeout #349
Conversation
c984793
to
8ac0138
Compare
Looks good, but can you squash the commits and write the same blurb you have in the PR description in the commit message? Thanks! |
8ac0138
to
d919943
Compare
Done! |
cgroup2/manager.go
Outdated
log.G(ctx).Warnf("Timed out while waiting for StartTransientUnit(%s) completion signal from dbus. Continuing...", group) | ||
case <-time.After(systemdStartUnitTimeout): | ||
attemptFailedUnitReset(conn, group) | ||
return fmt.Errorf("Timed out while waiting for StartTransientUnit(%s) completion signal from dbus after %v", group, systemdStartUnitTimeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, one nit: Make Timed lowercase to appease the Golang idiom gods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Before this commit, creating a cgroup would silently ignore timeouts and carry on. Concretely, this caused cases where a cgroup failed to create, but the caller doesn't realize and ends up looking for files that should exist (e.g. cgroups.controllers), only to find they don't exist. It's very difficult as a caller to deal with this case, where NewSystemd succeeds but the group doesn't exist. The origins of this code seem to trace back to an initial implementation written 5+ years ago: containerd@5efa14e#diff-3331981e4ac06a8d9b06e91842b7f2759c7af3b65287e489a88385948d311ebdR672 runc added roughly the same logic here to deal with the same issue: opencontainers/runc#3782 Now, containerd will also error if a cgroup cannot be created within the timeout window. Signed-off-by: Josh Chorlton <[email protected]>
d919943
to
2e25118
Compare
@dcantah what's the path to getting this merged? |
@jchorl Just needs another approval, I'll shop this around to folks |
I noticed that creating a cgroup will silently ignore timeouts and continue on. Concretely, I've hit cases where a cgroup fails to get created, and the caller ends up looking for
cgroups.controllers
only to find it isn't there.It's very difficult as a caller to deal with this case, where
NewSystemd
succeeds but the group doesn't exist.I traced the origins of this code to 5efa14e#diff-3331981e4ac06a8d9b06e91842b7f2759c7af3b65287e489a88385948d311ebdR672 - which was written 5+ years ago.
runc
added roughly the same logic here: opencontainers/runc#3782