-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libct/cg: fix an error of cgroup path removal #4520
Conversation
36d06e4
to
fa15928
Compare
@AkihiroSuda Could you please help to see this patch has fixed the issue #4518 or not? |
Sorry, I'm on trip to kubecon, can't test this PR right now cc @samiam (the original reporter of the issue) |
if os.IsNotExist(err) { | ||
// Please keep this error eraser, or else it will return ErrNotExist | ||
// for cgroupv2. | ||
// Please see https://github.com/opencontainers/runc/issues/4518 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to have a bats test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it’s hard to simulate the similar error.
But maybe we can split this func, and add a go unit test. I don’t know it’s worth to do or not. But we should verify this patch is correct or not at first.
It appears to work via buildkit. But I'm new to buildkit, so my testing could be flawed. Buildkit changes include changing the Dockerfile and the name of the image to ensure I was using my local one.
Build local images:
Run the tests again:
Who knew removing a directory could be so complicated? 😁 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, so complicated, because cgroupfs is a pseudo-filesystem, not a real physical fs. Thanks for your check. |
libcontainer/cgroups/utils.go
Outdated
// Please keep this error eraser, or else it will return ErrNotExist | ||
// for cgroupv2. | ||
// Please see https://github.com/opencontainers/runc/issues/4518 | ||
err = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we return nil
here? Resetting the err
combined with the comment made me go look if the err
was used in a defer
(or if there was something special); returning nil
here makes it more clear it's a regular "we can ignore this".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On that matter, we should do the same below in the infos
loop, and also return early if there's an error;
for _, info := range infos {
if info.IsDir() {
// We should remove subcgroup first.
if err := RemovePath(filepath.Join(path, info.Name())); err != nil {
return err
}
}
}
That would allow use to remove the if err == nil
check outside of the loop, and just;
return rmdir(path, true)
❓ Question though; should the infos
loop also ignore os.IsNotExist()
?
for _, info := range infos {
if !info.IsDir() {
continue
}
// We should remove subcgroup first.
if err := RemovePath(filepath.Join(path, info.Name())); err != nil && !os.IsNotExist(err) {
return err
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question though; should the
infos
loop also ignoreos.IsNotExist()
?
I think there is no chance we will get ErrNotExist error in here.
On that matter, we should do the same below in the
infos
loop, and also return early if there's an error;
👍
80008d7
to
af8c523
Compare
|
And the error from Indeed, /sys/fs/cgroup is mounted read-only in the container in question:
I'm not quite sure how the cgroup directory is created in the first place in such case, but it's definitely there. If someone can shed some light that would be great. My guess is we need some special handing for such case. It's read only meaning we can't remove it no matter what. But it can be gone if there are no processes, so we can wait. (the readdir logic is obviously wrong and need to be fixed nevertheless) |
How does runc write the init process pid to a ro cgroup? |
Yes, it indeed retry for any errors(except ErrNotExist) in ‘release-1.1’, how about sleep and retry in rmdir for all errors except ENOENT? @kolyshkin |
Some of my analysis above are wrong. But let's continue the discussion in the issue (#4518). |
af8c523
to
d5497de
Compare
When fall back to the traditional path walk removal after rmdir, there is an error if the path suddenly gone, we should ignore this ErrNotExist error when we open the cgroup path. Signed-off-by: lfbzhm <[email protected]>
Co-authored-by: Sebastiaan van Stijn <[email protected]> Signed-off-by: lifubang <[email protected]>
If we remove a non-exist dir in a ro mount point, it will return EROFS in `unix.Rmdir`, so we need to check first. Test step: ```bash root@acmubuntu:/opt/bb# mkdir from to root@acmubuntu:/opt/bb# touch from/test root@acmubuntu:/opt/bb# mount -o bind,ro from to root@acmubuntu:/opt/bb# ls to test root@acmubuntu:/opt/bb# touch to/test1 touch: cannot touch 'to/test1': Read-only file system root@acmubuntu:/opt/bb# mkdir to/test1 mkdir: cannot create directory ‘to/test1’: Read-only file system root@acmubuntu:/opt/bb# ls to/test1 ls: cannot access 'to/test1': No such file or directory root@acmubuntu:/opt/bb# rmdir to/test1 rmdir: failed to remove 'to/test1': Read-only file system root@acmubuntu:/opt/bb# ``` Signed-off-by: lifubang <[email protected]>
d5497de
to
c8ce25e
Compare
For cgroup path removal, when we try a fast way to remove the cgroup
path, there may be a small possibility race to return an error, though
the path was removed successfully, for example: EINTR, or other errors.
Then we will fall back to the traditional path walk removal, but there
is a regression if the path has been sucessfully removed, which was
introduced by d3d7f7d in #4102. We should erase the ErrNotExist error
when we open the cgroup path.
Fix: #4518