Checkpoint/Restore not working with cgroup v2 and Kubernetes #6894

Closed
adrianreber opened this issue May 8, 2023 · 3 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@adrianreber
Member

adrianreber commented May 8, 2023

What happened?

This is mainly for tracking.

Although restoring a container in Kubernetes with cgroup v2 works, the container is immediately killed by Kubernetes because CRIU restores it into the old cgroup.

Outside of Kubernetes this behaviour is not visible, as it seems that only Kubernetes kills processes in unknown cgroups.

CRIU stores information about the cgroup during checkpointing and restores that information. I am not sure why this error cannot be seen with cgroup v1.
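
To illustrate the kind of cgroup membership information involved, here is a minimal Python sketch that parses the `/proc/<pid>/cgroup` format. This is not CRIU's code and says nothing about how CRIU stores the data internally. The helper names and the sample paths are hypothetical and only show what a v1 and a v2 membership record look like.

```python
from typing import Dict

# Hypothetical sample contents of /proc/<pid>/cgroup (paths are made up).
SAMPLE_V2 = "0::/kubepods.slice/example-pod.slice/example-container.scope\n"
SAMPLE_V1 = (
    "11:memory:/kubepods/besteffort/example-pod/example-container\n"
    "10:cpu,cpuacct:/kubepods/besteffort/example-pod/example-container\n"
)


def parse_proc_cgroup(text: str) -> Dict[str, str]:
    """Map each controller list to its cgroup path.

    Each line has the form hierarchy-ID:controller-list:cgroup-path.
    On cgroup v2 the controller list is empty, so the key is "".
    """
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at most twice so that a ':' inside the path is preserved.
        _hierarchy_id, controllers, path = line.split(":", 2)
        entries[controllers] = path
    return entries


def is_pure_cgroup_v2(entries: Dict[str, str]) -> bool:
    """True if the only entry is the unified (v2) hierarchy."""
    return set(entries) == {""}


if __name__ == "__main__":
    for name, sample in (("v2", SAMPLE_V2), ("v1", SAMPLE_V1)):
        parsed = parse_proc_cgroup(sample)
        print(name, is_pure_cgroup_v2(parsed), parsed)
```

If a restored process ends up at the path recorded at checkpoint time instead of the path of the newly created pod, comparing the parsed path against the expected one would show the mismatch described above.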

Fortunately runc has a fix for this problem in the merged PR opencontainers/runc#3546.

Unfortunately, this fix has not yet made its way into any of the existing runc releases.

@kolyshkin any ideas when the ignore setting for cgroups will appear in a runc release?

adrianreber added the kind/bug label on May 8, 2023
@github-actions

github-actions bot commented Jun 8, 2023

A friendly reminder that this issue had no activity for 30 days.

github-actions bot added the lifecycle/stale label on Jun 8, 2023
@github-actions

github-actions bot commented Sep 6, 2023

Closing this issue since it had no activity in the past 90 days.

github-actions bot added the lifecycle/rotten label on Sep 6, 2023
github-actions bot closed this as not planned on Sep 6, 2023
@WhaleSpring

WhaleSpring commented Oct 7, 2023

Hi @adrianreber,
I have run into the same problem as @MaxFuhrich. The steps that trigger it are the same, and I also followed the blog post at https://martinheinz.dev/blog/85.
The difference is that I am using CentOS 7 with cgroup v1.
[image]

As a result, it does not seem to be a problem with cgroup v1.
I can restore a pod on one of my nodes, but on the other nodes the restore fails with the following problem, the same as for @MaxFuhrich:
[image]
I opened issue #7349 to describe my problem before I saw this one.
