Refactor youki delete to match runc/crun. #1654
Conversation
Codecov Report
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1654      +/-  ##
=========================================
- Coverage   68.99%   68.91%   -0.09%
=========================================
  Files         120      120
  Lines       13155    13166      +11
=========================================
- Hits         9076     9073       -3
- Misses       4079     4093      +14
Hey, I'll take a look at this later today 👍
Thank you.
Looks good to me. 💯
Refactored the delete to match runc and crun. Now youki allows deletion of created containers. Signed-off-by: Eric Fang <[email protected]>
Hey, I have left a couple of comments, but overall, looks great! Really nice commenting 👍
    self.do_kill(signal::Signal::SIGKILL, false)?;
    self.set_status(ContainerStatus::Stopped).save()?;
}
ContainerStatus::Creating | ContainerStatus::Running | ContainerStatus::Paused => {
Should the Creating state also be handled along with Created? If I'm correct, then even in the Creating state there should be only one process in the container's cgroup. If that is correct, then we don't get any advantage / have any reason to use kill all with force. I checked the spec, and it specifies that states other than stopped should generate an error, but we are going according to runc anyway.
Please correct me if my understanding is wrong.
Technically 2 processes: the container init and the intermediate process before it exits. With that being said, it is possible I am over-optimizing in this case. The goal here is to kill all the processes in the container cgroup, so I should possibly just call kill all here in either case.
Hey, if there are two processes, then this logic makes perfect sense.
Also, I don't think we should merge Created with the other 3 states (if that was what you meant), because as per runc and crun, we need to allow deleting Created without the force flag, but we need the force flag for the others.
If there are going to be (even technically) two processes in the Creating state, it is better to handle it by killing everything in the cgroup, just to be safe.
Also, I don't think we should merge Created with the rest 3
No, we should not. I wanted a single match statement for the different cases to make the code more readable. In this way, we don't have to mess with if statements on status + the force flag.
If there are going to be (even technically) two processes in the Creating state, it is better to handle it by killing everything in the cgroup, just to be safe.
This is what I believe as well.
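For concreteness, here is a minimal, self-contained sketch of that single-match idea. The status names mirror the ones discussed above, but the Container type and helpers such as kill_all_processes are simplified placeholders for illustration, not youki's actual implementation.

```rust
// Sketch only: deciding the delete policy with one match on (status, force).
// Container, kill_all_processes, and the error strings are hypothetical.

#[derive(Debug, Clone, Copy, PartialEq)]
enum ContainerStatus {
    Creating,
    Created,
    Running,
    Paused,
    Stopped,
}

struct Container {
    status: ContainerStatus,
}

impl Container {
    fn kill_all_processes(&mut self) -> Result<(), String> {
        // Placeholder: send SIGKILL to every process in the container's cgroup.
        Ok(())
    }

    fn delete(&mut self, force: bool) -> Result<(), String> {
        match (self.status, force) {
            // Created containers may be deleted without --force (runc/crun
            // behavior); kill everything in the cgroup anyway, to be safe.
            (ContainerStatus::Created, _) => {
                self.kill_all_processes()?;
                self.status = ContainerStatus::Stopped;
            }
            // Creating, running, and paused containers require --force.
            (
                ContainerStatus::Creating | ContainerStatus::Running | ContainerStatus::Paused,
                true,
            ) => {
                self.kill_all_processes()?;
                self.status = ContainerStatus::Stopped;
            }
            (
                ContainerStatus::Creating | ContainerStatus::Running | ContainerStatus::Paused,
                false,
            ) => {
                return Err("container is not stopped; use --force to delete".to_string());
            }
            // Stopped containers can always be deleted.
            (ContainerStatus::Stopped, _) => {}
        }
        // ... then remove the container root, cgroup, and saved state.
        Ok(())
    }
}

fn main() {
    let mut created = Container { status: ContainerStatus::Created };
    assert!(created.delete(false).is_ok());

    let mut running = Container { status: ContainerStatus::Running };
    assert!(running.delete(false).is_err());
    assert!(running.delete(true).is_ok());
}
```

The point of this shape is that the whole policy (Created deletes without --force, Creating/Running/Paused require it, Stopped always deletes) reads directly off the match arms instead of being spread over nested if statements on status and the force flag.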
Err(err) => {
    // There is a brief window where the container state is
    // created, but the container config is not yet generated
    // from the OCI spec. In this case, we proceed as if we
    // successfully deleted the config and move on.
    log::warn!("skipping loading youki config due to: {err:?}, continue to delete");
}
Do we need to worry about a race condition? For example, if we are deleting a container that is being created, there might be a window where the config is not yet generated, and we treat it as deleted. If the config does get created after this moment, should we be concerned about it or not?
If we are treating the deletion of the container root after this as a catch-all, then I guess this is fine.
You asked a very good question and here is what I think. First of all, the actual bug is addressed in (#1649) and this PR only adds a warning and an explanation, so the context of the bug is in #1649 and #1640. In short, if rootless::New errors out for whatever reason, we are stuck in a state where the container state is created but the youki config is not. This will cause the delete to not be able to clean up, because it is missing the youki config.

Here is the complex part about the race condition, and you are right that there is a potential race here. If process 1 is calling youki create and process 2 is calling youki delete --force, we have a window where process 1 has not launched the container process yet, but process 2 goes ahead to delete. There is no way for process 2 to signal process 1 to stop creating the container processes with the force flag. What will happen is that process 2 will delete the container root, and process 1 will continue to start the process and wait on the exec.fifo, which is a file (named pipe). So when the container root is deleted, the exec.fifo will be gone and the process waiting on it will error out. Therefore, both should be cleaned up correctly. Or process 1 at some point tries to save the state and fails. So in theory, the race condition will not cause leakage in terms of processes or files on the system. I believe this is the best we can guarantee. I think this covers all the situations, but please double-check my work here.
Hey, I think this makes sense 👍
I think this should solve the issue and not cause any leakage of resources.
If possible, give me a day's time, I'll try to do a manual verification check by launching parallel processes and checking that resources are cleaned up.
In either case, I think this implementation is good enough :)
Hey, I think this makes sense 👍 I think this should solve the issue and not cause any leakage of resources. If possible, give me a day's time, I'll try to do a manual verification check by launching parallel processes and checking that resources are cleaned up.
In either case, I think this implementation is good enough :)
Take your time please :) I am curious as well.
Hey @yihuaf Unfortunately, I didn't get time last week to do this 😓 I'll try to get this done soon 🙏
Another option is to merge this PR as is, since the PR is fixing a specific issue. We can continue to test the edge case on main and file issues as needed.
LGTM 👍
Thanks for the changes, and apologies on my behalf for the delay 😅
The discrepancy in deleting a
created
status container withoutforce
flag is first brought up in #1556. See the PR for detailed discussion. In this PR, we fix the discrepancy and refactored the delete code to be more readable. In this way, it is clear that we first check the condition if container can be deleted. Then perform the delete. We also add a debug assert to make the the invariant in not broken when deleting.