youki need seccomp unconfined when runc/crun don't #2022
Comments
👋 Hi, @martadinata666. Thanks for your report. May I ask you to give us more specific commands to reproduce the problem you pointed out?
Hi, thanks for the response. Let me start with a minimal compose setup.
In this setup, Docker by default uses
In this compose file I set the runtime to youki, which should start the container correctly, but unfortunately it doesn't, as in the original post.
So I looked around and found
But just out of curiosity I followed the guide, updating my compose to
And unexpectedly it starts correctly, so I don't know why youki needs a seccomp tweak when other runtimes don't. FYI
OK, so here is a preliminary investigation. First of all, we need to update the OCI spec crate to the latest version. @utam0k We may need your help to cut a new release for the crate. Specifically, we need this PR:
The docker default seccomp profile has a
I can confirm this is the cause because I got it working by directly overriding all errorRet to ENOSYS instead of EPERM. This was only to verify that this is indeed the issue. The proper fix should come from fixing the OCI spec crate.
Now, a little more into this rabbit hole, which is semi-related to this issue. Due to the nature of libseccomp, to properly secure the sandbox we have to use a whitelist approach. In other words, dockerd's default seccomp profile denies all syscalls and enumerates the allowed ones. Currently, the default errno is EPERM. This is required because we want the same profile to keep working when new syscalls are introduced in the future; otherwise, the new syscalls could potentially escape the policy. Therefore, Docker hardcodes a whitelist in its codebase. However, this can become a problem in some cases. For example,
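To make the EPERM versus ENOSYS distinction concrete, here is a minimal Python sketch. It is not youki, libseccomp, or Docker code; `make_profile`, `filter_syscall`, and `spawn_with_fallback` are hypothetical helpers, and the syscall names `clone3` and `clone` are only illustrative. It assumes a caller that probes a newer syscall and falls back to an older one only when it sees ENOSYS.

```python
import errno

def make_profile(allowed, default_errno):
    """Simplified stand-in for a default-deny profile: allowlist plus default errno."""
    return {"allowed": frozenset(allowed), "default_errno": default_errno}

def filter_syscall(profile, name):
    """Return 0 when the syscall is allowlisted, otherwise the profile's default errno."""
    return 0 if name in profile["allowed"] else profile["default_errno"]

def spawn_with_fallback(profile, new_call="clone3", old_call="clone"):
    """Mimic a caller that tries a newer syscall and falls back only on ENOSYS."""
    rc = filter_syscall(profile, new_call)
    if rc == 0:
        return new_call
    if rc == errno.ENOSYS:
        # "Not implemented" tells the caller it is safe to try the older syscall.
        rc = filter_syscall(profile, old_call)
        if rc == 0:
            return old_call
    # Any other errno (e.g. EPERM) is treated as a hard failure.
    return "failed: " + errno.errorcode[rc]

# Same allowlist, different default errno (assumed values for illustration).
eperm_profile = make_profile(["clone"], errno.EPERM)
enosys_profile = make_profile(["clone"], errno.ENOSYS)

print(spawn_with_fallback(eperm_profile))   # failed: EPERM
print(spawn_with_fallback(enosys_profile))  # clone
```

With the same allowlist, only the ENOSYS default lets the caller reach its fallback path, which matches the observation that overriding the errno to ENOSYS made the container start.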
The real fix should be inside libseccomp, but the issue has been pending for a while and likely will not be fixed soon:
An alternative solution discussed is to just make all unknown syscalls return ENOSYS. Here is the discussion from
The same proposal is passed to
With all of this being said, potentially
Some other references/readings if you want to go down this rabbit hole with me lol.
Reference: https://medium.com/nttlabs/ubuntu-21-10-and-fedora-35-do-not-work-on-docker-20-10-9-1cd439d9921
I am brain dumping all this info here before I lose the details in my head. The short-term fix is to update the oci-spec-rs crate. I want to sleep on the long-term issue regarding ENOSYS for unknown syscalls.
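One possible reading of the "ENOSYS for unknown syscalls" idea can be sketched as follows. This is an assumption-laden illustration, not the proposal's actual implementation: `errno_for` and `highest_known` are hypothetical names, and treating "unknown" as "syscall number above the highest number the profile author knew about" is my assumption.

```python
import errno

def errno_for(nr, allowed, highest_known, denied_errno=errno.EPERM):
    """Decide the result for syscall number `nr` under a default-deny profile.

    allowed:       set of syscall numbers the profile permits
    highest_known: largest syscall number known when the profile was written
                   (assumed definition of "known")
    """
    if nr in allowed:
        return 0                 # explicitly allowed
    if nr > highest_known:
        return errno.ENOSYS      # newer than the profile: report "not implemented"
    return denied_errno          # known but not allowed: deliberate denial

allowed = {1, 2}
print(errno_for(1, allowed, highest_known=10) == 0)              # True
print(errno_for(5, allowed, highest_known=10) == errno.EPERM)    # True
print(errno_for(11, allowed, highest_known=10) == errno.ENOSYS)  # True
```

The design point is that a deliberate denial and "this profile predates the syscall" produce different errnos, so callers with an ENOSYS fallback keep working without the new syscall being allowed.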
Interestingly,
As the title says, for some reason youki needs seccomp unconfined. The two containers that I tested were mariadb and jellyfin.
Jellyfin log
Mariadb log
Runtime:
Kernel 6.3
Ubuntu Jammy
Docker 24.0.2
Compiler
Rust 1.70