compose flake: rootlesskit port forwarder not setup correctly #10052
Comments
@AkihiroSuda PTAL |
Is this reproducible manually out of CI? Is this specific to compose? |
I am unable to reproduce this locally, and I also did not see this issue in other CI tests, so it could be specific to compose. |
Hmmmm, it's not only rootless: compose: simple_port_map - curl (port 5000) failed with status 7
compose: port_map_diff_port - curl (port 5001) failed with status 7
compose: mount_and_label : port 5000
|
This is now our #1 flake. Unfortunately it triggers in many different subtests, so it's hard to just add a compose: mount_and_label - curl (port 5000) failed with status 7
compose: env_and_volume - down
compose: port_map_diff_port - curl (port 5001) failed with status 7
compose: env_and_volume - curl (port 5000) failed with status 7
compose: env_and_volume : port 5001
|
@edsantiago Can you check for network flakes in the system tests, especially the remote tests. I think this is a more general problem, not only compose. |
@Luap99, interesting that you mention it. Here are two flakes from the past week: sys: podman networking: port on localhost
The symptom looks similar to my eye. Aside from those two, I don't see any flakes that match. (I do see a new series of iptables flakes, but I'm going to ignore those right now unless they start happening more often). [EDIT: I just noticed, these are podman, not podman-remote like you asked for] |
Another flake in "port on localhost" test just now: PR #10222, sys podman ubuntu-2104 rootless host (again podman, not remote) |
@edsantiago These are new ones from my PR yesterday, they should be fixed by #10318. I mean failures like this one https://storage.googleapis.com/cirrus-ci-6707778565701632-fcae48/artifacts/containers/podman/4569209960136704/html/sys-remote-fedora-34-root-host.log.html. The |
Unfortunately, no, I have no (sane) way to grep the logs. I just have a very simple script that queries github and cirrus but does not download&preserve the logs for me. I guess I should consider adding that... |
Here's another one outside of compose tests. podman + root, not podman-remote rootless: sys: podman network reload
|
compose: simple_port_map - curl (port 5000) failed with status 7
These too are network-related, one a missing result from a web fetch, another a `ncat: no route to host' error: sys: podman networking: port on localhost
sys: podman pod create - hashtag AllTheOptions
|
Another sys: podman networking: port with --userns=keep-id
And another compose flake: compose: mount_and_label - curl (port 5000) failed with status 7
|
I also see the compose tests flaking frequently. |
Is this where we're tracking sys: podman networking: port with --userns=keep-id
|
OK, I found the root cause for the rootless issue. Reproducer:
docker-compose always runs network disconnect && network connect on the container. I think the race is between the rootlesskit port setup and docker-compose calling network connect/disconnect at the same time. If rootlesskit is initialized before network connect has finished, port forwarding will be broken. The forwarding breaks because podman sets the child IP for the rootlesskit port forwarder to the eth0 IP address. After disconnect && connect, CNI allocates a new IP. Since the new eth0 IP no longer matches the rootlesskit child IP, the port forwarding is broken. Ideally we could use 127.0.0.1 as the source address, but this was changed to fix CVE-2021-20199. @AkihiroSuda Do you know a good way to fix this? I know rootlesskit offers a way to add/remove ports dynamically; is there a way for podman network connect to remove the broken port and add a new port with the correct child IP? |
The rootlessport forwarder requires a child IP to be set. This must be a valid IP in the container network namespace. The problem is that after a network disconnect and connect the eth0 IP has changed. Therefore the packets are dropped, since the source IP no longer exists in the netns. One solution is to set the child IP to 127.0.0.1, however this is a security problem. [1]
To fix this we have to recreate the ports after network connect and disconnect. To make this work, the rootlessport process exposes a socket that podman network connect/disconnect connects to in order to send the new child IP to rootlessport. The rootlessport process will remove all ports and recreate them with the new, correct child IP.
Also bump rootlesskit to v0.14.3 to fix a race with RemovePort().
Fixes containers#10052
[1] https://nvd.nist.gov/vuln/detail/CVE-2021-20199
Signed-off-by: Paul Holzinger <[email protected]>
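The flow in this commit message can be sketched roughly as below. This is an illustration under stated assumptions, not the code from the fix: the `portSpec` and `childIPUpdate` types, the JSON message shape, and all function names are hypothetical, and an in-memory `net.Pipe` stands in for the socket that rootlessport exposes. The real change uses rootlesskit's port manager, which this sketch does not call.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// portSpec is a hypothetical description of one forwarded port.
type portSpec struct {
	HostPort  int
	ChildPort int
	ChildIP   string
}

// childIPUpdate is a hypothetical message sent by network connect/disconnect.
type childIPUpdate struct {
	ChildIP string `json:"child_ip"`
}

// forwarder keeps the exposed ports keyed by host port.
type forwarder struct {
	ports map[int]portSpec
}

// recreate drops every port and adds it back with the new child IP.
func (f *forwarder) recreate(newIP string) {
	old := f.ports
	f.ports = make(map[int]portSpec, len(old))
	for hostPort, p := range old {
		p.ChildIP = newIP
		f.ports[hostPort] = p
	}
}

// serveOne reads a single update from conn, validates it, and applies it.
func (f *forwarder) serveOne(conn net.Conn) error {
	defer conn.Close()
	var msg childIPUpdate
	if err := json.NewDecoder(conn).Decode(&msg); err != nil {
		return err
	}
	if net.ParseIP(msg.ChildIP) == nil {
		return fmt.Errorf("invalid child IP %q", msg.ChildIP)
	}
	f.recreate(msg.ChildIP)
	return nil
}

// sendUpdate plays the role of network connect/disconnect on the client side.
func sendUpdate(conn net.Conn, ip string) error {
	defer conn.Close()
	return json.NewEncoder(conn).Encode(childIPUpdate{ChildIP: ip})
}

// notify wires both sides together with an in-memory pipe instead of a socket.
func notify(f *forwarder, ip string) error {
	server, client := net.Pipe()
	done := make(chan error, 1)
	go func() { done <- f.serveOne(server) }()
	if err := sendUpdate(client, ip); err != nil {
		return err
	}
	return <-done
}

func main() {
	f := &forwarder{ports: map[int]portSpec{
		5000: {HostPort: 5000, ChildPort: 5000, ChildIP: "10.89.0.2"},
	}}
	if err := notify(f, "10.89.0.3"); err != nil {
		fmt.Println("update failed:", err)
		return
	}
	fmt.Println("child IP for host port 5000 is now", f.ports[5000].ChildIP)
}
```

Recreating every port is the simplest way to keep the forwarder consistent with the netns, at the cost of a short window in which the ports are not bound.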
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
The rootless compose test is flaking regularly in CI. The problem is that port forwarding with rootlesskit is not working. Rootlesskit does not seem to bind the port on the host.
An example test failure with some debug information can be seen here: https://storage.googleapis.com/cirrus-ci-6707778565701632-fcae48/artifacts/containers/podman/5302424773591040/html/compose-podman-fedora-34beta-rootless-host.log.html
Relevant log lines:
A successful run looks like this:
Relevant code part:
podman/pkg/rootlessport/rootlessport_linux.go
Lines 229 to 238 in 4c88035
We never reach logrus.Info("ready"), so the error must be happening inside exposePorts(). However, the main podman process never seems to get the error propagated back and thinks rootlesskit started successfully.