New networking flake (root, compose) #11123
Comments
Just to be clear, this is not new. You also linked these flakes in #10052 (example log). My PR was intended to fix only the rootless issue. This issue also happens in the rootful remote system tests; see #10052 (comment).
@edsantiago I will take a look today. It looks like the
OK, I was able to reproduce this once in a VM. The network configuration looked fine, so I have no idea what is causing this. The only suspicious things are these kernel messages:
Martian seems to imply the host doesn't know about the subnet. Maybe the interface on the host into the bridge network didn't properly pick up an IP address?
I checked the interface IPs and iptables, and everything looked normal.
Routing table, maybe? It shouldn't be required for directly-attached subnets, but I'm running out of ideas.
If I get it to reproduce again I can check. Somehow it runs for hours without failure now.
OK I got the root problem. The cni-podman interface is down because the veth interface is not attached to it.
I don't know why, but I think it has something to do with NetworkManager. This is what I see in the journal for the failed run:
This is what I see for a successful run:
The offending line is
Here is the full journal in case someone wants to see this: net-flake-journal.txt
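For anyone who wants to check for the same symptom (bridge down, no veth attached), here is a minimal Python sketch that reads the standard Linux sysfs entries for a bridge. The bridge name `cni-podman0` and the helper name `bridge_status` are assumptions for illustration; this is not what was used to debug the issue.

```python
from pathlib import Path


def bridge_status(bridge, sysfs_root="/sys/class/net"):
    """Return the operstate of a bridge and the names of its attached ports.

    Hypothetical helper: reads <sysfs_root>/<bridge>/operstate and lists
    <sysfs_root>/<bridge>/brif/, which holds one entry per attached port.
    """
    base = Path(sysfs_root) / bridge
    operstate_file = base / "operstate"
    brif_dir = base / "brif"

    # "missing" means the interface does not exist at all.
    operstate = (
        operstate_file.read_text().strip() if operstate_file.exists() else "missing"
    )
    # An empty list means no veth (or any other port) is attached.
    ports = sorted(p.name for p in brif_dir.iterdir()) if brif_dir.is_dir() else []
    return {"operstate": operstate, "ports": ports}


if __name__ == "__main__":
    # "cni-podman0" is an assumed bridge name; adjust to the network in use.
    status = bridge_status("cni-podman0")
    print(status)
    if not status["ports"]:
        print("no ports attached to the bridge")
```

With a running container on the network, the `ports` list would be expected to contain at least one `veth*` entry; an empty list matches the failure described above.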
I am now 100% sure that this is NetworkManager doing weird things. I added a NetworkManager config to exclude the cni/veth interfaces:
It has now been running for over 4 hours without failure.
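The exact config from this comment is not preserved here. As a rough sketch only, the Python snippet below renders a NetworkManager drop-in that uses the `unmanaged-devices` key in the `[keyfile]` section with `interface-name:` globs. The patterns `cni-podman*` and `veth*` and the helper name `render_unmanaged_dropin` are assumptions, not the values from the original workaround.

```python
def render_unmanaged_dropin(patterns):
    """Render a NetworkManager drop-in that marks matching interfaces unmanaged.

    Hypothetical helper: each pattern becomes an "interface-name:" device spec,
    and the specs are joined with ";" as NetworkManager expects.
    """
    specs = ";".join(f"interface-name:{pattern}" for pattern in patterns)
    return f"[keyfile]\nunmanaged-devices={specs}\n"


if __name__ == "__main__":
    # Assumed patterns; the real workaround may have used different globs.
    print(render_unmanaged_dropin(["cni-podman*", "veth*"]), end="")
```

The output would go into a file under `/etc/NetworkManager/conf.d/`, followed by a NetworkManager reload. Note that a broad `veth*` glob is acceptable on a CI VM but is probably too general for user systems.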
Alright... I wonder if this is something we have to worry about on user systems. We obviously can't be as general (excluding veth* on a user system won't fly)... Probably not worth it until we get a user report of this, though.
Maybe this is something special in the GCLOUD VM setup. I was not able to reproduce this locally or on 1minutetip.
These images contain a workaround for: containers/podman#11123 Ref: containers/podman#11070 containers/automation_images#88 Signed-off-by: Chris Evich <[email protected]>
These images contain a workaround for: containers/podman#11123 Prior-Ubuntu support is being dropped everywhere. Ref: containers/podman#11070 containers/automation_images#88 Signed-off-by: Chris Evich <[email protected]>
Mirror of changes from containers/skopeo#1444 These images contain a workaround for: containers/podman#11123 Prior-Ubuntu support is being dropped everywhere. Ref: containers/podman#11070 containers/automation_images#88 Signed-off-by: Chris Evich <[email protected]>
New flake. Looks identical to #10052, except this is root. It is also happening on my new PR, which sits on top of #11091.