e2e authenticated push test: multiple failures #18355
Oh, I wonder if this is #18286?
This ([f38 root remote](https://api.cirrus-ci.com/v1/artifact/task/5271682227634176/html/int-remote-fedora-38-root-host-boltdb.log.html#t--authenticated-push--1)) is in the same
That's it. No more output or error, just that. Test fails with exit status 125.
Here's the list so far:
Edited issue title, because this is not rawhide-only (also seen in f38). podman and podman-remote. root and rootless.
Keeps happening, and here's a weird variation (f38 remote root):
Another weird variation; I'm not even sure if this is the same bug (rawhide rootless):
Here the TLS handshake is against
Another one of the http/https flakes, f38 root:
Here's the journal log from the registry container:
Last few days' worth:
Still seeing "TLS handshake timeout", but the most common failure I'm seeing these days is:
First: fix the podman-registry script so it preserves the initial $PODMAN, so that all subsequent invocations of ps, logs, and stop use the same binary and arguments. Until now we've handled this by requiring that our caller manage $PODMAN (and keep it the same), but that's just wrong.

Next, simplify the golang interface: move the $PODMAN setting into registry.go, instead of requiring e2e callers to set it. (This could use some work: the local/remote conditional is icky.)

IMPORTANT: to prevent registry.go from using the wrong podman binary, the Start() call is gone. Only StartWithOptions() is valid now.

And, minor cleanup: comments, and an actual error-message check.

Reason for this PR is a recurring flake, containers#18355, whose multiple failure modes I truly can't understand. I don't think this PR is going to fix it, but it is still necessary work.

Signed-off-by: Ed Santiago <[email protected]>
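The "preserve the initial $PODMAN" idea above can be sketched in shell. This is an illustrative sketch only, not the actual podman-registry script: the function and file names here are made up. The point is that the value of $PODMAN in effect at start time is written down once, and every later invocation reads that saved value back rather than trusting whatever $PODMAN the caller's environment happens to contain.

```shell
#!/bin/bash
# Sketch (hypothetical names, not the real podman-registry script):
# persist the podman command line used at 'start' so that later
# 'ps', 'logs', and 'stop' reuse the same binary and arguments.

statefile=$(mktemp)

save_podman() {
    # Record $PODMAN (defaulting to plain 'podman') at start time.
    printf '%s\n' "${PODMAN:-podman}" > "$statefile"
}

load_podman() {
    # Read back the value saved at start, ignoring later changes.
    PODMAN=$(cat "$statefile")
}

PODMAN="podman --root /tmp/reg-root"   # value in effect at 'start'
save_podman

PODMAN="podman-remote"                 # caller clobbers $PODMAN later...
load_podman                            # ...but we restore the saved value

echo "$PODMAN"
rm -f "$statefile"
```

With this shape, a caller can change $PODMAN between invocations without breaking the registry helper, which is exactly the failure the PR description calls out.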
This is a nasty flake, triggering often in my no-retries PR. There are different failure modes, but my conclusion is that this is a test conflict: some other test is clobbering the network while this test is running, and that makes the test fail. The common sequence is:
For example, this test log. Look at the associated journal. Search in-page for
I haven't looked at every single test failure... just enough of them to convince me that this is the cause of the flake. Obvious fix is to make this test
A friendly reminder that this issue had no activity for 30 days.
Still happening; reduced rate is because I'm not running #17831 as frequently. This is almost certainly happening in real CI, but passing on the ginkgo retry. I spent a good part of a day a few weeks ago trying to get a reproducer. No luck. This one is tricky.
@edsantiago shall we try updating quay.io/libpod/registry:2.8 to 2.8.2? This could be a server-side fart.
Why not? I'll try that in #17831 for a few weeks, then report back. Thanks for the suggestion.
...in hopes of addressing flake containers#18355 Signed-off-by: Ed Santiago <[email protected]>
Have not seen this since bumping |
Oh nice! I happily take the blame and declare this flake fixed 😈
Just one instance, but right now I'm filing issues for anything unusual on rawhide:
Unfortunately, https://github.com/containers/podman/blob/main/hack/podman-registry-go/registry.go does not log anything. (Maybe a good idea to instrument it?) Here is the journal log; the test begins at 19:27:43. I don't see anything odd, but I only skimmed and I don't really know what to look for.