e2e create same-IP: try to fix flake #18329
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: edsantiago. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Our friend containers#7096 is still not fixed: it continues to flake, singletons only, and only in the "create" test (not "run").

My guess: maybe there's a race somewhere in IP assignment, such that container1 can have an IP, but not yet be running, and a container2 can sneak in and start with that IP, and container1 is the one that fails?

Solution: tighten the logic so we wait for container1 to truly be running before we start container2. And, when we start container2, do so with -a so we get to see stdout. (Am not expecting it to be helpful, but who knows.)

Also very minor cleanup.

Signed-off-by: Ed Santiago <[email protected]>
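A rough sketch of the tightened flow, written in the style of podman's Ginkgo e2e helpers (podmanTest.Podman, WaitWithDefaultTimeout, Exit/ExitWithError); this is not the actual diff, and the container names, the `ip` variable, and the timeouts are illustrative only:

```go
// Create and start container1 with the static IP, then wait until it is
// truly reported as running -- not merely until it has an IP assigned.
create := podmanTest.Podman([]string{"create", "--name", "test1", "--ip", ip, ALPINE, "top"})
create.WaitWithDefaultTimeout()
Expect(create).Should(Exit(0))

start := podmanTest.Podman([]string{"start", "test1"})
start.WaitWithDefaultTimeout()
Expect(start).Should(Exit(0))

Eventually(func() string {
	inspect := podmanTest.Podman([]string{"inspect", "--format", "{{.State.Status}}", "test1"})
	inspect.WaitWithDefaultTimeout()
	return inspect.OutputToString()
}, "5s", "500ms").Should(Equal("running"))

// Start container2 (created earlier with the same --ip) attached, so its
// stdout ends up in the test log; it is expected to fail because the IP
// is already in use.
start2 := podmanTest.Podman([]string{"start", "-a", "test2"})
start2.WaitWithDefaultTimeout()
Expect(start2).To(ExitWithError())
```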
ba2498d to ae5ed6d
LGTM
Recent failures:
All of them in my no-flake-retries PR, which means this is almost certainly happening in production CI, but my flake logger is not seeing those.
Nice idea!
/lgtm
I don't think this is possible; start should block until the container is started, which means it also has to wait for the complete network setup. The whole loop here that waits for the IP address to be assigned makes no sense to me. If this is needed, podman start is seriously broken.
@Luap99 if what you say is correct, the test will continue to flake. Here's a question I've long wondered about: does ginkgo have a mechanism for running code on failure? Like: Expect("this").ButIfItFailsThen("podman exec container1 ip a or podman logs or something")
So I assume you want to execute further commands to debug in this case? I don't think this is directly supported. However, what could work is using the extra annotations and passing a function that executes the commands.
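A minimal sketch of that idea, assuming a recent Gomega version: assertions accept an optional description, and if that description is a func() string it is only invoked when the assertion fails. The dumpDebug helper and the commands it runs are made up for illustration:

```go
// Only evaluated if the assertion fails: gather extra state for the flake report.
dumpDebug := func() string {
	ipAddr := podmanTest.Podman([]string{"exec", "test1", "ip", "a"})
	ipAddr.WaitWithDefaultTimeout()
	logs := podmanTest.Podman([]string{"logs", "test1"})
	logs.WaitWithDefaultTimeout()
	return "ip a:\n" + ipAddr.OutputToString() + "\nlogs:\n" + logs.OutputToString()
}

Expect(start2).To(ExitWithError(), dumpDebug)
```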
Well, phooey.
Not sure if I should be happy that I was right or hate the fact that something super strange is going on; maybe both? One thing that could cause the behavior is the first container dying after the inspect but before the start test2 call. Not that I believe that this is the case, but you are right: we need to instrument the tests to gather more data after the failure. Please reopen the issue; I can take a look tomorrow.
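One possible way to instrument the test, sketched here under the assumption of Ginkgo v2 (with Ginkgo v1 the check would be CurrentGinkgoTestDescription().Failed) and of the suite's existing imports; the list of commands is just an example of what could be dumped:

```go
// After each spec, if it failed, dump extra container state into the
// Ginkgo log so flakes leave something to debug.
AfterEach(func() {
	if CurrentSpecReport().Failed() {
		for _, args := range [][]string{
			{"ps", "-a"},
			{"inspect", "test1"},
			{"logs", "test1"},
		} {
			session := podmanTest.Podman(args)
			session.WaitWithDefaultTimeout()
			fmt.Fprintln(GinkgoWriter, session.OutputToString())
		}
	}
})
```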