
CI: timeout in new kube tests #17076

Closed
edsantiago opened this issue Jan 11, 2023 · 8 comments
Labels: flakes (Flakes from Continuous Integration), locked - please file new issue/PR

Comments

@edsantiago (Member)

PR #16880 broke CI. There's a new timeout flake, cause unknown, but it is very likely caused by #16880. @ygalblum please fix or revert ASAP. I looked more closely at the logs this morning, and it's not the known checkpoint bug; this is a new one.

As a side note, this was a bad merge. We should not merge PRs in which there are six retries on a new unexplained flake.

edsantiago added the flakes (Flakes from Continuous Integration) label Jan 11, 2023
@ygalblum (Contributor)

@edsantiago The retries were added following this comment #16880 (comment) from @Luap99. The reason for the retry is that podman kube play may return before the container is ready to accept connections. As a result, in CI, the test fails to connect to the port (while locally it passes every time).

Furthermore, I don't understand why you think this change causes the test to hang. Looking at the test logs you've provided, these tests pass and most of them are not even marked as "SLOW TEST". Moreover, this PR was merged only yesterday and I've been seeing such issues with the CI before that.

I don't mind reverting the test code, but I don't understand why you think this code is the source of the issue.

@edsantiago (Member, Author)

Post hoc, ergo propter hoc. Imperfect (to say the least!) reasoning, but I have a pretty good sense for CI flakes and these are new ones. They started with your PR. We are now seeing them in other PRs. There could be a common factor just slightly before your PR, in which the timeout didn't trigger, but the smart money is on it being something in your PR.

@ygalblum (Contributor)

But look at the commits following the merge of my PR. While they are red, they fail on other tests, and the test that times out for me passes.

@edsantiago (Member, Author)

Ah, sorry, this may be a terminology issue. A "test flake" is a condition in which a test is unreliable: it passes sometimes, fails other times, for reasons yet unknown. Test flakes are very hard to reproduce, hence hard to understand or fix.

@Luap99 (Member) commented Jan 11, 2023

All I said is that the single test should have a retry for TCP/UDP connections; the full suite should never hang like that.

@Luap99 (Member) commented Jan 11, 2023

Looking at the code, we most likely need to add conn.SetDeadline(); otherwise a read can block forever.

@ygalblum (Contributor)

Please see #17083

@edsantiago (Member, Author)

#17083 is merged, closing this in hopes that the problem is fixed. Thanks @ygalblum.

github-actions bot added the locked - please file new issue/PR label Sep 4, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 4, 2023