CI: timeout in new kube tests #17076
@edsantiago The retries were added following this comment from @Luap99: #16880 (comment). Furthermore, I don't understand why you think this change causes the test to hang. Looking at the test logs you've provided, these tests pass, and most of them are not even marked as "SLOW TEST". Moreover, this PR was merged only yesterday, and I've been seeing such issues with the CI before that. I don't mind reverting the test code, but I don't understand why you think this code is the source of the issue.
Post hoc, ergo propter hoc. Imperfect (to say the least!) reasoning, but I have a pretty good sense for CI flakes and these are new ones. They started with your PR. We are now seeing them in other PRs. There could be a common factor just slightly before your PR, in which the timeout didn't trigger, but the smart money is on it being something in your PR.
But look at the commits following the merge of my PR: while they are red, they fail on other tests, and the test that times out for me passes.
Ah, sorry, this may be a terminology issue. A "test flake" is a condition in which a test is unreliable: it passes sometimes, fails other times, for reasons yet unknown. Test flakes are very hard to reproduce, hence hard to understand or fix.
All I said is that the single test should have a retry for the tcp/udp connection; the full suite should never hang like that.
Looking at the code, we most likely need to add
Please see #17083 |
PR #16880 broke CI. There's a new timeout flake, cause unknown, but it is pretty likely to have been caused by 16880. @ygalblum, please fix or revert ASAP. I looked more closely at the logs this morning; it's not the known checkpoint bug, this is a new one.
As a side note, this was a bad merge. We should not merge PRs in which there are six retries on a new unexplained flake.