TestLimiter is flaky #13526
Comments
I've also seen it panic, not just fail.
Hit this a couple of times today; added logs to the description.
* Make TestLimiter test less flaky
* Update lib/srv/regular/sshserver_test.go Co-authored-by: Roman Tkachenko <[email protected]>
* Update lib/srv/regular/sshserver_test.go Co-authored-by: Roman Tkachenko <[email protected]>
* Remove unneeded helper to wait for server to start and add a comment Co-authored-by: Roman Tkachenko <[email protected]>
Reopening this one as it's back among the top failing tests.
Saw this fail today because it could not clean up a temp dir. We must have some goroutine that the test doesn't wait on before finishing. Relevant snippet:
While trying to repro locally I also noticed that this test case creates zombie shell processes: 3 new shell processes per run. I now have 177 tty zombies.
@GavinFrazar I've noticed the same when trying to reproduce.
I think this test caused this failure on branch/v11: https://console.cloud.google.com/cloud-build/builds/cd51f11c-8f22-4099-8e7e-01a8b013f145?project=ci-account. Either that or TestInactivityTimeout; it's hard to tell what caused the test suite timeout.
There still seem to be a few issues with this one:
I believe this is what flaked here (timeout flake).
https://github.com/gravitational/teleport/actions/runs/4283954427/jobs/7460232370 Timeout when running
This hasn't shown up recently. The 2 most recent occurrences were the test timeout issue that was recently fixed. I'm calling this one solved.
The timeout is back on v13: https://github.com/gravitational/teleport/actions/runs/5467675358/jobs/9954293445
@rosstimothy so far what I've learned: The test times out when starting the very last session (which is supposed to fail due to the limiter):
A goroutine dump shows we are stuck on a channel receive trying to open the There is actually a comment about not "blocking forever if the connection is rejected":
It feels like this is exactly what's happening though. The logs do show a successful authn followed by a max rate reached error:
Reproduces pretty easily locally. I'm using a lower timeout so I don't have to wait a full 10m for the test to fail:
We could probably simplify
Timeout on v13, does it warrant reopening?
From the CI:
CI Logs:
Investigate how to remove the test's reliance on time and make it non-flaky.