Flaky TestGameServerUnhealthyAfterReadyCrash #2302
In a few flaky end-to-end tests, I kept seeing this in the logs:

```
time="2021-10-08 19:02:01.419" level=info msg="sent UDP packet" address="35.247.94.25:7682" test=TestGameServerUnhealthyAfterReadyCrash
```

It appeared over and over again, and I also noticed that it was happening _after_ the e2e test had completed. See:

https://console.cloud.google.com/cloud-build/builds/9ca5715a-443c-4693-bbd5-2879e61f2aaa;step=21?project=agones-images
https://console.cloud.google.com/cloud-build/builds/84cb8db2-1a11-4db3-a9e1-d9d51b9baf14;step=21?project=agones-images

My theory: the goroutine had nothing in it that forced it to stop once the test was complete, so depending on the order of tests, it might keep running for a while as other tests ran. If it did that, and a GameServer spun up on the same node and port as the originally crashed GameServer, it would crash it, likely breaking whatever test it ran into!

Work on googleforgames#2296
Build Succeeded 👏 Build Id: e1561b30-42bd-4777-8d55-e728e03d3295 The following development artifacts have been built, and will exist for the next 30 days:
```
@@ -385,8 +386,16 @@ func TestGameServerUnhealthyAfterReadyCrash(t *testing.T) {

// keep crashing, until we move to Unhealthy. Solves potential issues with controller Informer cache
// race conditions in which it has yet to see a GameServer is Ready before the crash.
var stop int32 = 0
```
Since 0 is the default value for `int32`, I think that you can leave it off here: `var stop int32`
```
@@ -385,8 +386,16 @@ func TestGameServerUnhealthyAfterReadyCrash(t *testing.T) {

// keep crashing, until we move to Unhealthy. Solves potential issues with controller Informer cache
// race conditions in which it has yet to see a GameServer is Ready before the crash.
var stop int32 = 0
defer func() {
```
I'm curious why you don't use a channel here, or a cancelable context.
I figured if I did that then I had to include a `select` statement, which seemed more complicated than an atomic variable.
New changes are detected. LGTM label has been removed.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: markmandel, roberthbailey
Build Failed 😱 Build Id: d9f77790-2ff0-4775-b1da-279f2ede61e7 To get permission to view the Cloud Build view, join the agones-discuss Google Group.
The fix for a flaky test failed with a test flake. :/
Build Succeeded 👏 Build Id: ae3cf33b-c411-47a5-a5f6-38bae4456042 The following development artifacts have been built, and will exist for the next 30 days:
What type of PR is this?
/kind cleanup
What this PR does / Why we need it:
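In a few flaky end-to-end tests, I kept seeing this in the logs:

```
time="2021-10-08 19:02:01.419" level=info msg="sent UDP packet" address="35.247.94.25:7682" test=TestGameServerUnhealthyAfterReadyCrash
```

It appeared over and over again, and I also noticed that it was happening _after_ the e2e test had completed.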
See:
https://console.cloud.google.com/cloud-build/builds/9ca5715a-443c-4693-bbd5-2879e61f2aaa;step=21?project=agones-images
https://console.cloud.google.com/cloud-build/builds/84cb8db2-1a11-4db3-a9e1-d9d51b9baf14;step=21?project=agones-images
My theory: the goroutine had nothing in it that forced it to stop once the test was complete, so depending on the order of tests, it might keep running for a while as other tests ran.
If it did that, and a GameServer spun up on the same node and port as the originally crashed GameServer, it would crash it, likely breaking whatever test it ran into!
Which issue(s) this PR fixes:
Work on #2296
Special notes for your reviewer:
N/A