Flaky TestGameServerUnhealthyAfterReadyCrash #2302

markmandel
Member

What type of PR is this?

Uncomment only one /kind <> line, press enter to put that in a new line, and remove leading whitespace from that line:

/kind breaking
/kind bug

/kind cleanup

/kind documentation
/kind feature
/kind hotfix

What this PR does / Why we need it:

In a few flaky end-to-end test runs, I kept seeing this line in the logs:

time="2021-10-08 19:02:01.419" level=info msg="sent UDP packet" address="35.247.94.25:7682" test=TestGameServerUnhealthyAfterReadyCrash

It appeared over and over again, including after the e2e test had completed.

See:
https://console.cloud.google.com/cloud-build/builds/9ca5715a-443c-4693-bbd5-2879e61f2aaa;step=21?project=agones-images
https://console.cloud.google.com/cloud-build/builds/84cb8db2-1a11-4db3-a9e1-d9d51b9baf14;step=21?project=agones-images

My theory: the goroutine had nothing in it that forced it to stop once the test was complete, so depending on the order of tests, it could keep running for a while as other tests ran.

If it did, and a GameServer spun up on the same node and port as the originally crashed GameServer, the goroutine would crash that GameServer too, likely breaking whatever test it ran into!
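
For illustration, here is a minimal sketch of the stop-flag pattern this theory points at. It is not the exact code in this PR: `sendUntilStopped` and `runTest` are hypothetical names, and the empty `send` callback stands in for the UDP "CRASH" packet the real test sends.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// sendUntilStopped is a hypothetical stand-in for the test's crash loop.
// It calls send repeatedly until the stop flag becomes non-zero, and
// returns how many times send was called.
func sendUntilStopped(stop *int32, interval time.Duration, send func()) int {
	sent := 0
	for atomic.LoadInt32(stop) == 0 {
		send()
		sent++
		time.Sleep(interval)
	}
	return sent
}

// runTest mimics the shape of a test body: the loop runs in a goroutine,
// and a deferred function sets the flag so the loop cannot outlive the body.
func runTest() (sent int) {
	var stop int32 // zero value means "keep going"
	var once sync.Once
	started := make(chan struct{})
	done := make(chan int, 1)

	go func() {
		done <- sendUntilStopped(&stop, time.Millisecond, func() {
			// Signal the first send; a real test would send a packet here.
			once.Do(func() { close(started) })
		})
	}()

	defer func() {
		atomic.StoreInt32(&stop, 1) // tell the goroutine to stop
		sent = <-done               // wait until it has actually exited
	}()

	<-started // stand-in for the test's own assertions
	return 0  // overwritten by the deferred function above
}

func main() {
	fmt.Println("sends before the loop stopped:", runTest())
}
```

Without the deferred store, the loop in this sketch would keep sending after `runTest` returns, which is the behaviour the logs above suggest.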

Which issue(s) this PR fixes:

Work on #2296

Special notes for your reviewer:

N/A

@google-cla google-cla bot added the cla: yes label Oct 8, 2021
@markmandel markmandel added the area/tests Unit tests, e2e tests, anything to make sure things don't break label Oct 8, 2021
@agones-bot
Collaborator

Build Succeeded 👏

Build Id: e1561b30-42bd-4777-8d55-e728e03d3295

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2302/head:pr_2302 && git checkout pr_2302
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.18.0-f4ddaf4

@@ -385,8 +386,16 @@ func TestGameServerUnhealthyAfterReadyCrash(t *testing.T) {

// keep crashing, until we move to Unhealthy. Solves potential issues with controller Informer cache
// race conditions in which it has yet to see a GameServer is Ready before the crash.
var stop int32 = 0
Member

Since 0 is the default value for int32, I think that you can leave it off here: var stop int32

@@ -385,8 +386,16 @@ func TestGameServerUnhealthyAfterReadyCrash(t *testing.T) {

// keep crashing, until we move to Unhealthy. Solves potential issues with controller Informer cache
// race conditions in which it has yet to see a GameServer is Ready before the crash.
var stop int32 = 0
defer func() {
Member

I'm curious why you don't use a channel here, or a cancelable context.

Member Author

I figured if I did that then I had to include a select statement, which seemed more complicated than an atomic variable.

@google-oss-robot

New changes are detected. LGTM label has been removed.

@google-oss-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: markmandel, roberthbailey

Needs approval from an approver in each of these files:
  • OWNERS [markmandel,roberthbailey]

@agones-bot
Collaborator

Build Failed 😱

Build Id: d9f77790-2ff0-4775-b1da-279f2ede61e7

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@roberthbailey
Member

The fix for a flaky test failed with a test flake. :/

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: ae3cf33b-c411-47a5-a5f6-38bae4456042

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2302/head:pr_2302 && git checkout pr_2302
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.18.0-af11b87

@roberthbailey roberthbailey merged commit 4015801 into googleforgames:main Oct 11, 2021
@roberthbailey roberthbailey added this to the 1.18.0 milestone Oct 12, 2021
@roberthbailey roberthbailey added the kind/cleanup Refactoring code, fixing up documentation, etc label Oct 12, 2021
@markmandel markmandel deleted the flaky/TestGameServerUnhealthyAfterReadyCrash branch October 12, 2021 17:27
Labels: approved, area/tests, cla: yes, kind/cleanup, size/XS