Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Agones controller does not remove deleted pod from game server list #678

Closed
KamiMay opened this issue Mar 28, 2019 · 11 comments · Fixed by #694
Closed

Agones controller does not remove deleted pod from game server list #678

KamiMay opened this issue Mar 28, 2019 · 11 comments · Fixed by #694
Labels
good first issue These are great first issues. If you are looking for a place to start, start here! help wanted We would love help on these issues. Please come help us! kind/bug These are bugs.
Milestone

Comments

@KamiMay
Copy link

KamiMay commented Mar 28, 2019

I had a fleet of 2 game servers running, one of the game server was allocated, I deleted allocated server by running kubectl delete pod pod-name-xyz this deleted the pod but when retrieving fleet information it still showed that fleet had two game servers running and kept returning allocated server IP and PORT to which we couldn't connect because game server pod has been deleted. Is there any reason why agones does not detect deleted pods?

@markmandel markmandel added kind/bug These are bugs. help wanted We would love help on these issues. Please come help us! good first issue These are great first issues. If you are looking for a place to start, start here! labels Mar 28, 2019
@markmandel
Copy link
Member

Deleting the Pod should delete the backing GameServer:

Logic is here:
https://github.com/GoogleCloudPlatform/agones/blob/master/pkg/gameservers/controller.go#L167

Sounds like we should add a e2e test here:
https://github.com/GoogleCloudPlatform/agones/blob/master/test/e2e/gameserver_test.go

Make sure it always works.

@markmandel
Copy link
Member

Oh - I assume this was on 0.9.0-rc?

@KamiMay
Copy link
Author

KamiMay commented Mar 28, 2019

No, it was on 0.8.1.

0.9.0-rc was crashing our matchamker so we had to revert it for QA, we are not sure why it was crashing our matchmaker, so I have not created an issue for that yet. Once I have time for investigation I will look into it, but approaching release, it might not be soon.

@markmandel
Copy link
Member

Digging into this some more - deleting a standalone gameserver's pod will make the GameServer move to Unhealthy (it's essentially the same as a crash).

If the GameServer belongs to a Fleet, then the GameServer should be recreated.

Got some more e2e tests working on this.

markmandel added a commit to markmandel/agones that referenced this issue Apr 1, 2019
Context: googleforgames#678

Wrote tests to confirm that deletion of a GameServer's backing Pod would
move a GameServer to `Unhealthy`.
markmandel added a commit to markmandel/agones that referenced this issue Apr 1, 2019
Context: googleforgames#678

Wrote tests to confirm that deletion of a GameServer's backing Pod would
move a GameServer to `Unhealthy`.
markmandel added a commit to markmandel/agones that referenced this issue Apr 1, 2019
Context: googleforgames#678

Wrote tests to confirm that deletion of a GameServer's backing Pod would
move a GameServer to `Unhealthy`.
@markmandel
Copy link
Member

So I wrote e2e tests in #684, and they all passed - does that match up to the context you were describing?

markmandel added a commit to markmandel/agones that referenced this issue Apr 1, 2019
Context: googleforgames#678

Wrote tests to confirm that deletion of a GameServer's backing Pod would
move a GameServer to `Unhealthy`.
@KamiMay
Copy link
Author

KamiMay commented Apr 2, 2019

I forgot to mention that the deleted pod had an allocated game server instance running on it.

@markmandel
Copy link
Member

Cool - I'll make sure I test that as well!

markmandel added a commit that referenced this issue Apr 3, 2019
Context: #678

Wrote tests to confirm that deletion of a GameServer's backing Pod would
move a GameServer to `Unhealthy`.
@markmandel
Copy link
Member

So you'll be pleased to know, that now that we have an e2e test, we can see this happens... sometimes.

Trying to work out why it only happens sometimes though. Somehow managed to hit it just now manually).

😬 inconsistent bugs are the best.

@markmandel
Copy link
Member

I think I see where this is... and it was... never really implemented? (or if it was, it was lost). Fixing it!

markmandel added a commit to markmandel/agones that referenced this issue Apr 7, 2019
There was only implementation of GameServer's being moved to Unhealthy
on very specific Pod events (container crash, being unschedulable), but
never on Pod removal as a whole.

Also it was locked down to specific states, which made it very fragile.

Reworked this so that any of the above now triggers a GameServer being
moved to an Unhealthy state.

Closes googleforgames#678
markmandel added a commit to markmandel/agones that referenced this issue Apr 12, 2019
There was only implementation of GameServer's being moved to Unhealthy
on very specific Pod events (container crash, being unschedulable), but
never on Pod removal as a whole.

Also it was locked down to specific states, which made it very fragile.

Reworked this so that any of the above now triggers a GameServer being
moved to an Unhealthy state.

Closes googleforgames#678
markmandel added a commit that referenced this issue Apr 12, 2019
There was only implementation of GameServer's being moved to Unhealthy
on very specific Pod events (container crash, being unschedulable), but
never on Pod removal as a whole.

Also it was locked down to specific states, which made it very fragile.

Reworked this so that any of the above now triggers a GameServer being
moved to an Unhealthy state.

Closes #678
@markmandel markmandel added this to the 0.10.0 milestone May 7, 2019
@djsell
Copy link

djsell commented Oct 4, 2019

I've had this happen to me a couple times in v1.0.0

@markmandel
Copy link
Member

@djsell - thanks for reporting!

Please open a new issue with repro steps, so we can look into it 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue These are great first issues. If you are looking for a place to start, start here! help wanted We would love help on these issues. Please come help us! kind/bug These are bugs.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants