Agones controller down when enabling CustomFasSyncInterval on an existing cluster #2675
Comments
Thanks for the bug report! I'm trying to find it, but I think we covered this in another issue (or maybe it was Slack). Short answer: I think this will be a [...] That being said, we should likely update our docs here: it should be explicit that you need to delete all the Agones CRD values (which includes FleetAutoscalers) within a cluster before upgrading, and that this also applies to Feature Flag updates. We do run a pre-hook on delete to delete everything:
Thank you for your response!
@roberthbailey what do you think of the above idea?
Noting that we get reports that in-place updates involving feature gates can cause issues when existing resources are left in place, and therefore making a note in the docs that: a) Feature Gate changes are upgrades b) Delete all resources (including FleetAutoscalers) c) Even on Helm upgrades, delete the Agones resources. Work on #2675
While I agree that we don't currently support live cluster upgrade, I don't want to take the stance that we won't ever support it. It's difficult and requires a lot more testing than we currently do, but it would be really nice to get to the point where we can support it in the future, especially as the feature velocity of the project decreases as the system matures and we want to make upgrades easier so that people pick up bug fixes and security patches. I'm concerned that adding a hook to upgrade to delete things could end up being disruptive (e.g. deleting allocated game servers) and unexpected. As one example, we tell people to helm upgrade to set the server TLS cert on the allocator service, and right now that doesn't affect any workloads that might be running (all it does is update one component).
That's an excellent point. In which case I rescind my suggestion to add the deletion operation to the upgrade hook.
In which case - since #2678 is merged, shall we close this?
Sounds good to me.
What happened:
When we enabled the CustomFasSyncInterval feature by updating the Agones components in a cluster where a FleetAutoscaler had already been deployed, the Agones controller did not start up and entered the CrashLoopBackOff state.
What you expected to happen:
The Agones controller starts up successfully after we enable the CustomFasSyncInterval feature by updating the Agones components in our cluster, even if a FleetAutoscaler has already been deployed.
How to reproduce it (as minimally and precisely as possible):
We can reproduce this issue using our local minikube cluster. Follow the instructions below:
Anything else we need to know?:
We can avoid this issue by setting the default value of the `sync` parameter on each existing FleetAutoscaler before upgrading Agones.
Environment:
Kubernetes version (use `kubectl version`): v1.22.10
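
The workaround mentioned under "Anything else we need to know?" (giving each existing FleetAutoscaler an explicit `sync` value before the upgrade) could be sketched roughly as below. This is only an illustration, not the maintainers' fix: the helper name `with_default_sync` is hypothetical, and the default shown (a `FixedInterval` of 30 seconds) is an assumption based on the Agones FleetAutoscaler reference, so it should be checked against the Agones version being installed.

```python
import copy

# Assumed default sync block (FixedInterval, 30 seconds). This value is an
# assumption; verify it against the FleetAutoscaler reference for your
# Agones version.
DEFAULT_SYNC = {"type": "FixedInterval", "fixedInterval": {"seconds": 30}}


def with_default_sync(fas: dict) -> dict:
    """Return a copy of a FleetAutoscaler manifest with spec.sync filled in.

    Hypothetical helper: an existing, non-empty spec.sync is left untouched.
    """
    patched = copy.deepcopy(fas)
    spec = patched.setdefault("spec", {})
    if not spec.get("sync"):
        spec["sync"] = copy.deepcopy(DEFAULT_SYNC)
    return patched


# Example manifest as it might look before the feature gate was enabled.
existing = {
    "apiVersion": "autoscaling.agones.dev/v1",
    "kind": "FleetAutoscaler",
    "metadata": {"name": "example-autoscaler"},
    "spec": {"fleetName": "example-fleet"},
}
patched = with_default_sync(existing)
```

The patched manifest would then need to be re-applied to the cluster (for example with `kubectl apply`) before enabling CustomFasSyncInterval. Returning a copy rather than mutating the input keeps the original manifest available for comparison.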