Explicit flag for "can be disrupted": eviction.safe #2794
Comments
In general, a …

There is a big difference in how the system can treat the pod depending on what that cleanup time is, e.g. a game server that can clean up in 10 seconds could run on a spot VM whereas one that needs more than 10 minutes cannot be marked as ….

Are you proposing that the …?
Maybe bad idea: use …

Basically, if the game server had a way to express "I would like to stay running for X time", I feel like we could have policies flow from that rather than having people manually discover the right policies for their particular workload / product. The name is more of a bikeshed discussion. WDYT?
One wrinkle we need to consider about the "I would like to stay running for X time" case is that a game server configuration probably wants to express that time as time once the server has been allocated (e.g. ready servers should be able to be reaped by the system since they have no active players on them). So it would be something like, "once I'm allocated and have active players, I need to keep running for at least 30 minutes to let them finish their session before I can be disrupted." Right now this can be achieved by setting the graceful termination period to 30 minutes, but that only handles some types of voluntary disruptions and not others (e.g. the autoscaler evictions that override the graceful termination value and set it to 10 minutes). Getting this right likely requires some cooperation from the game server itself though, e.g. to exit quickly if it is not allocated even if it has a long graceful termination period set or to exit voluntarily after a session finishes before the time it wants to stay running expires.
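For reference, setting that 30 minute graceful termination period today is a pod-template tweak; a minimal sketch, with an illustrative name, port and image:

```
apiVersion: "agones.dev/v1"
kind: GameServer
metadata:
  generateName: "example-"
spec:
  ports:
    - name: default
      containerPort: 26000
  template:
    spec:
      # Graceful termination window covering the expected session length (30 minutes).
      terminationGracePeriodSeconds: 1800
      containers:
        - name: example
          image: example/game-server:0.1
```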
You're right that I was very optimistic about how easy this was to plug into the existing lifecycle. It seems like we would probably need to build a field that has some indication of whether the game server itself knows anything about graceful termination, and maybe build off there.

One thing I note is that the SDK seems to have no control over the ….

Let me spend a little more time with this and try to figure out something reasonable.
@markmandel @roberthbailey Full proposal in top comment, PTAL!
Also on-prem. All this stuff is tuneable for custom installations -- so we'd also need to account for that. So if a user has tuned their GKE autoscaler to be longer than 10m, how do we let this system know? Helm config? 😄
🤔 I feel a bit.... icky? about making Shutdown() do different things depending on policy here. Feels like a lot of magic. I'd rather we were explicit. This also takes away the ability for the GameServer to explicitly shut itself down, if it actually determines it should shut down properly rather than restart. Which ties into GameServers in allocated state can respond to fleet updating #2682, in which on an update a GameServer would likely choose to shut down before its full session length is required. Maybe we add a …. If we do that, I don't think we need a ….

Not sure we can find a solution for this one (or should for this design), but something to think about, having had conversations around "shut this down only after n sessions have occurred or there are no players connected after y seconds". But that can be done down the line, and probably tied to #2716 (and also there's no graceful termination for 100 players going to zero).
Hmm. I think it depends on how we see the contract. IMO, assuming we weren't in the middle of a fleet scale-down (which I agree I didn't discuss, but I think it can be covered), there is no contract difference today between giving a gameserver the same pod to reuse (by magically transitioning to …) and ….

Regardless of talks of contracts, though, this feature is still opt-in. It requires the …
This setting is designed to allow for a bit of time to call …
This is an advanced setting and I only expect to need to change it in specific circumstances - one aspect of the API design I'm still kind of toying with is how to move this setting to a different message to make it clearer you probably don't need to care.
Right, but in your scenario we are taking away the option of immediate, actual shutdown - which in many circumstances may be the thing that the end user needs to do (a Fleet update is just one example). I'd rather give the end user the explicit option than make the documentation a series of if/else statements for a singular function depending on which promise you make when and where. It's far easier for an end user to reason about, and also easier for us to document, maintain and test. (Maybe …
How do we force people to call …?
I was thinking about this some more on my dog walk this morning 😁 I think I know why this feels so complicated to me, and I think it's because we've conflated two things that I don't think need to be joined together:
And I think for the sake of this design, we actually only need to care about No. 1, and more discussion of No. 2 should probably open back up in #2781. This would be my suggestion - to keep things simple, as an end user what I care about is (ignoring implementation details):
None of this actually cares about whether the GameServer is being reused or not -- that's entirely up to the end user to decide and manage. They know when they have been disrupted (most likely a SIGTERM signal), and they have configured how long they have left. The onus is on them to manage it correctly themselves. I think the idea of …
… pdb (#2807)

This PR, under the `LifecycleContract` feature gate suggested in #2794:

* Adds a scale resource to `GameServer` by adding an `immutableReplicas` field, which has a `default`, `min` and `max` of 1. Having a `scale` subresource lets us define a `PodDisruptionBudget` that can be set to `maxUnavailable: 0%` [1].
* Adds a PDB per namespace with label selector `agones.dev/safe-to-evict: "false"`, which nothing yet adds.
* Adds a mechanism to get feature gate values in Helm by using: `{{- $featureGates := include "agones.featureGates" . | fromYaml }}`
* Cleanup / documentation of feature gate mechanisms

After this PR, it's possible to define a fleet with the label and have all `GameServer` pods protected by a `PodDisruptionBudget`, e.g.:

```
$ kubectl scale fleet/fleet-example --replicas=5
fleet.agones.dev/fleet-example scaled

$ kubectl describe pdb
Name:             agones-gameserver-safe-to-evict-false
Namespace:        default
Max unavailable:  0%
Selector:         agones.dev/safe-to-evict=false
Status:
    Allowed disruptions:  0
    Current:              4
    Desired:              5
    Total:                5
Events:                   <none>
```

Additionally, because min/max/default are 1, Kubernetes enforces the immutability for us:

```
$ kubectl scale gs/fleet-example-k6dfs-6m5nq --replicas=1
gameserver.agones.dev/fleet-example-k6dfs-6m5nq scaled

$ kubectl scale gs/fleet-example-k6dfs-6m5nq --replicas=2
The GameServer "fleet-example-k6dfs-6m5nq" is invalid: spec.immutableReplicas: Invalid value: 2: spec.immutableReplicas in body should be less than or equal to 1

$ kubectl scale gs/fleet-example-k6dfs-6m5nq --replicas=0
The GameServer "fleet-example-k6dfs-6m5nq" is invalid: spec.immutableReplicas: Invalid value: 0: spec.immutableReplicas in body should be greater than or equal to 1
```

The only artifact of this addition is a new field in the Spec/Status named `immutableReplicas`, in the Kubernetes object. This field is not present in the in-memory representation for `GameServer`, nor is it present in `etcd` (by defaulting rules). The field is visible on `describe` or `get -oyaml`, but is otherwise ignored.

[1] https://kubernetes.io/docs/tasks/run-application/configure-pdb/#identify-an-application-to-protect
@markmandel Updated the proposal and stripped it back to the original intent: a single flag to control whether the game server can be gracefully terminated or needs to run to completion. I'll re-open #2781 and iterate there on discussion of pod reuse. For posterity, the old proposal is in the first comment, last modified 2022/11/18.
Updated the proposal again after I realized my understanding of pod preemption was wrong. See #2793 (comment).
Updated the feature gate section to discuss the behavior change and document how we'll turn off the PDB as necessary.
Given that only recently has someone asked us to allow the cluster autoscaler to evict running GameServers - I'm wondering if we should optimise for disabling that feature, and only allow it if people opt in? Right now, it doesn't seem possible to have graceful termination, but not let the cluster autoscaler scale down instances - is that by design?
That's what the default of …
If the user specifies the pod annotation …

Outside of voluntary disruption by the autoscaler, cluster upgrade and preemption are the primary other sources. So I think you're saying, is there a way to block the autoscaler but allow graceful termination in other circumstances? If

…

looks really weird, but it says "game server supports graceful termination, but I just don't want Cluster Autoscaler to evict it". It might be a reasonable configuration for e.g. allowing upgrades but not scaledown.
One gotcha I'll call out working through the code: Currently, we only apply …
Updated the proposal after our meeting today. It is now a 3-way enum instead of true/false and more easily captures the concept of "upgrades only". PTAL.
Per googleforgames#2794, rename `LifecycleContract` to `SafeToEvict`.
This LGTM! Only question I had as a follow up: If I set …
I think this is a good case for a webhook warning. We currently don't use the webhook warning response, but we could. That said, there are limited circumstances when the default …
What is a "warning" from a webhook? 😄 I don't know this one! We could also add it at a later date if we find people are footgunning themselves. SGTM! (I still wish I could work out a better name than ….)
Oooh, I'll give you one extra idea for potential configuration:

```
apiVersion: "agones.dev/v1"
kind: GameServer
metadata:
  name: "xonotic"
spec:
  eviction:
    safe: Always
  ports:
    - name: default
      containerPort: 26000
  template:
    spec:
      containers:
        - name: xonotic
          image: us-docker.pkg.dev/agones-images/examples:0.9
```

Not sold on "safety" as the sub ….

Edit: changed …
Check out https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#response (and ^F for warning) - it's relatively new.
I like …
I can roll with that.
Updated the proposal to use the …
What other sub-fields do you imagine being under eviction in the future?
I'm not clear exactly what we need to future-proof either, but there is a general philosophy to use an object/message even when you think all you need is a single field (it's sort of the multi-variate form of "use an enum instead of a bool"). One thing for sure is that it gives us a place to offer an alternate representation, like if we (much later) decide the enum was a bad idea, it may be simple enough to also allow the list, i.e. now:

…

then:

…

and offer magic translation between them.
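For illustration only, a hypothetical sketch of the two representations being contrasted - the list form and its `allowed` field name are invented here, not part of the proposal:

```
# now: the proposed enum form
eviction:
  safe: OnUpgrade
---
# then: a hypothetical future list form
eviction:
  allowed:
    - Upgrade
```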
I feel like if we wanted to add a generic field - a terminationGracePeriodSeconds-style time, or session length - this would be a good spot for it to go. Top level singular fields always worry me, because there is no space to expand if you want to add extra options down the line.
Thanks for explaining.
LGTM
…mes#2849)

* SDKGracefulTermination: Promote to beta

  Follows checklist in features.go. Closes googleforgames#2831

* Rename `LifecycleContract` feature gate to `SafeToEvict`

  Per googleforgames#2794, rename `LifecycleContract` to `SafeToEvict`.
…ok (#2857)

* SafeToEvict feature: Implement the `eviction.safe` API logic

  Towards #2794: This commit implements the defaulting/validation for the eviction.safe field, under the SafeToEvict feature gate. In particular, it enshrines the defaulting logic for the somewhat complicated backwards compatibility case (safe-to-evict=true is set but SafeToEvict is not). Along the way, refactor a very repetitive unit test table.

* Ran gen-api-docs

* Implement SafeToEvict enforcement of safe-to-evict annotations/label

  This commit adds the interpretation of GameServer.Status.Eviction:

  * Following the pattern in #2840, moves the enforcement of cloudproduct specific things up to the controller, adding a new cloudproduct SetEviction hook to handle it.
  * As a result, when the SafeToEvict feature flag is on, we disable annotation in gameserver.go and test that we don't set anything (as a change detection test).
  * Adds a generic and GKE Autopilot version of how to set policy from the eviction.safe status. In Autopilot, we skip using safe-to-evict=false.
This is implemented, see https://agones.dev/site/docs/advanced/controlling-disruption/
Overview
In Agones today, you need intimate knowledge of sources of voluntary pod disruption on your cluster to understand how to configure the `GameServer` specification best for your game server. For example:

* Graceful termination (`terminationGracePeriodSeconds`) is allowed by all sources of voluntary disruption, but by default, Cluster Autoscaler only permits 10m of graceful termination (and GKE does not allow this to be tuned). Additionally, pod graceful termination requires cooperation of the pod, as it needs to intercept SIGTERM (or it must define a preStop hook).
* The `safe-to-evict=false` pod annotation influences the behavior of Cluster Autoscaler, but no other sources of voluntary disruption - and this flag is blocked on GKE Autopilot, as mentioned in Support Agones on GKE Autopilot #2777.
* A `PodDisruptionBudget` (see GameServer: Implement (immutable) scale subresource, add pdb #2807) as a way to express "do not disrupt" for game server pods. This is roughly the equivalent of the `safe-to-evict=false` pod annotation, but works for anything that uses the standard eviction libraries (including GKE node upgrades, for up to an hour per node). However, `PodDisruptionBudget` may be violated by pod preemption.

As long as you understand the complexity, you can today tune Agones to what you need. But some of the complexity changes between cloud products (GKE Autopilot is more opinionated than GKE Standard).
Here we propose a field `eviction` within the `GameServer` spec that controls whether your game server understands graceful termination, and whether it should be disrupted for autoscaling events, upgrades, or allowed to run to completion without disruption.

eviction.safe
We propose field `eviction` in `GameServer` with a single field `safe`, a string-typed enum with options `Always`, `Never`, `OnUpgrade`, like:
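As a sketch, using the field placement proposed here (the name, port and image are illustrative):

```
apiVersion: "agones.dev/v1"
kind: GameServer
metadata:
  name: example-gameserver
spec:
  eviction:
    safe: OnUpgrade   # one of Always, Never (default), OnUpgrade
  ports:
    - name: default
      containerPort: 26000
  template:
    spec:
      containers:
        - name: example
          image: example/game-server:0.1
```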
Each option uses a slightly different permutation of:
* the `safe-to-evict` annotation to block Cluster Autoscaler based eviction
* the `agones.dev/safe-to-evict` label selector to select the PDB from GameServer: Implement (immutable) scale subresource, add pdb #2807, which blocks Cluster Autoscaler and (for a limited time) disruption from node upgrades.

As a quick reference:
| `eviction.safe` | `safe-to-evict` pod annotation | `agones.dev/safe-to-evict` label |
| --- | --- | --- |
| `Never` (default) | `false` | `false` (matches PDB) |
| `OnUpgrade` | `false` | `true` (does not match PDB) |
| `Always` | `true` | `true` (does not match PDB) |

safe: Never (the default)

Agones ensures the game server pod runs to completion by:

* setting the `safe-to-evict=false` pod annotation, and
* setting the `agones.dev/safe-to-evict: "false"` label, which matches the `PodDisruptionBudget` from #2807.
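Concretely, this maps to pod metadata along these lines (assuming the standard Cluster Autoscaler annotation key):

```
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
  labels:
    agones.dev/safe-to-evict: "false"   # matches the PodDisruptionBudget from #2807
```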
safe: OnUpgrade
Agones blocks Cluster Autoscaler based evictions, but allows eviction from upgrades. On certain providers (GKE, for example), the graceful termination support for upgrades is much longer than is supported by Cluster Autoscaler (1h vs 10m), so supporting eviction for upgrades alone may work well.

The game server binary must be prepared to receive `SIGTERM` for evictions. If the default grace period of 30s is insufficient, you must tune `terminationGracePeriodSeconds` in the pod spec as well. For broad support across cloud providers, the game server must exit within 1h after receiving `SIGTERM`.
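A sketch of that tuning, assuming the field placement proposed above (the grace period and image are illustrative):

```
spec:
  eviction:
    safe: OnUpgrade
  template:
    spec:
      # Allow up to an hour for sessions to drain after SIGTERM, staying within
      # the upgrade window described above.
      terminationGracePeriodSeconds: 3600
      containers:
        - name: example
          image: example/game-server:0.1
```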
safe: Always
Allow evictions by both Cluster Autoscaler and upgrades. As with `OnUpgrade`, the game server binary must be prepared to receive `SIGTERM`. If the default grace period of 30s is insufficient, you must tune `terminationGracePeriodSeconds` in the pod spec as well.

By default, Cluster Autoscaler supports only 10m of graceful termination. For broad support across cloud providers, the game server must exit within 10m after receiving `SIGTERM` if running on a cluster where Cluster Autoscaler down-scaling is enabled (or running on a product that implicitly does, like GKE Autopilot). As with `OnUpgrade`, one hour of graceful termination is broadly supported if Autoscaler is not enabled (and in fact, if Autoscaler is not enabled, `safe: Always` is the same as `safe: OnUpgrade`).

Backward Compatibility with safe-to-evict=true pod annotation; Existing overrides in template

We propose that, using a similar approach to #2754, for any case where Agones modifies the pod, we honor the provided `GameServer.template` in preference to anything Agones would set automatically - for example, if you set the `agones.dev/safe-to-evict` label in `template`, Agones assumes you know better. We will warn when such conflicts occur to make it obvious, though.

That said, one case we need to be particularly sensitive of is the case in #2754 itself: `safe` defaults to `Never`. Mixing the `safe: Never` `GameServer` default with the `safe-to-evict=true` pod annotation doesn't make sense: the inflexible `PodDisruptionBudget` prevents Cluster Autoscaler scale-down, even if `safe-to-evict=true` would allow it for other cases.

Prior to this proposal, the `safe-to-evict` pod annotation is the only documented Agones disruption control. To maintain backwards compatibility, we match the original intent of the `safe-to-evict=true` pod annotation, which is to allow graceful termination: if the `safe-to-evict=true` pod annotation is present in the template, we enable `safe: Always` in the `GameServer`.
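For illustration, a template carrying the existing annotation (assuming the standard Cluster Autoscaler annotation key; container name and image are illustrative), which under this rule would be treated as `safe: Always`:

```
spec:
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      containers:
        - name: example
          image: example/game-server:0.1
```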
We do not make a similar inference for `terminationGracePeriodSeconds` (despite it also being in our documentation) as it is unnecessary.

With this one piece of backward compatibility resolved, it's safe to enable the `safe: Never` default without breaking existing workloads.

Feature gate: new SafeToEvict replaces LifecycleContract

We propose a new feature gate `SafeToEvict` (which will replace `LifecycleContract`). Graduation to Beta should be relatively rapid as the feature has broad backward compatibility. The only situation where backward compatibility is violated is if the user is negatively impacted by a PDB on the game server pods, e.g. the user is relying on the default `safe-to-evict=false` pod annotation, but is also assuming that node drain (e.g. via cluster upgrade) will evict. Given that our existing documentation doesn't allow for node drains, we think this is a safe case for backwards incompatibility. To be conservative about it, though, when we graduate `SafeToEvict` to `Beta`, we will flag it as a potentially breaking change in Release Notes to call attention to it. To allow this situation, the user simply needs to set `eviction.safe: OnUpgrade`.
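For illustration, assuming the Helm feature gate mechanism referenced in #2807, enabling the gate would be a values override along these lines (exact key and syntax per the chart's `agones.featureGates` value):

```
agones:
  featureGates: "SafeToEvict=true"
```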
Why not default to OnUpgrade then?

Despite the possibility of introducing a behavior change, we still think `Never` is the right default. `OnUpgrade` requires the game server to understand `SIGTERM`, and typically to terminate within an hour.

Dependency on SDKGracefulTermination feature gate

Anything but `safe: Never` relies on the `SDKGracefulTermination` feature gate. If `SDKGracefulTermination` is disabled but the `GameServer` has `safe: Always` or `safe: OnUpgrade` (explicitly, or implicitly via backwards compatibility), we will warn but allow it.

In #2831, we proposed graduating `SDKGracefulTermination` to Beta, so this situation should only occur if `SDKGracefulTermination` is explicitly disabled and `SafeToEvict` is explicitly enabled, which seems like an odd choice.