Graduate SafeToEvict to Beta (googleforgames#2950)
Graduate `SafeToEvict` to Beta
zmerlynn authored Feb 10, 2023
1 parent 921b1c1 commit 69e1a3d
Showing 9 changed files with 163 additions and 30 deletions.
2 changes: 1 addition & 1 deletion build/Makefile
@@ -64,7 +64,7 @@ KIND_CONTAINER_NAME=$(KIND_PROFILE)-control-plane
GS_TEST_IMAGE ?= us-docker.pkg.dev/agones-images/examples/simple-game-server:0.14

# Enable all alpha feature gates. Keep in sync with `false` (alpha) entries in pkg/util/runtime/features.go:featureDefaults
ALPHA_FEATURE_GATES ?= "PlayerAllocationFilter=true&PlayerTracking=true&ResetMetricsOnDelete=true&SafeToEvict=true&PodHostname=true&SplitControllerAndExtensions=true&CountsAndLists=true&Example=true"
ALPHA_FEATURE_GATES ?= "PlayerAllocationFilter=true&PlayerTracking=true&ResetMetricsOnDelete=true&PodHostname=true&SplitControllerAndExtensions=true&CountsAndLists=true&Example=true"

# Build with Windows support
WITH_WINDOWS=1
6 changes: 3 additions & 3 deletions cloudbuild.yaml
@@ -295,13 +295,13 @@ steps:
do
if [ $cloudProduct = generic ]
then
featureWithGate="CustomFasSyncInterval=false&SDKGracefulTermination=false&StateAllocationFilter=false&PlayerAllocationFilter=true&PlayerTracking=true&ResetMetricsOnDelete=true&SafeToEvict=true&PodHostname=true&SplitControllerAndExtensions=true&Example=true"
featureWithGate="CustomFasSyncInterval=false&SafeToEvict=false&SDKGracefulTermination=false&StateAllocationFilter=false&PlayerAllocationFilter=true&PlayerTracking=true&ResetMetricsOnDelete=true&PodHostname=true&SplitControllerAndExtensions=true&Example=true"
featureWithoutGate=""
testClusterLocation="us-west1-c"
testCluster="e2e-test-cluster"
else
featureWithGate="CustomFasSyncInterval=false&SDKGracefulTermination=false&StateAllocationFilter=false&PlayerAllocationFilter=true&PlayerTracking=true&ResetMetricsOnDelete=true&SafeToEvict=true&PodHostname=true&SplitControllerAndExtensions=true&Example=true"
featureWithoutGate="SafeToEvict=true&SplitControllerAndExtensions=true"
featureWithGate="CustomFasSyncInterval=false&SafeToEvict=true&SDKGracefulTermination=false&StateAllocationFilter=false&PlayerAllocationFilter=true&PlayerTracking=true&ResetMetricsOnDelete=true&PodHostname=true&SplitControllerAndExtensions=true&Example=true"
featureWithoutGate="SplitControllerAndExtensions=true"
testClusterLocation="us-west1"
testCluster="gke-autopilot-e2e-test-cluster-1-24"
fi
2 changes: 1 addition & 1 deletion install/helm/agones/defaultfeaturegates.yaml
@@ -16,14 +16,14 @@

# Beta features
CustomFasSyncInterval: true
SafeToEvict: true
SDKGracefulTermination: true
StateAllocationFilter: true

# Alpha features
PlayerAllocationFilter: false
PlayerTracking: false
ResetMetricsOnDelete: false
SafeToEvict: false
PodHostname: false
SplitControllerAndExtensions: false

100 changes: 99 additions & 1 deletion install/yaml/install.yaml
@@ -1,4 +1,16 @@
---
# Source: agones/templates/pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: agones-gameserver-safe-to-evict-false
namespace: default
spec:
maxUnavailable: 0%
selector:
matchLabels:
agones.dev/safe-to-evict: "false"
---
# Source: agones/templates/service/allocation.yaml
# Create a ServiceAccount that will be bound to the above role
apiVersion: v1
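The new `agones-gameserver-safe-to-evict-false` PodDisruptionBudget selects Pods labelled `agones.dev/safe-to-evict: "false"` and sets `maxUnavailable: 0%`, so it allows no voluntary evictions of those Pods. The Go sketch below models those two pieces in plain Go. It is not the Kubernetes disruption controller. `matchesSelector` and `allowedDisruptions` are hypothetical helpers. Rounding the percentage up is an assumption of this sketch, and it does not matter for 0%.

```go
package main

import "fmt"

// matchesSelector reports whether labels contain every key/value pair in
// matchLabels (a matchLabels-only LabelSelector; matchExpressions are ignored).
func matchesSelector(matchLabels, labels map[string]string) bool {
	for k, v := range matchLabels {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// allowedDisruptions returns how many of the selected Pods may be voluntarily
// evicted for a percentage maxUnavailable. The percentage is rounded up here
// (an assumption); with 0% the result is always 0.
func allowedDisruptions(healthy, expected, maxUnavailablePercent int) int {
	maxUnavailable := (expected*maxUnavailablePercent + 99) / 100 // integer ceil
	desiredHealthy := expected - maxUnavailable
	if healthy <= desiredHealthy {
		return 0
	}
	return healthy - desiredHealthy
}

func main() {
	pdbSelector := map[string]string{"agones.dev/safe-to-evict": "false"}
	neverPod := map[string]string{"agones.dev/safe-to-evict": "false"} // eviction.safe: Never
	alwaysPod := map[string]string{}                                   // eviction.safe: Always
	fmt.Println("Never pod covered:", matchesSelector(pdbSelector, neverPod))
	fmt.Println("Always pod covered:", matchesSelector(pdbSelector, alwaysPod))
	fmt.Println("evictions allowed for 10 covered pods:", allowedDisruptions(10, 10, 0))
}
```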
@@ -5087,6 +5099,27 @@ spec:
type: integer
title: The initial player capacity of this Game Server
minimum: 0
eviction:
type: object
title: Eviction tolerance of the game server
properties:
safe:
type: string
title: Game server supports termination via SIGTERM
description: |
- Never: The game server should run to completion. Agones sets Pod annotation `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"` and label `agones.dev/safe-to-evict: "false"`, which matches a restrictive PodDisruptionBudget.
- OnUpgrade: On SIGTERM, the game server will exit within `terminationGracePeriodSeconds` or be terminated; Agones sets Pod annotation `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"`, which blocks evictions by Cluster Autoscaler. Evictions from node upgrades proceed normally.
- Always: On SIGTERM, the game server will exit within `terminationGracePeriodSeconds` or be terminated, typically within 10m; Agones sets Pod annotation `cluster-autoscaler.kubernetes.io/safe-to-evict: "true"`, which allows evictions by Cluster Autoscaler.
enum:
- Always
- OnUpgrade
- Never
immutableReplicas:
type: integer
title: Immutable count of Pods to a GameServer. Always 1. (Implementation detail of implementing the Scale subresource.)
default: 1
minimum: 1
maximum: 1
status:
description: 'FleetStatus is the status of a Fleet. More info:
https://agones.dev/site/docs/reference/agones_crd_api_reference/#agones.dev/v1.Fleet'
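The `eviction.safe` description above lists the Pod metadata Agones sets for each value. The Go sketch below restates that mapping as a function. `podEvictionMetadata` is a hypothetical helper, and the constants only mirror the enum values and keys quoted in the CRD text. Agones' own Pod-building code may differ in detail.

```go
package main

import "fmt"

// EvictionSafe mirrors the enum of the `eviction.safe` CRD field.
type EvictionSafe string

const (
	EvictionSafeAlways    EvictionSafe = "Always"
	EvictionSafeOnUpgrade EvictionSafe = "OnUpgrade"
	EvictionSafeNever     EvictionSafe = "Never"
)

const (
	// Cluster Autoscaler annotation named in the CRD description.
	caSafeToEvictAnnotation = "cluster-autoscaler.kubernetes.io/safe-to-evict"
	// Agones label selected by the agones-gameserver-safe-to-evict-false PDB.
	agonesSafeToEvictLabel = "agones.dev/safe-to-evict"
)

// podEvictionMetadata returns the annotations and labels the CRD description
// says Agones puts on the backing Pod for each eviction.safe value.
func podEvictionMetadata(safe EvictionSafe) (annotations, labels map[string]string, err error) {
	annotations = map[string]string{}
	labels = map[string]string{}
	switch safe {
	case EvictionSafeAlways:
		annotations[caSafeToEvictAnnotation] = "true" // Cluster Autoscaler may evict
	case EvictionSafeOnUpgrade:
		annotations[caSafeToEvictAnnotation] = "false" // blocks Cluster Autoscaler only
	case EvictionSafeNever:
		annotations[caSafeToEvictAnnotation] = "false"
		labels[agonesSafeToEvictLabel] = "false" // also matched by the 0% PDB
	default:
		return nil, nil, fmt.Errorf("unknown eviction.safe value %q", safe)
	}
	return annotations, labels, nil
}

func main() {
	for _, s := range []EvictionSafe{EvictionSafeAlways, EvictionSafeOnUpgrade, EvictionSafeNever} {
		a, l, _ := podEvictionMetadata(s)
		fmt.Printf("%-9s annotations=%v labels=%v\n", s, a, l)
	}
}
```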
@@ -10049,7 +10082,28 @@ spec:
initialCapacity:
type: integer
title: The initial player capacity of this Game Server
minimum: 0
minimum: 0
eviction:
type: object
title: Eviction tolerance of the game server
properties:
safe:
type: string
title: Game server supports termination via SIGTERM
description: |
- Never: The game server should run to completion. Agones sets Pod annotation `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"` and label `agones.dev/safe-to-evict: "false"`, which matches a restrictive PodDisruptionBudget.
- OnUpgrade: On SIGTERM, the game server will exit within `terminationGracePeriodSeconds` or be terminated; Agones sets Pod annotation `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"`, which blocks evictions by Cluster Autoscaler. Evictions from node upgrades proceed normally.
- Always: On SIGTERM, the game server will exit within `terminationGracePeriodSeconds` or be terminated, typically within 10m; Agones sets Pod annotation `cluster-autoscaler.kubernetes.io/safe-to-evict: "true"`, which allows evictions by Cluster Autoscaler.
enum:
- Always
- OnUpgrade
- Never
immutableReplicas:
type: integer
title: Immutable count of Pods to a GameServer. Always 1. (Implementation detail of implementing the Scale subresource.)
default: 1
minimum: 1
maximum: 1
status:
description: 'GameServerStatus is the status for a GameServer resource. More info:
https://agones.dev/site/docs/reference/agones_crd_api_reference/#agones.dev/v1.GameServer'
@@ -10101,6 +10155,29 @@ spec:
nullable: true
items:
type: string
eviction:
type: object
properties:
safe:
type: string
enum:
- Always
- OnUpgrade
- Never
immutableReplicas:
type: integer
title: Immutable count of Pods to a GameServer. Always 1. (Implementation detail of implementing the Scale subresource.)
default: 1
minimum: 1
maximum: 1
subresources:
# scale enables the scale subresource. We can't actually scale GameServers, but this allows
# for the use of PodDisruptionBudget (PDB) without having to use a PDB per Pod.
scale:
# specReplicasPath defines the JSONPath inside of a custom resource that corresponds to Scale.Spec.Replicas.
specReplicasPath: .spec.immutableReplicas
# statusReplicasPath defines the JSONPath inside of a custom resource that corresponds to Scale.Status.Replicas.
statusReplicasPath: .status.immutableReplicas
---
# Source: agones/templates/crds/gameserverallocationpolicy.yaml
# Copyright 2019 Google LLC All Rights Reserved.
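The comment on the `scale` subresource above says it lets one PodDisruptionBudget cover every GameServer Pod, instead of needing a PDB per Pod. For Pods owned by a custom resource, Kubernetes reads each distinct owner's scale to work out how many Pods it expects. Each GameServer reports `.spec.immutableReplicas`, which is always 1. The Go sketch below is a simplified model of that counting, not the disruption controller itself. The `pod` type, `expectedPods`, and the `scaleOf` callback are hypothetical stand-ins.

```go
package main

import "fmt"

// pod is a stripped-down stand-in for a Pod: its name and controlling owner.
type pod struct {
	name  string
	owner string // name of the controlling GameServer
}

// expectedPods sums the scale of each distinct controlling owner. scaleOf
// stands in for reading the owner's scale subresource (specReplicasPath).
func expectedPods(pods []pod, scaleOf func(owner string) int) int {
	seen := map[string]bool{}
	total := 0
	for _, p := range pods {
		if seen[p.owner] {
			continue // count each owner's scale only once
		}
		seen[p.owner] = true
		total += scaleOf(p.owner)
	}
	return total
}

func main() {
	pods := []pod{{"gs-a", "gs-a"}, {"gs-b", "gs-b"}, {"gs-c", "gs-c"}}
	// Every GameServer reports .spec.immutableReplicas = 1 via the scale subresource.
	immutableReplicas := func(string) int { return 1 }
	fmt.Println(expectedPods(pods, immutableReplicas)) // 3
}
```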
@@ -15120,6 +15197,27 @@ spec:
type: integer
title: The initial player capacity of this Game Server
minimum: 0
eviction:
type: object
title: Eviction tolerance of the game server
properties:
safe:
type: string
title: Game server supports termination via SIGTERM
description: |
- Never: The game server should run to completion. Agones sets Pod annotation `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"` and label `agones.dev/safe-to-evict: "false"`, which matches a restrictive PodDisruptionBudget.
- OnUpgrade: On SIGTERM, the game server will exit within `terminationGracePeriodSeconds` or be terminated; Agones sets Pod annotation `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"`, which blocks evictions by Cluster Autoscaler. Evictions from node upgrades proceed normally.
- Always: On SIGTERM, the game server will exit within `terminationGracePeriodSeconds` or be terminated, typically within 10m; Agones sets Pod annotation `cluster-autoscaler.kubernetes.io/safe-to-evict: "true"`, which allows evictions by Cluster Autoscaler.
enum:
- Always
- OnUpgrade
- Never
immutableReplicas:
type: integer
title: Immutable count of Pods to a GameServer. Always 1. (Implementation detail of implementing the Scale subresource.)
default: 1
minimum: 1
maximum: 1
status:
description: 'GameServerSetStatus is the status of a GameServerSet. More info:
https://agones.dev/site/docs/reference/agones_crd_api_reference/#agones.dev/v1.GameServerSet'
30 changes: 14 additions & 16 deletions pkg/apis/agones/v1/gameserver_test.go
@@ -176,6 +176,8 @@ func TestGameServerApplyDefaults(t *testing.T) {
GRPCPort: 9357,
HTTPPort: 9358,
},
evictionSafeSpec: EvictionSafeNever,
evictionSafeStatus: EvictionSafeNever,
}
f(&e)
return e
@@ -301,21 +303,22 @@ func TestGameServerApplyDefaults(t *testing.T) {
}
}),
},
"SafeToEvict gate off => no SafeToEvict fields": {
"SafeToEvict gate off => no eviction.safe fields": {
featureFlags: string(runtime.FeatureSafeToEvict) + "=false",
gameServer: defaultGameServerAnd(func(gss *GameServerSpec) {}),
expected: wantDefaultAnd(func(e *expected) {}),
expected: wantDefaultAnd(func(e *expected) {
e.evictionSafeSpec = ""
e.evictionSafeStatus = ""
}),
},
"SafeToEvict gate on => SafeToEvict: Never": {
featureFlags: string(runtime.FeatureSafeToEvict) + "=true",
gameServer: defaultGameServerAnd(func(gss *GameServerSpec) {}),
"defaults are eviction.safe: Never": {
gameServer: defaultGameServerAnd(func(gss *GameServerSpec) {}),
expected: wantDefaultAnd(func(e *expected) {
e.evictionSafeSpec = EvictionSafeNever
e.evictionSafeStatus = EvictionSafeNever
}),
},
"SafeToEvict: Always": {
featureFlags: string(runtime.FeatureSafeToEvict) + "=true",
"eviction.safe: Always": {
gameServer: defaultGameServerAnd(func(gss *GameServerSpec) {
gss.Eviction.Safe = EvictionSafeAlways
}),
@@ -324,8 +327,7 @@ func TestGameServerApplyDefaults(t *testing.T) {
e.evictionSafeStatus = EvictionSafeAlways
}),
},
"SafeToEvict: OnUpgrade": {
featureFlags: string(runtime.FeatureSafeToEvict) + "=true",
"eviction.safe: OnUpgrade": {
gameServer: defaultGameServerAnd(func(gss *GameServerSpec) {
gss.Eviction.Safe = EvictionSafeOnUpgrade
}),
@@ -334,8 +336,7 @@ func TestGameServerApplyDefaults(t *testing.T) {
e.evictionSafeStatus = EvictionSafeOnUpgrade
}),
},
"SafeToEvict: Never": {
featureFlags: string(runtime.FeatureSafeToEvict) + "=true",
"eviction.safe: Never": {
gameServer: defaultGameServerAnd(func(gss *GameServerSpec) {
gss.Eviction.Safe = EvictionSafeNever
}),
@@ -344,8 +345,7 @@ func TestGameServerApplyDefaults(t *testing.T) {
e.evictionSafeStatus = EvictionSafeNever
}),
},
"SafeToEvict: Always inferred from safe-to-evict=true": {
featureFlags: string(runtime.FeatureSafeToEvict) + "=true",
"eviction.safe: Always inferred from safe-to-evict=true": {
gameServer: defaultGameServerAnd(func(gss *GameServerSpec) {
gss.Template.ObjectMeta.Annotations = map[string]string{PodSafeToEvictAnnotation: "true"}
}),
@@ -355,7 +355,6 @@ func TestGameServerApplyDefaults(t *testing.T) {
}),
},
"Nothing inferred from safe-to-evict=false": {
featureFlags: string(runtime.FeatureSafeToEvict) + "=true",
gameServer: defaultGameServerAnd(func(gss *GameServerSpec) {
gss.Template.ObjectMeta.Annotations = map[string]string{PodSafeToEvictAnnotation: "false"}
}),
@@ -364,8 +363,7 @@ func TestGameServerApplyDefaults(t *testing.T) {
e.evictionSafeStatus = EvictionSafeNever
}),
},
"safe-to-evict=false AND SafeToEvict: Always => SafeToEvict: Always": {
featureFlags: string(runtime.FeatureSafeToEvict) + "=true",
"safe-to-evict=false AND eviction.safe: Always => eviction.safe: Always": {
gameServer: defaultGameServerAnd(func(gss *GameServerSpec) {
gss.Eviction.Safe = EvictionSafeAlways
gss.Template.ObjectMeta.Annotations = map[string]string{PodSafeToEvictAnnotation: "false"}
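The renamed test cases pin down how `eviction.safe` is defaulted:

- With the gate off, nothing is filled in.
- An explicit value always wins, even over `safe-to-evict=false`.
- `safe-to-evict=true` on the Pod template implies `Always`.
- Everything else becomes `Never`.

The tests expect spec and status to end up with the same value. The Go sketch below reconstructs only these rules. `defaultEvictionSafe` is hypothetical, not the Agones `ApplyDefaults` code. The annotation key is assumed to be the value of `PodSafeToEvictAnnotation`. The tests shown do not cover the gate-off case with an explicit value, so passing that value through unchanged is also an assumption.

```go
package main

import "fmt"

// Assumed value of Agones' PodSafeToEvictAnnotation constant.
const podSafeToEvictAnnotation = "cluster-autoscaler.kubernetes.io/safe-to-evict"

// defaultEvictionSafe reconstructs the defaulting rules asserted by the tests.
func defaultEvictionSafe(gateEnabled bool, spec string, podAnnotations map[string]string) string {
	if !gateEnabled {
		return spec // gate off: no defaulting (assumed to pass explicit values through)
	}
	if spec != "" {
		return spec // an explicit eviction.safe always wins
	}
	if podAnnotations[podSafeToEvictAnnotation] == "true" {
		return "Always" // inferred from the Cluster Autoscaler annotation
	}
	return "Never" // default, including when the annotation is "false"
}

func main() {
	cases := []struct {
		name        string
		gate        bool
		spec        string
		annotations map[string]string
	}{
		{"gate off", false, "", nil},
		{"defaults", true, "", nil},
		{"explicit OnUpgrade", true, "OnUpgrade", nil},
		{"inferred from safe-to-evict=true", true, "", map[string]string{podSafeToEvictAnnotation: "true"}},
		{"nothing inferred from safe-to-evict=false", true, "", map[string]string{podSafeToEvictAnnotation: "false"}},
		{"explicit Always beats safe-to-evict=false", true, "Always", map[string]string{podSafeToEvictAnnotation: "false"}},
	}
	for _, c := range cases {
		fmt.Printf("%-45s => %q\n", c.name, defaultEvictionSafe(c.gate, c.spec, c.annotations))
	}
}
```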
8 changes: 4 additions & 4 deletions pkg/util/runtime/features.go
@@ -34,6 +34,9 @@ const (
// FeatureCustomFasSyncInterval is a feature flag that enables a custom FleetAutoscaler resync interval
FeatureCustomFasSyncInterval Feature = "CustomFasSyncInterval"

// FeatureSafeToEvict enables the `SafeToEvict` API to specify disruption tolerance.
FeatureSafeToEvict Feature = "SafeToEvict"

// FeatureSDKGracefulTermination is a feature flag that enables SDK to support gracefulTermination
FeatureSDKGracefulTermination Feature = "SDKGracefulTermination"

@@ -54,9 +57,6 @@ const (
// relevant metric views to reset their state immediately when an Agones resource is deleted.
FeatureResetMetricsOnDelete Feature = "ResetMetricsOnDelete"

// FeatureSafeToEvict enables the `SafeToEvict` API to specify disruption tolerance.
FeatureSafeToEvict Feature = "SafeToEvict"

// FeaturePodHostname enables the Pod Hostname being assigned the name of the GameServer
FeaturePodHostname = "PodHostname"

@@ -106,14 +106,14 @@ var (
featureDefaults = map[Feature]bool{
// Beta features
FeatureCustomFasSyncInterval: true,
FeatureSafeToEvict: true,
FeatureSDKGracefulTermination: true,
FeatureStateAllocationFilter: true,

// Alpha features
FeaturePlayerAllocationFilter: false,
FeaturePlayerTracking: false,
FeatureResetMetricsOnDelete: false,
FeatureSafeToEvict: false,
FeaturePodHostname: false,
FeatureSplitControllerAndExtensions: false,
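Moving `FeatureSafeToEvict` into the beta block changes its default to `true`. That is why the generic e2e run in `cloudbuild.yaml` now passes `SafeToEvict=false` to exercise the non-default path. The Go sketch below overlays a `Name=bool&…` override string on a defaults map to show the effective value. `effectiveGates` is a hypothetical helper, not the Agones runtime parser. Rejecting unknown gate names is a choice made for this sketch, and the map is a subset of `featureDefaults` copied from this diff.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Subset of featureDefaults after this commit (true = beta, false = alpha).
var featureDefaults = map[string]bool{
	"CustomFasSyncInterval":        true,
	"SafeToEvict":                  true,
	"SDKGracefulTermination":       true,
	"StateAllocationFilter":        true,
	"PlayerAllocationFilter":       false,
	"PlayerTracking":               false,
	"ResetMetricsOnDelete":         false,
	"PodHostname":                  false,
	"SplitControllerAndExtensions": false,
}

// effectiveGates copies defaults, then applies "Name=bool&..." overrides.
func effectiveGates(defaults map[string]bool, overrides string) (map[string]bool, error) {
	out := make(map[string]bool, len(defaults))
	for k, v := range defaults {
		out[k] = v // copy so the defaults map is never mutated
	}
	if overrides == "" {
		return out, nil
	}
	for _, pair := range strings.Split(overrides, "&") {
		name, value, ok := strings.Cut(pair, "=")
		if !ok {
			return nil, fmt.Errorf("malformed gate %q", pair)
		}
		if _, known := defaults[name]; !known {
			return nil, fmt.Errorf("unknown feature gate %q", name)
		}
		enabled, err := strconv.ParseBool(value)
		if err != nil {
			return nil, fmt.Errorf("gate %q: %w", name, err)
		}
		out[name] = enabled
	}
	return out, nil
}

func main() {
	// No overrides (like an empty featureWithoutGate): the beta default applies.
	g, _ := effectiveGates(featureDefaults, "")
	fmt.Println("SafeToEvict with no overrides:", g["SafeToEvict"])
	// The generic featureWithGate run now turns the beta gate off.
	g, _ = effectiveGates(featureDefaults, "CustomFasSyncInterval=false&SafeToEvict=false")
	fmt.Println("SafeToEvict with SafeToEvict=false:", g["SafeToEvict"])
}
```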

5 changes: 5 additions & 0 deletions site/content/en/docs/Advanced/controlling-disruption.md
@@ -12,7 +12,12 @@ description: >

By default, Agones assumes your game server should never be disrupted voluntarily and configures the `Pod` appropriately - but this isn't always the ideal setting. Here we discuss how Agones allows you to control the two most significant sources of voluntary `Pod` evictions, node upgrades and Cluster Autoscaler, using the `eviction` API on the `GameServer` object.

{{% feature publishVersion="1.30.0" %}}
{{< beta title="`eviction` API" gate="SafeToEvict" >}}
{{% /feature %}}
{{< feature expiryVersion="1.30.0" >}}
{{< alpha title="`eviction` API" gate="SafeToEvict" >}}
{{% /feature %}}
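The page covers two sources of voluntary eviction: node upgrades and Cluster Autoscaler. The Go sketch below summarises which of them each `eviction.safe` value lets through, based on the CRD descriptions in this commit. `mayEvict` and the `disruptionSource` type are hypothetical names used only for illustration. Treating unknown values like `Never` is the sketch's own conservative choice.

```go
package main

import "fmt"

// EvictionSafe mirrors the enum of the `eviction.safe` field.
type EvictionSafe string

const (
	Always    EvictionSafe = "Always"
	OnUpgrade EvictionSafe = "OnUpgrade"
	Never     EvictionSafe = "Never"
)

// disruptionSource labels the two voluntary eviction sources discussed above.
type disruptionSource int

const (
	clusterAutoscaler disruptionSource = iota
	nodeUpgrade
)

// mayEvict reports whether a source is expected to evict a running game server.
func mayEvict(safe EvictionSafe, src disruptionSource) bool {
	switch safe {
	case Always:
		return true // safe-to-evict "true": both sources may evict
	case OnUpgrade:
		return src == nodeUpgrade // annotation blocks only Cluster Autoscaler
	default: // Never (and, in this sketch, unknown values)
		return false // annotation plus the 0% PDB block both sources
	}
}

func main() {
	names := map[disruptionSource]string{clusterAutoscaler: "Cluster Autoscaler", nodeUpgrade: "node upgrade"}
	for _, s := range []EvictionSafe{Always, OnUpgrade, Never} {
		for _, src := range []disruptionSource{clusterAutoscaler, nodeUpgrade} {
			fmt.Printf("%-9s %-18s evict=%v\n", s, names[src], mayEvict(s, src))
		}
	}
}
```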

## Benefits of Allowing Voluntary Disruption

25 changes: 21 additions & 4 deletions site/content/en/docs/Advanced/scheduling-and-autoscaling.md
@@ -97,17 +97,33 @@ This affects the Cluster autoscaler, Allocation Scheduling, Pod Scheduling and F
#### Cluster Autoscaler
{{% feature publishVersion="1.30.0" %}}
When using the "Packed" strategy, Agones will ensure that the Cluster Autoscaler doesn't attempt to evict and move `GameServer` `Pods` onto new Nodes during
gameplay.

{{< beta title="`eviction` API" gate="SafeToEvict" >}}

If a gameserver can tolerate [being evicted](https://kubernetes.io/docs/concepts/scheduling-eviction/api-eviction/#how-api-initiated-eviction-works)
(generally in combination with setting an appropriate graceful termination period on the gameserver pod) and you
want the Cluster Autoscaler to compact your cluster by evicting game servers when it would allow the Cluster
Autoscaler to reduce the number of nodes in the cluster, [Controlling Disruption]({{< relref "controlling-disruption.md" >}}) describes
how to choose the `.eviction` setting appropriate for your `GameServer` or `Fleet`.
{{% /feature %}}

{{% feature expiryVersion="1.30.0" %}}
When using the “Packed” strategy, Agones will ensure that the Cluster Autoscaler doesn't attempt to evict and move `GameServer` `Pods` onto new Nodes during
gameplay by adding the annotation [`"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"`](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node)
to the backing Pod.
{{< alert title="SafeToEvict Feature Gate" color="info" >}}
{{% /feature %}}
{{< feature expiryVersion="1.30.0" >}}
{{% alert title="SafeToEvict Feature Gate" color="info" %}}
The [Alpha]({{< ref "/docs/Guides/feature-stages.md#alpha" >}}) `SafeToEvict` feature allows
[controlling disruption]({{< relref "controlling-disruption.md" >}}) in a more holistic way.
Please consider enabling `SafeToEvict` and using the new `eviction` API - we welcome your
early feedback!
{{< /alert >}}

{{% /alert %}}
{{< /feature >}}
{{% feature expiryVersion="1.30.0" %}}
However, if a gameserver can tolerate [being evicted](https://kubernetes.io/docs/concepts/scheduling-eviction/api-eviction/#how-api-initiated-eviction-works)
(generally in combination with setting an appropriate graceful termination period on the gameserver pod) and you
want the Cluster Autoscaler to compact your cluster by evicting game servers when it would allow the Cluster
@@ -156,6 +172,7 @@ spec:
# grace period for terminating the game server safely.
terminationGracePeriodSeconds: 300
```
{{% /feature %}}

#### Allocation Scheduling Strategy
