Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Agones game server pods should use a non-preempting PriorityClass #2793

Closed
zmerlynn opened this issue Nov 7, 2022 · 3 comments
Closed

Agones game server pods should use a non-preempting PriorityClass #2793

zmerlynn opened this issue Nov 7, 2022 · 3 comments
Assignees
Labels
kind/feature New features for Agones wontfix Sorry, but we're not going to do that.

Comments

@zmerlynn
Copy link
Collaborator

zmerlynn commented Nov 7, 2022

Is your feature request related to a problem? Please describe.

Game server pods do not specify a PriorityClass and can be preempted by higher priority pods. As most Agones clusters are ~mostly dedicated Agones workloads, and most applications do not yet specify a PriorityClass, this is more of a theoretical than practical concern. However, it's one area we can easily control by using a a Non-preempting PriorityClass.

Describe the solution you'd like

When configured, create a PriorityClass object that looks something like:

apiVersion: scheduling.k8s.io/v1
description: Priority class for Agones game server pods
kind: PriorityClass
metadata:
  name: agones-game-server-pods
preemptionPolicy: Never
value: 1000000

and use the PriorityClass when the controller creates pods. This could be done by injecting it in GameServer.Pod() (but would require a flag), or just injecting it into the gameserver.Spec.Template in the controller code, not clear which is better.

Describe alternatives you've considered

PodDisruptionBudget and graceful termination are both options. Graceful termination doesn't quite fit as well. PodDisruptionBudget would require us to create one instance per GS pod due to Agones falling into the "Arbitrary Controller" logic. (We should consider this anyways, though, despite how spammy it is, but I will consider it in a separate issue.)

@zmerlynn zmerlynn added the kind/feature New features for Agones label Nov 7, 2022
@zmerlynn zmerlynn self-assigned this Nov 7, 2022
@roberthbailey
Copy link
Member

This is similar to #1350

@zmerlynn
Copy link
Collaborator Author

zmerlynn commented Dec 1, 2022

I dove into this problem this afternoon. My understanding of pod preemption was ... off: preemptionPolicy: Never does not prevent the pod from being preempted, but rather the opposite: a pod with a non-prempting policy won't preempt another pod. Combined with a high priority, it's useful to express the concept of "this pod has a high priority once scheduled, but don't clobber existing workloads to achieve this".

That seems potentially useful, but then I started to work through what this looked like instead; see ad5946a for an almost fully worked PR (that I'm about to argue to abandon):

  • We could add a priority class for game server pods at "high" value. One problem is there is literally no convention for priority class values. There's some Very High values for system pods, but between 0 and 2e9, there's no guidance. We use 1000000 for the controller, so I chose 999999 for the game servers.
  • The reason I chose to keep the controller just barely higher was to avoid a priority inversion between effectively the control plane and data plane, which could lead to pathological behaviors. Imagine e.g. scaling a fleet up enough that the controller gets preempted by the game server pod - without an autoscaler you might lose the ability to scale it back down.
  • But this lack of priority is already present: by default, game server pods are basically priority 0, like most workloads. My PR basically just prevents a hypothetical case where something with priority preempts the game server pod. (But because there's no convention around priority, it's hard to prognosticate exactly what priority would even prevent this.)

But then I realized the important piece:

So crucially, pod preemption is weak protection for the case already covered by taints/tolerations. IMO, we don't need to bother with setting a priority class on game server pods. I'm closing this as wontfix, but happy to hear arguments that it is maybe useful. Also note that Agones allows you to configure the priority class manually - you can just create a priority class manually and point the game server pod template at it.

I need to do a little thinking about this in the context of #2777 and ensure we have reasonable workload separation on Autopilot, though. I'll open a separate issue for that.

@zmerlynn zmerlynn closed this as not planned Won't fix, can't repro, duplicate, stale Dec 1, 2022
@zmerlynn zmerlynn added the wontfix Sorry, but we're not going to do that. label Dec 1, 2022
@zmerlynn
Copy link
Collaborator Author

zmerlynn commented Dec 5, 2022

#2840 is the successor thought to this (just leaving a breadcrumb).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature New features for Agones wontfix Sorry, but we're not going to do that.
Projects
None yet
Development

No branches or pull requests

2 participants