Proposal: Multi-Cluster Allocation Policies #597
Comments
This looks pretty good to me, I'm not seeing any glaring red flags, and it looks to solve the HA problem as well. The only question I have: what if I don't want to have a probability of going to another cluster - say I want to allocate against "on-prem" until it fills up, and then move to "cloud-1" - do I set the weight to 1? 0? All the same weights? (Does weight have a default?) @Kuqd I know you've been working with multiple clusters a lot -- what are your thoughts?
This is controlled by the Priority. If there is only one entry with Priority=1, then there is no probability and the allocation will happen only on this cluster (until it is out of capacity). If there are multiple entries with the same priority, then the weight of each entry is used to distribute allocations between the clusters.
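As an illustration of that answer, the "fill on-prem first, then overflow to cloud-1" case could look roughly like the sketch below. This is a minimal, hypothetical sketch only: the API group, version, and field names (priorities, clusterName, priority, weight) are assumptions, not a settled schema.

```yaml
# Hypothetical sketch - field names are illustrative, not a finalized schema.
apiVersion: multicluster.agones.dev/v1alpha1
kind: GameServerAllocationPolicy
metadata:
  name: on-prem-first
spec:
  priorities:
  - clusterName: on-prem
    priority: 1        # sole entry at priority 1: no probability involved
  - clusterName: cloud-1
    priority: 2        # only consulted once on-prem is out of capacity
    # weight only matters when several entries share the same priority
```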
Actually, I see this is also covered in @jkowalski's example. That makes sense to me.
I think it would be nice to describe what kind of credentials we are supporting; my guess is a service account token. One con of option 3 is security, but that's the price for no single point of failure. Would a regional cluster with option 2 be good enough to remove the single point of failure? It would also be nice to be able to target a cluster from the GSA, so matchmaking can make some ping requests and select the right cluster, WDYT? Basically, being able to say I would prefer us-east, but if it's full, follow the policy. That seems already possible if you create a bunch of policies, so that's good.
So I've assumed the credentials are Kubernetes credentials (bearer token)? So essentially a service account + RBAC permissions (although you are right - we should be explicit about this). Since you can create your own topology with this design - the tradeoffs are up to the user. If you feel that 3 GSA router clusters are enough HA, then that's fine. But if you want more than that, you can add more. In fact -- you can adjust on the fly. Originally I had thought a director/agent model would be better -- but looking at this, I think this is better because:
We will want to have some pretty explicit docs on how to create and manage these tokens though - or point to some documentation that does this (I haven't seen anything yet on my travels).
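For reference, a per-cluster credentials Secret of the kind discussed above might look roughly like the following. This is only a sketch under assumptions: the secret name, data keys, and endpoint are illustrative, not a format the proposal defines.

```yaml
# Hypothetical sketch of storing remote-cluster credentials in a Secret.
apiVersion: v1
kind: Secret
metadata:
  name: cloud-1-allocation-credentials
  namespace: agones-system
type: Opaque
stringData:
  # Bearer token of a service account in the remote cluster, limited via RBAC
  # to creating allocations only.
  token: <remote-service-account-token>
  # API server endpoint of the remote cluster.
  server: "https://cloud-1.example.com:6443"
```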
I think this is covered by the policies. I think the idea here is that the game server ops team can decide on the policy set that is in place - and the team working on matchmaking / game logic isn't able to accidentally override that when attempting to get a game server. They can only choose from a pre-approved set.
We should also point people at https://kubernetes.io/docs/tasks/administer-cluster/kms-provider/ to ensure they know to encrypt secrets at rest. Do we need to do more security-wise here?
What do you think of changing GameServerAllocationPolicy to support time, with policies ordered chronologically? For example: t1<---policy1----->t2<----policy2---->... The CRD for GameServerAllocationPolicy would be something like:
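A hypothetical sketch of that shape is shown below; the startTime field and everything else here is assumed for illustration, not quoted from the thread.

```yaml
# Hypothetical sketch only: each policy carries a start time, and the policy in
# effect would be the one whose start time is the most recent one not in the future.
apiVersion: multicluster.agones.dev/v1alpha1
kind: GameServerAllocationPolicy
metadata:
  name: policy-2
spec:
  startTime: "2019-07-01T00:00:00Z"   # effective from t2 until a later policy starts
  priorities:
  - clusterName: on-prem
    priority: 1
  - clusterName: cloud-1
    priority: 2
```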
So Agones will select the policies that have the start time closest to the current time?
@pooneh-m - I'm wondering if the motivation for this is to be able to automatically declare the start and end times of a policy, so that a user doesn't have to do this manually? (Then it becomes a question of how to handle crossover periods.) Is that correct?
@ilkercelikyilmaz Yes. Agones will pick the policy for which the current time is past its start time and before the next policy's start time.
What do you think of naming the allocation policy CR?
I think I'm leaning more towards: WDYT?
Another potentially fun question - should the (And maybe stable => core?)
I am leaning more towards keeping the GameServer prefix for Agones CRs, because there is less risk of having the same CRD name for two CRs in the same cluster. I think either GameServerAllocationPolicy or GameServerMultiClusterAllocationPolicy is fine. GameServerMultiClusterAllocationPolicy has two votes.
Let's discuss this in issue #703 that you opened.
Yeah I agree - we should let grouping dictate naming - and 100% agreed on
Based on the group naming suggestion in #703, I am choosing GameServerAllocationPolicy, since the full name <plural>.<group> has multicluster in it.
I'll be adding a new field to GameServerAllocation to extend it for multicluster allocation.
By default, the multicluster policy will not be in effect for allocation. If MultiClusterPolicySelector is specified, the multicluster policy is enforced per request. There are two benefits to this:
We could also make enabling and disabling the multicluster policy explicit by introducing a flag, but I don't think it is necessary.
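A request using this field might look something like the sketch below. The field name, API group/version, and selector labels are assumptions based on the comment above, not the final shape of GameServerAllocation.

```yaml
# Hypothetical sketch - the multiClusterPolicySelector field name and the labels
# are illustrative. Omitting the selector keeps the allocation local, as today.
apiVersion: stable.agones.dev/v1alpha1
kind: GameServerAllocation
metadata:
  generateName: allocation-
spec:
  multiClusterPolicySelector:
    matchLabels:
      stage: production      # selects the set of policies to enforce for this request
  required:
    matchLabels:
      game: my-game
```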
Just so I'm 100% clear,
A list of multicluster policies is selected using
Oh neat - so it's more of a merge operation really - all the
About the cluster-to-cluster connectivity: apparently, service account tokens are not forever. One way to solve the connectivity is to introduce allocation as a service that can call other clusters' allocation services directly, using pre-installed certificates, instead of going through the API servers. WDYT?
Not sure I understand the above, tbh. Sounds like a re-architecting of how the Kubernetes API is authenticated (if I read it correctly)? How does that impact connectivity? If kubectl can be used from outside a cluster, we should be able to do the same thing, no? (It all uses client-go, after all.)
Yes, we need a slight re-architecture to handle authentication for cluster-to-cluster allocation requests. For a matchmaking service calling an allocation service in a different cluster, or for the multi-cluster allocation scenario, we cannot store a service account token and assume it lasts forever. We also cannot assume customers can enable a plugin for authenticating with an identity provider (e.g. OIDC) on their cluster, or accept a client or TLS cert. The solution is to (1) introduce a reverse proxy with an external IP on the cluster that performs the authentication of allocation requests and then forwards the requests to the API server. For better performance, (2) we should move the allocation service logic (controller) to the proxy and call it the allocation service. Then (3) remove the API server extension for allocation, which is a breaking change and should be done before the 1.0 release. The solution will be similar to this sample. For talking to GKE, kubectl uses a user account authenticated with the Google identity provider instead of a service account, and its token has an expiry.
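For items (1) and (2), one possible shape is a small allocator service with its own external IP sitting in front of (or replacing) the API server extension, authenticating callers with pre-installed certificates. The manifest below is only a sketch under that assumption; the names, image, ports, and secret name are invented for illustration.

```yaml
# Hypothetical sketch of an allocator service exposed on its own external IP.
apiVersion: v1
kind: Service
metadata:
  name: agones-allocator
  namespace: agones-system
spec:
  type: LoadBalancer          # gives the allocator its own external IP
  selector:
    app: agones-allocator
  ports:
  - port: 443
    targetPort: 8443
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agones-allocator
  namespace: agones-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: agones-allocator
  template:
    metadata:
      labels:
        app: agones-allocator
    spec:
      containers:
      - name: agones-allocator
        image: <allocator-service-image>   # placeholder image
        ports:
        - containerPort: 8443
        volumeMounts:
        - name: tls
          mountPath: /certs                # server cert + CA used to verify client certs
      volumes:
      - name: tls
        secret:
          secretName: allocator-tls
```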
I have a strange emotional attachment to
I still like the idea of keeping them around, also because it's a nice in-cluster and/or developer experience -- but I totally get the reasoning of potentially removing them. Maybe I'm being overly sentimental? (I can admit to that.) We had a short discussion earlier about completing (1) above, and then seeing how our performance goes? Is that our first step, and then maybe I can live in hope that they may stay around? 😄 But yes - I 100% agree we need to make a decision before 1.0, as it affects the API surface, and we need to lock that down before 1.0.
@pooneh-m - just running a smoke test on the latest RC, noticed in the logs: {"message":"agones.dev/agones/vendor/k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Secret: secrets is forbidden: User \"system:serviceaccount:agones-system:agones-controller\" cannot list secrets at the cluster scope","severity":"error","time":"2019-05-08T18:59:53.210002337Z"}
{"message":"agones.dev/agones/vendor/k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Secret: secrets is forbidden: User \"system:serviceaccount:agones-system:agones-controller\" cannot list secrets at the cluster scope","severity":"error","time":"2019-05-08T18:59:54.212493604Z"}
{"message":"agones.dev/agones/vendor/k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Secret: secrets is forbidden: User \"system:serviceaccount:agones-system:agones-controller\" cannot list secrets at the cluster scope","severity":"error","time":"2019-05-08T18:59:55.214727668Z"} Looks like we need to add a RBAC permission 😢 Doesn't affect functionality at the moment, but would be good to get a fix in while we're in feature freeze I think. |
Thanks! I am on it.
Here are the remaining work items for the allocator service:
@pooneh-m just wanted to gently nudge this - to see where we're up to on this? We should probably add "documentation" to the above list as well 😃 This isn't on the 1.0 roadmap, but I was just curious.
Yes, I am going to tackle the list before going to v1.0. I added documentation to the list.
Nice! Very cool!
@pooneh-m Hi! Any update on this?
Hi @Davidnovarro, I just started working on this again. Hopefully before v1.0 there will be plenty of updates. :) I'm planning to do some refactoring to move the allocation handler to its own standalone library that both the allocator service and the API server extension reference, to help with scale. Then I will introduce the gRPC API, add more testing for cross-cluster calls, and add documentation.
@pooneh-m is this closeable now?
Background
When operating multiple Agones clusters to support a world-wide game launch, it is often necessary to perform multi-cluster allocations (allocations from a set of clusters instead of just one) based on defined policies.
Examples of common policies include:
- Allocate from an "on-prem" cluster until it fills up, then overflow to a cloud cluster.
- Prefer the cluster closest to the players (for example, us-east), and fall back to other clusters when it is out of capacity.
Proposal
This document proposes changing how the GameServerAllocation API works by adding support for forwarding allocation requests to other clusters based on policies that can be applied on a per-request basis.
We will add a new CRD called GameServerAllocationPolicy that controls how multi-cluster allocations will be performed. The policy will contain a list of clusters to allocate from, with corresponding priorities and weights. The credentials for accessing those clusters will be stored in secrets.
The name of the policy can be specified when creating a GameServerAllocation. If policy is not specified, the allocation will be attempted from the local cluster, as it is done today.
When policy is present on the GameServerAllocation request, the API handler would become a router that calls the specified clusters in their priority order, and for clusters with equal priority it would randomly pick a cluster, with the probability of choosing a cluster proportional to its weight. If a cluster is out of capacity, the handler would try other clusters until the allocation succeeds or the list of clusters to try is exhausted.
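The sketch below is not the proposal's original example, but a GameServerAllocationPolicy consistent with the walk-through that follows: the cluster names come from the text, while the field names are assumptions.

```yaml
# Hypothetical sketch matching the description below - field names are illustrative.
apiVersion: multicluster.agones.dev/v1alpha1
kind: GameServerAllocationPolicy
metadata:
  name: world-wide-launch
spec:
  priorities:
  - clusterName: on-prem
    priority: 1              # always tried first
  - clusterName: this-cluster
    priority: 2
    weight: 75               # 75% of the priority-2 allocations
  - clusterName: other-cloud
    priority: 2
    weight: 25               # 25% of the priority-2 allocations
```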
In the example above, when the allocation request comes in, we would always try allocating from the on-prem cluster first, because it is the cluster with the highest priority. If allocation from on-prem fails, we proceed to the next highest priority, which includes two possible clusters: this-cluster (with 75% probability) or other-cloud (with 25% probability).
Deployment Topologies
In a multi-cluster scenario, several allocation topologies are possible, based on the decision of where to put the GameServerAllocationPolicy objects:
Single Cluster Responsible For Routing
In this mode, a single cluster is selected to serve the multi-cluster allocation APIs, and the Match Maker is pointed at its allocation endpoint. That cluster has GameServerAllocationPolicy objects that point at all other clusters. This has the benefit of simplicity, but it has a single point of failure, which is the chosen cluster.
Pros:
Cons:
Dedicated Cluster Responsible For Routing
Another option, similar to the single cluster, is to create a dedicated cluster that is only responsible for allocations but does not host game servers or fleets. This cluster will have only routing policies and the secrets needed to talk to other clusters.
Pros:
Cons:
All Clusters Responsible For Routing
In this mode, all clusters will have policies and secrets that allow them to route allocation requests to all other clusters when necessary. A global load balancer (which could be a VIP or DNS-based) will randomly pick a cluster to allocate from, which will then perform an extra "hop" to the cluster selected by the policy.
Pros:
Cons:
Other Topologies
Other, more complex topologies are possible, including hierarchical ones where routing-only clusters form a hierarchy, routing-only clusters behind load balancers, etc.