Use batching in GameServerAllocation controller to improve throughput. #536

Closed
jkowalski opened this issue Jan 30, 2019 · 13 comments
Labels
area/performance Anything to do with Agones being slow, or making it go faster. kind/design Proposal discussing new features / fixes and how they should be implemented

@jkowalski
Contributor

To get better throughput in the GSA controller, we could batch: group N allocation requests together, assign a GS to each of them, and commit each assignment individually, in parallel.
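A minimal sketch of the batching idea described above, not the Agones implementation. The function name `allocate_batch`, the `commit` callback, and the use of plain strings for requests and game servers are all assumptions made for illustration; a real controller would commit by updating the GameServer object through the Kubernetes API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple


def allocate_batch(
    requests: List[str],
    ready_servers: List[str],
    commit: Callable[[str, str], bool],
    max_workers: int = 8,
) -> List[Tuple[str, Optional[str]]]:
    """Assign one distinct ready game server per request, then commit in parallel.

    Returns (request, server) pairs. The server is None when no ready server
    was left for the request or when the commit for that pair failed.
    """
    # Step 1: assign in memory, so no two requests in the batch share a server.
    pairs = list(zip(requests, ready_servers))
    unserved = requests[len(pairs):]

    # Step 2: commit each assignment individually, in parallel.
    # `commit` is a hypothetical callback standing in for the API write.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outcomes = list(pool.map(lambda pair: commit(pair[0], pair[1]), pairs))

    results: List[Tuple[str, Optional[str]]] = [
        (req, gs if ok else None) for (req, gs), ok in zip(pairs, outcomes)
    ]
    results.extend((req, None) for req in unserved)
    return results
```

Because the assignment step runs once per batch, requests inside the same batch cannot collide on a game server; only the commits run concurrently.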

@jkowalski jkowalski added the area/performance Anything to do with Agones being slow, or making it go faster. label Jan 30, 2019
@markmandel
Member

markmandel commented Jan 30, 2019

As a first step, I'm going to look into moving GameServerAllocation into an Aggregated API so we can have more control over what happens with the API.

In theory, there should be no/minimal change to how GSAs work -- at least that's the plan 😄

@markmandel markmandel self-assigned this Feb 3, 2019
@markmandel markmandel added the kind/design Proposal discussing new features / fixes and how they should be implemented label Feb 16, 2019
@markmandel
Member

The good news is that I have an API extension working for gameserver allocations (with some cleanup still to do).

I've only implemented and supported the CREATE (HTTP POST) method on the API since, without storage, it's really the only one needed. If people request it, I could look into the Watch function as well, for watching create events.

The annoying news is that each create API call has 60s to provide a response (although it can keep processing in the background), which makes one of the long-term goals, having an SDK.Ack() function for blocking on Allocation return, a little trickier. Or at least, it means a shorter timeout than I may have liked.

Asking the community for feedback on that aspect (slack):

I'm wondering whether a 30-second timeout (to give everything else some buffer) for that ack() to come back is reasonable? I can't imagine you would want to keep your players waiting that long anyway -- or is that too short a time? (We could probably bump that out to 40 or 50 seconds if need be, but 30 is super conservative.)

(Basically, we'd wait for 30 seconds, and if we didn't get the ack back, we'd either delete the gameserver or maybe kick it back to ready - haven't decided. Probably delete -- it's cleaner.)
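The wait-then-clean-up behaviour proposed above could be sketched roughly as follows. This is only an illustration under assumptions: `wait_for_ack` and `on_timeout` are hypothetical names, a `threading.Event` stands in for the game server's ack, and SDK.Ack() is a proposal in this thread, not an existing API.

```python
import threading
from typing import Callable


def wait_for_ack(
    ack: threading.Event,
    on_timeout: Callable[[], None],
    timeout_seconds: float = 30.0,
) -> bool:
    """Block until the allocation is acknowledged or the timeout elapses.

    `ack` is set by whatever receives the (hypothetical) acknowledgement.
    `on_timeout` is the cleanup action, e.g. deleting the game server or
    returning it to Ready; which one to use is left open in the discussion.
    """
    if ack.wait(timeout_seconds):
        return True  # acknowledged in time; the allocation response can return
    on_timeout()
    return False
```

The 30-second default leaves headroom under the 60s response limit mentioned earlier; the right value is exactly the question being put to the community here.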

Regardless, this will now also allow us to batch, skip storage for the GSA, etc. It also makes it easier if we decide to provide a gRPC interface for allocation as well.

@markmandel markmandel added this to the 0.9.0 milestone Feb 18, 2019
markmandel added a commit to markmandel/agones that referenced this issue Feb 19, 2019
This moves the implementation of GameServerAllocation (GSA) to a
[Kubernetes API Extension](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/)
instead of using CRDs. This was essentially done for performance reasons, but to break it down:

1. GSA is now create only, since we had no need for GSA storage and don't want the performance hit.
1. This removes the mutation and validation webhooks, which have a 30s timeout and are run in serial.
    1. The API server still cuts off a response after 60s, but the API can
       continue processing (60s gives us enough time, I think, for an SDK.Ack() on Allocate, which I don't think
       we had before).
    1. Validation now happens in the request.
1. We can now do batching of requests for higher throughput (googleforgames#536), since we control the entire HTTP request.
1. Sets us up if we decide we also want to have an alternative (HTTP and/or gRPC) endpoint for allocation, based
   on feedback from this implementation.

The breaking changes are:
1. GameServerAllocation's group is now `allocation.agones.dev` rather than `stable.agones.dev`,
because a CRD group can't overlap with an API server.
1. Since there is only the `create` verb for GSA, there are no get/list/watch options for GameServerAllocations - so no
   informers/listers either.

This also includes some libraries for building further api server extension points.
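To make the breaking changes concrete, here is a rough sketch of what a create-only request against the new group might look like. The group name comes from the commit message and the URL layout is the standard Kubernetes `/apis/<group>/<version>/namespaces/<namespace>/<resource>` form. The version `v1alpha1`, the plural resource name, and the helper `build_allocation_request` are assumptions for illustration, and the `spec` is a caller-supplied placeholder.

```python
import json
from typing import Any, Dict, Tuple

GROUP = "allocation.agones.dev"     # new group, per the commit message
RESOURCE = "gameserverallocations"  # assumption: lowercase plural of the kind


def build_allocation_request(
    namespace: str,
    spec: Dict[str, Any],
    version: str = "v1alpha1",  # assumption: the version is not stated in this thread
) -> Tuple[str, str, str]:
    """Return (method, path, body) for creating a GameServerAllocation.

    Only POST is built because `create` is the only supported verb;
    there is nothing to get, list, or watch afterwards.
    """
    path = f"/apis/{GROUP}/{version}/namespaces/{namespace}/{RESOURCE}"
    body = {
        "apiVersion": f"{GROUP}/{version}",
        "kind": "GameServerAllocation",
        "spec": spec,
    }
    return "POST", path, json.dumps(body)
```

Since nothing is stored, the outcome of the allocation has to be read from the response to this one request.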
@ilkercelikyilmaz
Contributor

How many allocations per second do we need to reach? I am running some load tests against my recent allocation changes (not checked in yet). With 60 concurrent clients, the system can allocate around 60 gs per sec (allocated 2999 gs in 50 secs).


markmandel added a commit to markmandel/agones that referenced this issue Feb 19, 2019
@markmandel
Member

markmandel commented Feb 19, 2019

@ilkercelikyilmaz that's a huge improvement over what we had previously 🔥 (@pm7h do you have those numbers on hand? I can't seem to find them). Once we also incorporate #600, I wonder if we might be very close to where we need to be.

@pm7h
Contributor

pm7h commented Feb 20, 2019

Yes, it's a huge improvement. Last I ran my load tests, it took over a minute for 100 allocations. You can see those results here: #412 (comment)
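For a sense of scale, the two results quoted so far can be compared with a little arithmetic. This uses only the figures from the comments above; treating "over a minute" as exactly 60 seconds is an assumption that gives an upper bound on the old rate, so the speedup shown is a lower bound.

```python
def allocations_per_second(allocations: int, seconds: float) -> float:
    """Average allocation throughput over a test run."""
    return allocations / seconds


# New result: 2999 game servers allocated in 50 seconds.
new_rate = allocations_per_second(2999, 50)

# Old result: 100 allocations took over a minute, so 100/60 is an upper bound.
old_rate_upper_bound = allocations_per_second(100, 60)

min_speedup = new_rate / old_rate_upper_bound
print(f"{new_rate:.1f}/s vs at most {old_rate_upper_bound:.2f}/s: at least {min_speedup:.0f}x")
```

The two runs used different client counts and setups, so this is a rough comparison rather than a controlled benchmark.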

@ilkercelikyilmaz
Contributor

I made a couple of changes after talking to Jarek: using Update instead of Patch to prevent multiple allocations, and selecting a random gs from the top N (=20) of the available list to reduce the number of collisions.
I also changed the test client to increase the QPS of the Kubernetes client.
With these changes I ran the test a few times. With 16 concurrent clients, the system can allocate around 100 gs per sec (2800 gs allocated in 28 seconds).
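The selection strategy described above might look something like the sketch below. It is not the code from the change under test: `allocate_from_top_n` and `try_update` are hypothetical names, and `try_update` stands in for a Kubernetes Update call, which is rejected with a conflict when the object has changed since it was read.

```python
import random
from typing import Callable, List, Optional


def allocate_from_top_n(
    ready: List[str],
    try_update: Callable[[str], bool],
    top_n: int = 20,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Pick a game server at random from the first `top_n` ready candidates.

    `try_update` returns False when another allocator changed the game
    server first; in that case the next random candidate is tried.
    Returns the allocated name, or None if every candidate was taken.
    """
    rng = rng or random.Random()
    candidates = ready[:top_n]  # slicing copies, so `ready` is left untouched
    rng.shuffle(candidates)     # random order spreads concurrent allocators out
    for name in candidates:
        if try_update(name):
            return name
    return None
```

If every allocator always took the first item in the list, concurrent requests would all contend for the same game server; choosing randomly among the top N makes such collisions less likely, and the conflict check on Update catches the ones that still happen.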


markmandel added a commit to markmandel/agones that referenced this issue Feb 21, 2019
markmandel added a commit to markmandel/agones that referenced this issue Feb 21, 2019
markmandel added a commit to markmandel/agones that referenced this issue Feb 25, 2019
markmandel added a commit to markmandel/agones that referenced this issue Feb 25, 2019
markmandel added a commit to markmandel/agones that referenced this issue Feb 27, 2019
markmandel added a commit to markmandel/agones that referenced this issue Mar 8, 2019
This cleans up the webhook component and makes it easier to test,
and also sets us up to reuse the https server with the given cert pair --
which we will want to do as we work on googleforgames#536 and set up an API server extension,
which needs exactly the same self-signed certificate setup.
jkowalski pushed a commit that referenced this issue Mar 8, 2019
@markmandel markmandel modified the milestones: 0.9.0, 0.10.0 Mar 26, 2019
markmandel added a commit to markmandel/agones that referenced this issue Mar 30, 2019
markmandel added a commit to markmandel/agones that referenced this issue Mar 30, 2019
markmandel added a commit to markmandel/agones that referenced this issue Mar 30, 2019
markmandel added a commit to markmandel/agones that referenced this issue Mar 30, 2019
markmandel added a commit to markmandel/agones that referenced this issue Mar 31, 2019
markmandel added a commit to markmandel/agones that referenced this issue Mar 31, 2019
markmandel added a commit to markmandel/agones that referenced this issue Mar 31, 2019
markmandel added a commit to markmandel/agones that referenced this issue Apr 3, 2019
markmandel added a commit to markmandel/agones that referenced this issue Apr 4, 2019
markmandel added a commit to markmandel/agones that referenced this issue Apr 13, 2019
markmandel added a commit to markmandel/agones that referenced this issue Apr 16, 2019
markmandel added a commit to markmandel/agones that referenced this issue Apr 16, 2019
markmandel added a commit to markmandel/agones that referenced this issue Apr 16, 2019
markmandel added a commit that referenced this issue Apr 16, 2019
This moves the implementation of GameServerAllocation (GSA) to a
[Kubernetes API Extension](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/)
instead of using CRDs. This was essentially done for performance reasons, but to break it down:

1. GSA is now create only, since we had no need for GSA storage and don't want the performance hit.
1. This removes the mutation and validation webhooks, which have a 30s timeout and are run in serial.
    1. The API server still cuts off a response after 60s, but the API can continue processing (60s gives us enough
       time, I think, for an SDK.Ack() on Allocate, which I don't think we had before).
    1. Validation now happens in the request.
1. We can now do batching of requests for higher throughput (#536), since we control the entire HTTP request.

The breaking changes are:
1. GameServerAllocation's group is now `allocation.agones.dev` rather than `stable.agones.dev`,
because a CRD group can't overlap with an API server.
1. Since there is only the `create` verb for GSA, there are no get/list/watch options for GameServerAllocations - so no
   informers/listers either. But this could be added at a later date, if needed.
@markmandel
Member

/cc @ilkercelikyilmaz @jkowalski how do we feel about closing this issue, given the performance we have now?

@markmandel markmandel modified the milestones: 0.10.0, 0.11.0 May 7, 2019
@ilkercelikyilmaz
Contributor

I think this can be a good improvement but there is no urgency so we should keep it open. Not a blocker for 1.X though.
@markmandel, did you see my comment/findings on the recent PodList improvement change?

@markmandel markmandel removed this from the 0.11.0 milestone May 8, 2019
@markmandel
Member

> I think this can be a good improvement but there is no urgency so we should keep it open. Not a blocker for 1.X though.

Good call 👍 I've moved it off the next milestone, but leaving it open.

> @markmandel, did you see my comment/findings on the recent PodList improvement change?

I did, but it's hard to determine why that is happening. It would be useful to have the performance testing suite in open source in some way, so we can all test things. It might be good to do a CPU flame graph to see where the bottlenecks are.

@ilkercelikyilmaz
Contributor

I will try to check in my load test in 0.11.

@markmandel
Member

I think this can be closed now! If you have objections, please say so; otherwise I will close on Tuesday!

@markmandel
Member

No response! Closing! 😄

@markmandel markmandel added this to the 0.11.0 milestone Jun 18, 2019
4 participants