Admission webhook "mutations.stable.agones.dev" errors with Invalid FleetAutoscaler #406

Oleksii-Terekhov · 2018-11-07T12:59:35Z

Bare-metal k8s v1.10.4, helm v2.9.1, Agones (installed by helm) v0.5.0,
Linux 4.4.159-1.el7.elrepo.x86_64, Centos 7.4

Custom helm chart for my deployment with fleet and fleetAutoscaler

  policy:
    type: Buffer
    buffer:
      bufferSize: 2
      minReplicas: 0
      maxReplicas: 10

At helm install almost always successful, but sometimes(1:10):

Release "xxx" does not exist. Installing it now.
Error: release xxx failed: Internal error occurred: admission webhook 
  "mutations.stable.agones.dev" denied the request: Invalid FleetAutoscaler

Same error in kube-apiserver log:

net/http.HandlerFunc.ServeHTTP(0xc42a10c2e0, 0x7f4ba00d9160, 0xc4216cf320, 0xc44a31e200)
        /usr/local/go/src/net/http/server.go:1918 +0x44
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1(0xc42a10c380, 0xb1e9820, 0xc4216cf320, 0xc44a31e200, 0xc43dd541e0)
        /workspace/anago-v1.10.4-beta.0.68+5ca598b4ba5abb/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:93 +0x8d
created by k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP
        /workspace/anago-v1.10.4-beta.0.68+5ca598b4ba5abb/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:92 +0x1ab

logging error output: "{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"Internal error occurred: admission webhook \\\"mutations.stable.agones.dev\\\" denied the request: Invalid FleetAutoscaler\",\"reason\":\"InternalError\",\"details\":{\"causes\":[{\"message\":\"admission webhook \\\"mutations.stable.agones.dev\\\" denied the request: Invalid FleetAutoscaler\"}]},\"code\":500}\n"
 [[tiller/v0.0.0 (linux/amd64) kubernetes/$Format] 10.10.20.31:34754]

As background - 20-40 fleets with autoscalers in one namespace with frequent parallel redeploy (install/upgrade/delete via helm)

Which log can help resolve this critical issue?
Almost always sequential ./helm upgrade --force --install --wait --timeout 1200 ... resolve issue.

Maybe race condition between fleet and correspond autoscaler creation/update?

Update 1: I try apply autoscaler without fleet - got same error

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/agones/master/examples/simple-udp/fleetautoscaler.yaml

Error from server (InternalError): error when creating "https://raw.githubusercontent.com/GoogleCloudPlatform/agones/master/examples/simple-udp/fleetautoscaler.yaml": Internal error occurred: admission webhook "mutations.stable.agones.dev" denied the request: Invalid FleetAutoscaler

Update 2: helm/helm#2994 - common CRD issue...

The text was updated successfully, but these errors were encountered:

Oleksii-Terekhov · 2018-11-07T15:13:36Z

Kubernetes is a declarative system. Users provide manifests that define desired state and Kubernetes converges the current state accordingly. If the workloads in question don't support this model because they require specific ordering, that means the workloads do not support a declarative model. In this case, the focus should be on fixing the workloads to support orderless installation via logic inside the pods.
(c) https://github.com/gabrtv

markmandel · 2018-11-07T15:33:19Z

On my phone atm, so can't go too deep, but check the Troubleshooting guide for a review of logs:

https://github.com/GoogleCloudPlatform/agones/blob/master/docs/troubleshooting.md

Also, what happens when you debug output your Helm chart - does it look correct?

Oleksii-Terekhov · 2018-11-07T16:21:49Z

Helm chart work fine when create fleet before autoscaler 👍
When helm create autoscaler before fleet - install blow up 👎

markmandel · 2018-11-07T16:43:10Z

This was by initial design, and intended behaviour. It means that Fleets and Fleet Autoscalers get garbage collected appropriately when the Fleet gets deleted.

@victor-prodan Anything you want to add here (since you wrote this)? Should we decouple them, and allow Fleet autoscalers to be orphaned? @Oleksii-Terekhov has some valid points about being able to define these declaratively. It puts more onus on the developer to cleanup these artifacts though.

The other issue here is that if we maintain the current functionality should have a validationHandler for the checking of the fleetName value, as that will return a much more readable validation error, with a reason - rather than "500", with no message (which is what happens with a mutation handler that rejects the call).

Also hunting down a workaround, at least for the moment.

markmandel · 2018-11-07T17:01:13Z

I think you can use lifecycle hooks to solve this problem for the moment (would love to hear if it works for you) - but this does sound like an issue worth discussion - do we want to force developers down this path?

https://docs.helm.sh/developing_charts/#hooks

Oleksii-Terekhov · 2018-11-07T17:07:35Z

Maybe autoscaler must be part of fleet definitions?

Also question about minReplicas - why you need this? I cannot realize when minReplicas <>0 in production environment. Value in bufferSize resolve all issues....

Helm hooks (https://github.com/helm/helm/blob/master/docs/charts_hooks.md)looks weird due:

Hook resources are not managed with corresponding releases

The resources that a hook creates are not tracked or managed as part of the release. Once Tiller verifies >that the hook has reached its ready state, it will leave the hook resource alone.

Practically speaking, this means that if you create resources in a hook, you cannot rely upon helm delete >to remove the resources. To destroy such resources, you need to either write code to perform this >operation in a pre-delete or post-delete hook or add "helm.sh/hook-delete-policy" annotation to the >hook template file.

markmandel · 2018-11-07T17:12:28Z

Aah. that's a good point re: hooks not being managed. Okay, that's a deal breaker.

Original fleet autoscaling design ticket for context.

markmandel · 2018-11-07T17:34:34Z

Quick thought on workaround: Do you have the fleet + fleetautoscaler definitions in the same file, or separate files in your helm chart?

Wondering if they are in the same template file, it will solve the issue for now.

Oleksii-Terekhov · 2018-11-08T08:11:44Z

In the separate files... I'll try join they now.
But original helm flow independent of file-level granularity, manifest kind+hooks may change order.

Update for @markmandel: "helm delete --purge" + "helm install" with joined template still fails with same error "Invalid FleetAutoscaler"

victor-prodan · 2018-11-08T12:28:08Z

Also question about minReplicas - why you need this? I cannot realize when minReplicas <>0 in production environment. Value in bufferSize resolve all issues....

Here is an example: you need to restart the matchmaker servers so you know there will be a temporary down spike in demand, but the requests will come back as usual after a short while. In this case, you want to keep your dedicated server cluster warmed up to the current size and ready to handle the load.

To do this, just set minReplicas to your current level before restarting the matchmaker and remove it after the load has resumed.

victor-prodan · 2018-11-08T12:30:16Z

@victor-prodan Anything you want to add here (since you wrote this)? Should we decouple them, and allow Fleet autoscalers to be orphaned?

I don't know, I can see it going both ways.
Maybe @EricFortin has a stronger opinion on this?

Oleksii-Terekhov · 2018-11-08T14:54:35Z

In design ticket exist point:

autoscaling settings can be part of the Fleet specification, it would make sense; each fleet must have a different autoscaling settins

markmandel · 2018-11-09T17:01:44Z

I'm digging into the HorizontalPodAutoscaler here - see what it does when a Deployment doesn't exist. I can't find any docs, and digging around for the code. I may just have to setup an example and try it! 😄

Figure following their guide here probably makes the most amount of sense.

EricFortin · 2018-11-12T11:53:51Z

I would definitely go with what the HPA does but my instinct says we should keep them locked together. Let's see if my instinct is right haha.

markmandel · 2018-11-13T20:21:58Z

Checking HPA - yes, you can create one without having a Deployment.

What happens is that you get a warning event on the HPA if it can't find the Deployment - so that the issue user visible.

I propose we should go down this path (allow a FleetAutoscaler be created without a Fleet) - it does allow for nicer decoupling, and still lets a user see why a FleetAutoscaler may not be working.

Small question - should this mean you can change which Fleet a FleetAutoscaler is pointed at? (I'm leaning towards yes, just because I can't think of a downside, and it may be a nice bit of flexibility).

Thoughts?

Oleksii-Terekhov · 2018-11-14T10:05:55Z

Is there a chance for this in release 0.6.0?

markmandel · 2018-11-14T15:13:34Z

Working on it right now! Probably have a PR today.

This removes some previous restrictions with the FleetAutoscaler, as well as cleaning up a couple of items. - FleetAutoscalers can now have a Fleet target that doesn't exit. - If the target doesn't exist, this will be recorded as a warning event of type `FailedGetFleet`, rather than `ScalingLimited` status item, as it allows a human readable message, as well as give us data on how often this occurs, and it is readable by stat/alerting packages. - FleetAutoscalers can now have their Fleet target edited. This means that FleetAutoscalers can be transitioned from one Fleet to another. This may be useful in scenarios like red-green deployments, or if you wish to temporarily disable an Autoscaler, but not delete it. - `FleetAutoscaler.Status.ScalingLimited` was also replaced by a `ScalingLimited` Event, as it allows a human readable message, as well as give us data on how often this occurs, and it is readable by stat/alerting packages. Closes googleforgames#406

markmandel · 2018-11-15T00:15:15Z

PR #416 in place -- as I said on the PR, I made some decisions that I think make sense, but also consider this a sacrificial draft, so definitely let me know if you see issues.

This removes some previous restrictions with the FleetAutoscaler, as well as cleaning up a couple of items. - FleetAutoscalers can now have a Fleet target that doesn't exit. - If the target doesn't exist, this will be recorded as a warning event of type `FailedGetFleet`, rather than `ScalingLimited` status item, as it allows a human readable message, as well as give us data on how often this occurs, and it is readable by stat/alerting packages. - FleetAutoscalers can now have their Fleet target edited. This means that FleetAutoscalers can be transitioned from one Fleet to another. This may be useful in scenarios like red-green deployments, or if you wish to temporarily disable an Autoscaler, but not delete it. - `FleetAutoscaler.Status.ScalingLimited` was also replaced by a `ScalingLimited` Event, as it allows a human readable message, as well as give us data on how often this occurs, and it is readable by stat/alerting packages. Closes googleforgames#406

This removes some previous restrictions with the FleetAutoscaler, as well as cleaning up a couple of items. - FleetAutoscalers can now have a Fleet target that doesn't exit. - If the target doesn't exist, this will be recorded as a warning event of type `FailedGetFleet`, as it allows a human readable message, as well as give us data on how often this occurs, and it is readable by stat/alerting packages. - FleetAutoscalers can now have their Fleet target edited. This means that FleetAutoscalers can be transitioned from one Fleet to another. This may be useful in scenarios like red-green deployments, or if you wish to temporarily disable an Autoscaler, but not delete it. - `FleetAutoscaler.Status.ScalingLimited` has an added Event, as it allows a human readable message, as well as give us data on how often this occurs, and it is readable by stat/alerting packages. Closes googleforgames#406

This removes some previous restrictions with the FleetAutoscaler, as well as cleaning up a couple of items. - FleetAutoscalers can now have a Fleet target that doesn't exit. - If the target doesn't exist, this will be recorded as a warning event of type `FailedGetFleet`, as it allows a human readable message, as well as give us data on how often this occurs, and it is readable by stat/alerting packages. - FleetAutoscalers can now have their Fleet target edited. This means that FleetAutoscalers can be transitioned from one Fleet to another. This may be useful in scenarios like red-green deployments, or if you wish to temporarily disable an Autoscaler, but not delete it. - `FleetAutoscaler.Status.ScalingLimited` has an added Event, as it allows a human readable message, as well as give us data on how often this occurs, and it is readable by stat/alerting packages. Closes #406

markmandel added the question I have a question! label Nov 7, 2018

markmandel added kind/bug These are bugs. kind/design Proposal discussing new features / fixes and how they should be implemented labels Nov 9, 2018

markmandel self-assigned this Nov 13, 2018

markmandel removed the question I have a question! label Nov 14, 2018

markmandel mentioned this issue Nov 15, 2018

FleetAutoscaler can be targeted at Non Existent Fleets #416

Merged

markmandel closed this as completed in #416 Nov 19, 2018

markmandel added this to the 0.6.0 milestone Nov 20, 2018

markmandel mentioned this issue Jul 24, 2019

Fleetautoscaler - one fleet can be controlled by two Fleetautoscalers simultaneously #939

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Admission webhook "mutations.stable.agones.dev" errors with Invalid FleetAutoscaler #406

Admission webhook "mutations.stable.agones.dev" errors with Invalid FleetAutoscaler #406

Oleksii-Terekhov commented Nov 7, 2018 •

edited

Loading

Oleksii-Terekhov commented Nov 7, 2018

markmandel commented Nov 7, 2018

Oleksii-Terekhov commented Nov 7, 2018

markmandel commented Nov 7, 2018

markmandel commented Nov 7, 2018

Oleksii-Terekhov commented Nov 7, 2018

markmandel commented Nov 7, 2018

markmandel commented Nov 7, 2018

Oleksii-Terekhov commented Nov 8, 2018 •

edited

Loading

victor-prodan commented Nov 8, 2018

victor-prodan commented Nov 8, 2018

Oleksii-Terekhov commented Nov 8, 2018

markmandel commented Nov 9, 2018

EricFortin commented Nov 12, 2018

markmandel commented Nov 13, 2018

Oleksii-Terekhov commented Nov 14, 2018

markmandel commented Nov 14, 2018

markmandel commented Nov 15, 2018

Admission webhook "mutations.stable.agones.dev" errors with Invalid FleetAutoscaler #406

Admission webhook "mutations.stable.agones.dev" errors with Invalid FleetAutoscaler #406

Comments

Oleksii-Terekhov commented Nov 7, 2018 • edited Loading

Oleksii-Terekhov commented Nov 7, 2018

markmandel commented Nov 7, 2018

Oleksii-Terekhov commented Nov 7, 2018

markmandel commented Nov 7, 2018

markmandel commented Nov 7, 2018

Oleksii-Terekhov commented Nov 7, 2018

markmandel commented Nov 7, 2018

markmandel commented Nov 7, 2018

Oleksii-Terekhov commented Nov 8, 2018 • edited Loading

victor-prodan commented Nov 8, 2018

victor-prodan commented Nov 8, 2018

Oleksii-Terekhov commented Nov 8, 2018

markmandel commented Nov 9, 2018

EricFortin commented Nov 12, 2018

markmandel commented Nov 13, 2018

Oleksii-Terekhov commented Nov 14, 2018

markmandel commented Nov 14, 2018

markmandel commented Nov 15, 2018

Oleksii-Terekhov commented Nov 7, 2018 •

edited

Loading

Oleksii-Terekhov commented Nov 8, 2018 •

edited

Loading