-
Notifications
You must be signed in to change notification settings - Fork 820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Admission webhook "mutations.stable.agones.dev" errors with Invalid FleetAutoscaler #406
Comments
Kubernetes is a declarative system. Users provide manifests that define desired state and Kubernetes converges the current state accordingly. If the workloads in question don't support this model because they require specific ordering, that means the workloads do not support a declarative model. In this case, the focus should be on fixing the workloads to support orderless installation via logic inside the pods. |
On my phone atm, so can't go too deep, but check the Troubleshooting guide for a review of logs: https://github.com/GoogleCloudPlatform/agones/blob/master/docs/troubleshooting.md Also, what happens when you debug output your Helm chart - does it look correct? |
Helm chart work fine when create fleet before autoscaler 👍 |
This was by initial design, and intended behaviour. It means that Fleets and Fleet Autoscalers get garbage collected appropriately when the Fleet gets deleted. @victor-prodan Anything you want to add here (since you wrote this)? Should we decouple them, and allow Fleet autoscalers to be orphaned? @Oleksii-Terekhov has some valid points about being able to define these declaratively. It puts more onus on the developer to cleanup these artifacts though. The other issue here is that if we maintain the current functionality should have a validationHandler for the checking of the fleetName value, as that will return a much more readable validation error, with a reason - rather than "500", with no message (which is what happens with a mutation handler that rejects the call). Also hunting down a workaround, at least for the moment. |
I think you can use lifecycle hooks to solve this problem for the moment (would love to hear if it works for you) - but this does sound like an issue worth discussion - do we want to force developers down this path? |
Maybe autoscaler must be part of fleet definitions? Also question about minReplicas - why you need this? I cannot realize when minReplicas <>0 in production environment. Value in bufferSize resolve all issues.... Helm hooks (https://github.com/helm/helm/blob/master/docs/charts_hooks.md)looks weird due:
|
Aah. that's a good point re: hooks not being managed. Okay, that's a deal breaker. Original fleet autoscaling design ticket for context. |
Quick thought on workaround: Do you have the fleet + fleetautoscaler definitions in the same file, or separate files in your helm chart? Wondering if they are in the same template file, it will solve the issue for now. |
In the separate files... I'll try join they now. Update for @markmandel: "helm delete --purge" + "helm install" with joined template still fails with same error "Invalid FleetAutoscaler" |
Here is an example: you need to restart the matchmaker servers so you know there will be a temporary down spike in demand, but the requests will come back as usual after a short while. In this case, you want to keep your dedicated server cluster warmed up to the current size and ready to handle the load. To do this, just set minReplicas to your current level before restarting the matchmaker and remove it after the load has resumed. |
I don't know, I can see it going both ways. |
In design ticket exist point:
|
I'm digging into the Figure following their guide here probably makes the most amount of sense. |
I would definitely go with what the HPA does but my instinct says we should keep them locked together. Let's see if my instinct is right haha. |
Checking HPA - yes, you can create one without having a Deployment. What happens is that you get a warning event on the HPA if it can't find the Deployment - so that the issue user visible. I propose we should go down this path (allow a FleetAutoscaler be created without a Fleet) - it does allow for nicer decoupling, and still lets a user see why a FleetAutoscaler may not be working. Small question - should this mean you can change which Fleet a FleetAutoscaler is pointed at? (I'm leaning towards yes, just because I can't think of a downside, and it may be a nice bit of flexibility). Thoughts? |
Is there a chance for this in release 0.6.0? |
Working on it right now! Probably have a PR today. |
This removes some previous restrictions with the FleetAutoscaler, as well as cleaning up a couple of items. - FleetAutoscalers can now have a Fleet target that doesn't exit. - If the target doesn't exist, this will be recorded as a warning event of type `FailedGetFleet`, rather than `ScalingLimited` status item, as it allows a human readable message, as well as give us data on how often this occurs, and it is readable by stat/alerting packages. - FleetAutoscalers can now have their Fleet target edited. This means that FleetAutoscalers can be transitioned from one Fleet to another. This may be useful in scenarios like red-green deployments, or if you wish to temporarily disable an Autoscaler, but not delete it. - `FleetAutoscaler.Status.ScalingLimited` was also replaced by a `ScalingLimited` Event, as it allows a human readable message, as well as give us data on how often this occurs, and it is readable by stat/alerting packages. Closes googleforgames#406
PR #416 in place -- as I said on the PR, I made some decisions that I think make sense, but also consider this a sacrificial draft, so definitely let me know if you see issues. |
This removes some previous restrictions with the FleetAutoscaler, as well as cleaning up a couple of items. - FleetAutoscalers can now have a Fleet target that doesn't exit. - If the target doesn't exist, this will be recorded as a warning event of type `FailedGetFleet`, rather than `ScalingLimited` status item, as it allows a human readable message, as well as give us data on how often this occurs, and it is readable by stat/alerting packages. - FleetAutoscalers can now have their Fleet target edited. This means that FleetAutoscalers can be transitioned from one Fleet to another. This may be useful in scenarios like red-green deployments, or if you wish to temporarily disable an Autoscaler, but not delete it. - `FleetAutoscaler.Status.ScalingLimited` was also replaced by a `ScalingLimited` Event, as it allows a human readable message, as well as give us data on how often this occurs, and it is readable by stat/alerting packages. Closes googleforgames#406
This removes some previous restrictions with the FleetAutoscaler, as well as cleaning up a couple of items. - FleetAutoscalers can now have a Fleet target that doesn't exit. - If the target doesn't exist, this will be recorded as a warning event of type `FailedGetFleet`, rather than `ScalingLimited` status item, as it allows a human readable message, as well as give us data on how often this occurs, and it is readable by stat/alerting packages. - FleetAutoscalers can now have their Fleet target edited. This means that FleetAutoscalers can be transitioned from one Fleet to another. This may be useful in scenarios like red-green deployments, or if you wish to temporarily disable an Autoscaler, but not delete it. - `FleetAutoscaler.Status.ScalingLimited` was also replaced by a `ScalingLimited` Event, as it allows a human readable message, as well as give us data on how often this occurs, and it is readable by stat/alerting packages. Closes googleforgames#406
This removes some previous restrictions with the FleetAutoscaler, as well as cleaning up a couple of items. - FleetAutoscalers can now have a Fleet target that doesn't exit. - If the target doesn't exist, this will be recorded as a warning event of type `FailedGetFleet`, as it allows a human readable message, as well as give us data on how often this occurs, and it is readable by stat/alerting packages. - FleetAutoscalers can now have their Fleet target edited. This means that FleetAutoscalers can be transitioned from one Fleet to another. This may be useful in scenarios like red-green deployments, or if you wish to temporarily disable an Autoscaler, but not delete it. - `FleetAutoscaler.Status.ScalingLimited` has an added Event, as it allows a human readable message, as well as give us data on how often this occurs, and it is readable by stat/alerting packages. Closes googleforgames#406
This removes some previous restrictions with the FleetAutoscaler, as well as cleaning up a couple of items. - FleetAutoscalers can now have a Fleet target that doesn't exit. - If the target doesn't exist, this will be recorded as a warning event of type `FailedGetFleet`, as it allows a human readable message, as well as give us data on how often this occurs, and it is readable by stat/alerting packages. - FleetAutoscalers can now have their Fleet target edited. This means that FleetAutoscalers can be transitioned from one Fleet to another. This may be useful in scenarios like red-green deployments, or if you wish to temporarily disable an Autoscaler, but not delete it. - `FleetAutoscaler.Status.ScalingLimited` has an added Event, as it allows a human readable message, as well as give us data on how often this occurs, and it is readable by stat/alerting packages. Closes googleforgames#406
This removes some previous restrictions with the FleetAutoscaler, as well as cleaning up a couple of items. - FleetAutoscalers can now have a Fleet target that doesn't exit. - If the target doesn't exist, this will be recorded as a warning event of type `FailedGetFleet`, as it allows a human readable message, as well as give us data on how often this occurs, and it is readable by stat/alerting packages. - FleetAutoscalers can now have their Fleet target edited. This means that FleetAutoscalers can be transitioned from one Fleet to another. This may be useful in scenarios like red-green deployments, or if you wish to temporarily disable an Autoscaler, but not delete it. - `FleetAutoscaler.Status.ScalingLimited` has an added Event, as it allows a human readable message, as well as give us data on how often this occurs, and it is readable by stat/alerting packages. Closes #406
Bare-metal k8s v1.10.4, helm v2.9.1, Agones (installed by helm) v0.5.0,
Linux 4.4.159-1.el7.elrepo.x86_64, Centos 7.4
Custom helm chart for my deployment with fleet and fleetAutoscaler
At helm install almost always successful, but sometimes(1:10):
Same error in kube-apiserver log:
As background - 20-40 fleets with autoscalers in one namespace with frequent parallel redeploy (install/upgrade/delete via helm)
Which log can help resolve this critical issue?
Almost always sequential ./helm upgrade --force --install --wait --timeout 1200 ... resolve issue.
Maybe race condition between fleet and correspond autoscaler creation/update?
Update 1: I try apply autoscaler without fleet - got same error
Update 2: helm/helm#2994 - common CRD issue...
The text was updated successfully, but these errors were encountered: