Prometheus Scaler - Maintain last known state if prometheus is unavailable #965
We probably shouldn't limit this feature to the Prometheus scaler only, but make it available to all scalers.
Fair ask, but how do we deal with cases where the metric is higher than the threshold and Prometheus goes down? How do we avoid scaling loops on stale data? What if we keep track of:
But we remove the HPA and ensure the workload stays on the instance count of (3). That way it doesn't scale to 0, and you don't have scaling loops either; otherwise you can flood clusters.
I'm not sure I understand what you mean by a scaling loop. If the last known value is above the threshold, you'll have x pods. The last known value will never change, so your pods should be static. Example:
Well, if the value is still 15 it will be bigger than 10, so KEDA would add an instance, and another one, and another one.
I might be misunderstanding how the calculations work, but my understanding is the metric value would be
Notes from standup yesterday:
@bryanhorstmann We just feed the current & target metric to the HPA, so it will keep on scaling.
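To make the scaling-loop concern concrete, here is a minimal sketch in plain Go (not KEDA code) of the standard HPA calculation, desiredReplicas = ceil(currentReplicas * currentMetricValue / targetValue), under the assumption that the stale value of 15 keeps being reported as the per-pod average against a target of 10:

```go
package main

import (
	"fmt"
	"math"
)

// Sketch of the standard HPA replica calculation:
//   desiredReplicas = ceil(currentReplicas * currentMetricValue / targetValue)
// Assumption for illustration: the stale metric keeps being reported as a
// per-pod average of 15 while the target is 10.
func main() {
	replicas := 3
	const staleValue, target = 15.0, 10.0

	for i := 0; i < 5; i++ {
		desired := int(math.Ceil(float64(replicas) * staleValue / target))
		fmt.Printf("evaluation %d: %d -> %d replicas\n", i+1, replicas, desired)
		replicas = desired
	}
	// Prints 3 -> 5 -> 8 -> 12 -> 18 -> 27: the workload keeps scaling out on stale data.
}
```

Under that assumption the replica count grows by roughly 1.5x per evaluation, which is the flooding scenario described above.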
Thanks for the feedback @tomkerkhove. I'm glad this is being considered as a nice to have for
@bryanhorstmann do you think this is something that you could contribute?
Hi @zroubalik, I'm happy to have a go at it. I'll need to do some digging and research, as I haven't dug too deeply into the code base. My understanding is that the only part that needs actual work is:
Yes, but it is unfortunately not that trivial. Currently we are checking and marking the status of a scaler in
There should be settings as well, where users can specify the behaviour of this feature (timeout, enable/disable this feature, etc.).
@bryanhorstmann no pressure on you if you don't feel confident enough to do such a big change :)
@ahmelsayed FYI ^
Hi @zroubalik, thank you. I think I'll step away from this one, but will be watching the progress.
"Maintain last known state" - I think this approach has its drawbacks, especially when autoscaling to zero via I just hit this problem with I propose a new (optional) field
@VojtechVitek makes sense. Is it something you are willing to contribute?
Is there any process for getting a proposal officially accepted? I'm thinking a better name for the field would be
Sorry, I don't think I have any free dev cycles in the next month or so. Perhaps later in the year.
There isn't any official process. The best approach is to specify the requirements in the issue, and you can talk about it at the community call.
Hi @zroubalik @tomkerkhove. Is there any progress on this? Thanks.
Not that I'm aware of. Are you interested in contributing this, @lambohamp?
I had a go at this here: mishamo@8aca785. It works fine when there are no active triggers. I don't want to PR this just yet, as there are a couple of issues that I noticed and would like to discuss a little here (let me know if it's better to do so elsewhere).
So if I understand it correctly, once Prometheus comes back, the first getMetrics call returns 0 and not the correct metrics?
Generally, if there are multiple triggers in the SO, all metrics are fed into the HPA, and the HPA then makes the scaling decision (i.e. the target replica count). The current HPA implementation chooses the greater value for the target replica count. So if one of the scalers is not available, it should follow the same pattern, and KEDA should report some fallback number for the unavailable trigger.
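As a rough illustration of "report a fallback number for the unavailable trigger", and of what the last-known-state attempt above is going for, here is a minimal Go sketch using a simplified interface rather than KEDA's actual scaler API: a wrapper caches the last successful value and keeps serving it while the backing metric source is erroring.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-in for a scaler; KEDA's real interface differs.
type metricSource interface {
	GetValue() (float64, error)
}

// lastKnownValue wraps a source and keeps serving the most recent successful
// value while the source is unavailable, instead of reporting 0.
type lastKnownValue struct {
	source metricSource
	cached *float64
}

func (l *lastKnownValue) GetValue() (float64, error) {
	v, err := l.source.GetValue()
	if err != nil {
		if l.cached != nil {
			return *l.cached, nil // serve the stale value while the source is down
		}
		return 0, err // nothing cached yet; surface the error
	}
	l.cached = &v
	return v, nil
}

// flaky is a toy source that fails after the first call, to exercise the cache.
type flaky struct{ calls int }

func (f *flaky) GetValue() (float64, error) {
	f.calls++
	if f.calls > 1 {
		return 0, errors.New("prometheus unavailable")
	}
	return 15, nil
}

func main() {
	s := &lastKnownValue{source: &flaky{}}
	for i := 0; i < 3; i++ {
		v, err := s.GetValue()
		fmt.Println(v, err) // 15 <nil> on every call; later calls come from the cache
	}
}
```

This mirrors the prom-adapter behaviour described in the issue, with the stale-data caveat discussed earlier in the thread.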
Yes, but I see that as a Prometheus issue, I guess. As the metrics we have defined are a
🤔 I'm either not following the suggestion or think it won't work very elegantly. The
EDIT: Just had a thought; what if we added an additional metric to the HPA, which worked a little like the cron scaler using
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
For my 2 cents, I don't see how having
Just seems like
@bschaeffer you can use #1872 to mitigate this problem.
This was already implemented in 2.8.1. |
I am in the process of migrating to KEDA. I currently use https://github.com/DirectXMan12/k8s-prometheus-adapter and it has a very useful feature: in the event that Prometheus goes down, prom-adapter maintains the last known state of the metric. This means scaling is not triggered either up or down.
With KEDA, if Prometheus is not available, my deployments are scaled to zero after the cooldownPeriod has expired, regardless of whether the last known value was above 0 or not.

Use-Case
We are using prom-adapter to scale Google Pub/Sub subscribers and RabbitMQ workers. In the unlikely event that Prometheus goes down, we would want the existing workload to continue processing based on the numbers it knew before Prometheus stopped responding.