Prometheus Scaler - Maintain last known state if prometheus is unavailable #965

Open
bryanhorstmann opened this issue Aug 3, 2020 · 24 comments
Labels
feature All issues for new features that have been committed to

Comments

@bryanhorstmann

I am in the process of migrating to Keda. I currently use https://github.com/DirectXMan12/k8s-prometheus-adapter and it has a very useful feature. In the event that Prometheus goes down, prom-adapter maintains the last known state of the metric. This means scaling is not triggered either up or down.

With KEDA, if Prometheus is not available, my deployments are scaled to zero after the cooldownPeriod expires, regardless of whether the last known value was above 0.

Use-Case

We are using prom-adapter to scale Google Pub/Sub subscribers and RabbitMQ workers. In the unlikely event that Prometheus goes down, we would want the existing workload to continue processing based on the numbers it knew before Prometheus stopped responding.

@bryanhorstmann bryanhorstmann added feature-request All issues for new features that have not been committed to needs-discussion labels Aug 3, 2020
@zroubalik
Member

We probably shouldn't limit this feature to the Prometheus scaler only, but make it available to all scalers.
To make this work, we will need to store the last known metrics (probably in ScaledObject.Status) and use them in case of scaler failure?

@tomkerkhove
Member

Fair ask, but how do we deal with cases where the metric is higher than the threshold and Prometheus goes down? How do we avoid scaling loops on stale data?

What if we keep track of:

  1. When we were able to last connect
  2. What the last known value was
  3. What the instance count was

But then we remove the HPA and ensure the workload stays at the instance count from (3). That way it doesn't scale to 0, but you don't get scaling loops either; otherwise you can flood the cluster.
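For illustration, a minimal sketch of what such a record might look like if it were stored on the ScaledObject status (all names below are hypothetical; nothing like this exists in the code base today):

```go
package v1alpha1

import "time"

// Rough sketch only (field names are illustrative, not part of the existing KEDA API):
// the operator would update this record on every successful poll and fall back to it
// when a scaler errors out.
type LastKnownState struct {
	LastSuccessfulCheck time.Time // 1. when we were last able to connect
	LastMetricValue     float64   // 2. what the last known value was
	ReplicaCount        int32     // 3. what the instance count was
}
```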

@bryanhorstmann
Author

I'm not sure I understand what you mean by a scaling loop. If the last known value is above the threshold, you'll have x pods. The last known value will never change, so your pod count should be static.

Example:

ScaledObject with threshold of 10
Current value is 15. You have 2 pods.
Prometheus goes down. Last known value is still 15.
Even if the queue that should be processed goes to 0 (or 100), the HPA still receives 15 from the metrics server
Pods remain on 2.

@tomkerkhove
Member

Well, if the value is still 15 it will be bigger than 10, so KEDA would add an instance, and another one, and another one.

@bryanhorstmann
Author

I might be misunderstanding how the calculations work, but my understanding is that the metric value would be 15, and as there are 2 pods, the average value would be 7.5. If the threshold is 10, then no additional pods would be added by the HPA.
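For reference, a simplified sketch of the HPA's AverageValue calculation (tolerance and stabilization omitted; this mirrors the upstream algorithm, it is not KEDA code) shows why a frozen value of 15 with 2 pods settles at 2 rather than looping:

```go
package main

import (
	"fmt"
	"math"
)

// Simplified sketch of the HPA replica calculation for an AverageValue metric
// (tolerance and stabilization windows are omitted).
func desiredReplicas(currentReplicas int32, metricValue, targetAverageValue float64) int32 {
	usageRatio := metricValue / (targetAverageValue * float64(currentReplicas))
	return int32(math.Ceil(float64(currentReplicas) * usageRatio))
}

func main() {
	// Frozen metric of 15, target average of 10, 2 existing pods:
	// ratio = 15 / (10 * 2) = 0.75, desired = ceil(2 * 0.75) = 2, so the pod count stays at 2.
	fmt.Println(desiredReplicas(2, 15, 10)) // prints 2
}
```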

@tomkerkhove
Member

Notes from standup yesterday:

  • Prometheus metric adapter will maintain the same instance count, while KEDA will scale back to 0 if it doesn't have the count
  • HPAs receiving errors from metric servers stop taking any action; this seems like a safe approach for us. Otherwise we can create autoscaling loops flooding the whole cluster.
    • This is how it works today so we are good to go
  • We shouldn’t do 1 -> 0 nor 0 -> 1 scaling if we cannot fetch metrics
  • Would be nice for 2.0, otherwise afterwards

@bryanhorstmann We just feed the current & target metric to the HPA, so it will keep on scaling.

@bryanhorstmann
Author

Thanks for the feedback @tomkerkhove. I'm glad this is being considered as a nice to have for 2.0.

@zroubalik
Member

@bryanhorstmann do you think this is something that you could contribute?

@bryanhorstmann
Author

Hi @zroubalik,

I'm happy to have a go at it. Will need to do some digging and research as I've not dug too deeply into the code base.

My understanding is that the only part that needs actual work is:

We shouldn’t do 1 -> 0 nor 0 -> 1 scaling if we cannot fetch metrics

@zroubalik
Member

zroubalik commented Aug 7, 2020

Yes, but it is unfortunately not that trivial. Currently we check and mark the status of a scaler in the isActive property; in short, this is set to false if there are no metrics for scaling or if the scaler is unavailable. This behavior needs to be changed (and, for consistency, for all scalers), and the 1 <-> 0 scaling then needs to be performed based on this; currently it is done based on isActive once the cooldownPeriod is over.
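To make that a bit more concrete, here is a rough sketch of the idea (illustrative only, not actual KEDA code; all names below are made up) of how the 0 <-> 1 decision could distinguish an inactive scaler from an unreachable one:

```go
package scaling

// Rough sketch only: the 0 <-> 1 executor would distinguish "scaler reports no
// activity" from "scaler could not be reached" instead of collapsing both into
// isActive == false.
type scalerResult struct {
	isActive bool
	err      error // non-nil when the trigger source was unreachable
}

func targetReplicas(res scalerResult, current, minReplicas, defaultReplicas int32) int32 {
	switch {
	case res.err != nil:
		// Metrics unavailable: hold the last known replica count (or a configured
		// default) instead of treating the scaler as inactive and dropping to zero.
		if current > 0 {
			return current
		}
		return defaultReplicas
	case res.isActive:
		// Active: ensure at least one replica exists; the HPA handles scaling beyond 1.
		if current < 1 {
			return 1
		}
		return current
	default:
		// Genuinely inactive: allow scale down to minReplicaCount after cooldownPeriod.
		return minReplicas
	}
}
```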

There should be settings as well, where users can specify the behaviour of this feature (timeout, enable/disable, etc.).

@bryanhorstmann and no pressure on you, if you don't feel confident enough to do such a big change :)

@ahmelsayed FYI^

@bryanhorstmann
Author

Hi @zroubalik, thank you. I think I'll step away from this one, but will be watching the progress.

@VojtechVitek

"Maintain last known state" - I think this approach has its drawbacks, especially when autoscaling to zero via minReplicaCount: 0. Imagine that you can't wake up your system, because the Keda Operator can't temporarily reach the source of metrics.

I just hit this problem with postgresql trigger. After a security group change in our AWS account, the Keda Operator suddenly couldn't reach our Postgres database and the whole system just scaled down to zero, making the service unavailable.

I propose a new (optional) field onErrorReplicaCount that would serve as a default value when the Operator can't read current values, i.e.:

apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: my-deployment-autoscaler
spec:
  scaleTargetRef:
    deploymentName: my-deployment
  pollingInterval: 1
  cooldownPeriod: 1200
  minReplicaCount: 0
  maxReplicaCount: 10
  onErrorReplicaCount: 2     # <==  2 pods, in case of a trigger source being unavailable

@zroubalik
Member

@VojtechVitek makes sense. Is it something you are willing to contribute?

@VojtechVitek

Is there any process for getting a proposal officially accepted?

I'm thinking a better name for the field would be defaultReplicaCount, which would default to the minReplicaCount value unless explicitly specified.

Is it something you are willing to contribute?

Sorry, I don't think I have any free dev cycles in the next month or so. Perhaps later in the year.

@zroubalik
Member

Is there any process for getting a proposal officially accepted?

There isn't any official process. The best approach is to specify the requirements in the issue, and you can talk about it at the community call.

@lambohamp

lambohamp commented Nov 19, 2020

Hi @zroubalik @tomkerkhove. Is there any progress on this? Thanks.

@tomkerkhove
Member

Not that I'm aware of. Are you interested in contributing this @lambohamp ?

@mishamo
Contributor

mishamo commented Jun 7, 2021

I had a go at this here mishamo@8aca785

It works fine when there are no active triggers. I don't want to PR this just yet, as there are a couple of issues that I noticed and would like to discuss them a little here (let me know if it's better to do so elsewhere).

  • Prometheus unavailable, but then comes back -> This caused metrics of 0 to come through to the HPA and immediately scaled the deployment to minReplicas. We can alleviate this by setting a delay on the Prometheus readinessProbe, allowing it to scale and catch up with metrics before being added to the service.
  • What should happen if some triggers are active but others error? Consider the case of 2 separate Prometheus triggers, one instance is unavailable and the other is happily serving metrics. I believe the current behaviour would ignore the unavailable prometheus and only use metrics from the live one via the active HPA. I believe in my fork, the same thing would happen as isActive would be set to true and the case wouldn't match. If we removed the !isActive check from the new case then the deployment would be scaled to defaultReplicas, which would then compete with the existing, live HPA and a scaling loop would occur between the two (is that assumption correct?). I think, ideally, what should happen is a max(replicasFromHPA, defaultReplicas), but as the scaling logic here is split between the executor and the HPAs, I'm not sure how that could be approached. Any thoughts?

@zroubalik
Member

I had a go at this here mishamo@8aca785

It works fine when there are no active triggers. I don't want to PR this just yet, as there are a couple of issues that I noticed and would like to discuss them a little here (let me know if it's better to do so elsewhere).

* Prometheus unavailable, but then comes back -> This caused metrics of 0 to come through to the HPA and immediately scaled the deployment to `minReplicas`. We can alleviate this by setting a delay on the Prometheus readinessProbe, allowing it to scale and catch up with metrics before being added to the service.

So if I understand it correctly, once Prometheus comes back, the first getMetrics call returns 0 and not the correct metrics?

* What should happen if some triggers are active but others error? Consider the case of 2 separate Prometheus triggers, one instance is unavailable and the other is happily serving metrics. I believe the current behaviour would ignore the unavailable prometheus and only use metrics from the live one via the active HPA. I believe in my fork, the same thing would happen as `isActive` would be set to true and the case wouldn't match. If we removed the `!isActive` check from the new case then the deployment would be scaled to `defaultReplicas`, which would then compete with the existing, live HPA and a scaling loop would occur between the two (is that assumption correct?). I think, ideally, what should happen is a `max(replicasFromHPA, defaultReplicas)`, but as the scaling logic here is split between the executor and the HPAs, I'm not sure how that could be approached. Any thoughts?

Generally, if there are multiple triggers in a ScaledObject, all metrics are fed into the HPA and the HPA then makes the scaling decision (i.e. the target replica count). The current HPA implementation chooses the greatest value for the target replica count. So if one of the scalers is not available, it should follow the same pattern: KEDA should report some fallback number for the unavailable trigger.
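To sketch what "report some fallback number" could look like for an AverageValue trigger (illustrative only, not existing KEDA code; the names are made up): report a value that makes the HPA's own math resolve to the chosen fallback replica count, so the max-across-metrics behaviour still works for the healthy trigger.

```go
package main

import "fmt"

// Sketch of the idea only: when a trigger errors, report a synthetic metric value
// that makes the HPA's AverageValue calculation resolve to the desired fallback
// replica count.
func fallbackMetricValue(targetAverageValue float64, fallbackReplicas int32) float64 {
	// The HPA computes desired = ceil(metricValue / targetAverageValue) for
	// AverageValue metrics, so reporting target * fallbackReplicas yields exactly
	// fallbackReplicas.
	return targetAverageValue * float64(fallbackReplicas)
}

func main() {
	// Target average of 10 and a fallback of 2 replicas -> report 20; the HPA then
	// takes the max of 2 and whatever the healthy trigger asks for.
	fmt.Println(fallbackMetricValue(10, 2)) // prints 20
}
```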

@mishamo
Contributor

mishamo commented Jun 8, 2021

So if I understand it correctly, once Prometheus comes back, the first getMetrics call returns 0 and not the correct metrics?

Yes, but I see that as a Prometheus issue, I guess. As the metrics we have defined are a rate(...[2m]), it doesn't really make sense to look at them until we have at least 2 minutes of data. Delaying the readiness probe by that duration basically resolves this issue. I only mention it as it may come up again.

Generally if there are multiple triggers in SO, all metrics are fed into HPA and HPA then does the decision on the scaling (ie target replica count). The current HPA implementation chooses the greater value for the target replica count. So if one of the scalers is not available, it should do follow the same pattern, so KEDA should report for the unavailable trigger some fallback number.

🤔 I'm either not following the suggestion or I think it won't work very elegantly. The defaultReplicas approach works because the operator performs the scaling to that number of replicas. The adapter only feeds a metric value back to the HPA; it has no control over the number of replicas (that's the HPA's responsibility). We could add a field to the trigger (every type of trigger?), something like defaultMetricsValue, which would be the value provided to the HPA if metrics could not be retrieved, but that feels quite hacky to me (faking metrics when they're not available). This could also cause an infinite scaling loop if that value is greater than the threshold. I would imagine users would only want to configure a defaultReplicas in one place, and that would be the value used as a base when a source of metrics is not available.

EDIT: Just had a thought; what if we added an additional metric to the HPA, which worked a little like the cron scaler using desiredReplicas = defaultReplicas when a metrics source was unavailable? Do you think that kind of approach would work? ----- On second thoughts, I don't think that's possible without adding some state to the provider.

@stale

stale bot commented Oct 14, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale All issues that are marked as stale due to inactivity label Oct 14, 2021
@tomkerkhove tomkerkhove added feature All issues for new features that have been committed to and removed needs-discussion feature-request All issues for new features that have not been committed to labels Oct 14, 2021
@stale stale bot removed stale All issues that are marked as stale due to inactivity labels Oct 14, 2021
@bschaeffer

For my 2 cents, I don't see how having defaultReplicas is any different from functionality that just maintains the current value.

It just seems like newValue || currentReplicas is just as easy as newValue || defaultReplicas.

@zroubalik
Member

@bschaeffer you can use #1872 to mitigate this problem.

@tomkerkhove tomkerkhove moved this to Backlog in Roadmap - KEDA Core Feb 10, 2022
@tomkerkhove tomkerkhove moved this from To Do to Proposed in Roadmap - KEDA Core Feb 11, 2022
@NaveLevi

This was already implemented in 2.8.1.
You can now use the ignoreNullValues flag, which will cause the scaler to freeze in case Prometheus is down.
