KEDA documentation around number of replicas supported for metrics server is contradictory #1450

lantingchiang · 2024-07-11T20:36:54Z

Under https://keda.sh/docs/2.14/operate/cluster/#high-availability, there is a table like so:

For the "Metrics Server" deployment, the "Support Replicas" column has the value "1", which reads as only one replica is supported. However, the "Note" column mentions that "You can run multiple replicas of our metrics sever", which indicates that multiple replicas are supported.

It should be clarified exactly how many replicas are supported and perhaps the language for the "Support Replicas" column should be updated.

tomkerkhove · 2024-07-12T07:43:58Z

> "You can run multiple replicas of our metrics sever", which indicates that multiple replicas are supported.

Just because you can doesn't mean it is supported :)

The metrics server only has 1 replica that will really do anything; if you add more, they are just in standby mode until the other one goes down. But @zroubalik / @JorTurFer can verify my thoughts.

zroubalik · 2024-07-12T08:29:01Z

> > "You can run multiple replicas of our metrics sever", which indicates that multiple replicas are supported.
>
> Just because you can doesn't mean it is supported :)
>
> The metrics server only has 1 replica that will really do anything; if you add more, they are just in standby mode until the other one goes down. But @zroubalik / @JorTurFer can verify my thoughts.

Correct, unless you add the CLI flag mentioned in the screenshot.

lantingchiang · 2024-07-12T13:47:09Z

I see, so it's essentially the same situation as the operator, where only one replica does work and the other is a standby? In that case, why would the "Support Replicas" value for the operator be 2?

JorTurFer · 2024-07-22T13:26:00Z

As for the operator, having more replicas improves HA, but they won't balance the load. If you have 2 replicas of each component, only one of each will do work (the operator runs with leader election, and the control plane calls the metrics server reusing the same socket). But in case of a disruption where you lose a node and the new pods can't be scheduled, the second replica (already scheduled, up and running) can take control and keep the system running.
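
To make the standby behaviour described above concrete, here is a minimal sketch of lease-based failover. It is a toy model, not KEDA's or Kubernetes' actual leader election code; the `Lease`, `Replica`, and `elect` names and the replica names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """Toy stand-in for a coordination lease: at most one holder at a time."""
    holder: Optional[str] = None


@dataclass
class Replica:
    name: str
    healthy: bool = True


def elect(lease: Lease, replicas: list[Replica]) -> Optional[str]:
    """Keep the current holder while it is healthy; otherwise hand the
    lease to the first healthy replica (the standby)."""
    healthy = [r.name for r in replicas if r.healthy]
    if lease.holder not in healthy:
        lease.holder = healthy[0] if healthy else None
    return lease.holder


replicas = [Replica("operator-a"), Replica("operator-b")]
lease = Lease()
first = elect(lease, replicas)   # operator-a becomes the leader
replicas[0].healthy = False      # the node running operator-a is lost
second = elect(lease, replicas)  # the already-running standby takes over
print(first, second)
```

The second replica adds no throughput while the leader is healthy; its value is that it is already scheduled when the leader disappears.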

lantingchiang · 2024-07-22T14:15:26Z

Thanks all! I actually tried deploying two metrics-server pods without setting enable-aggregator-routing=true. I observed that both metrics-server pods have a log line like the following:

I0712 19:42:37.026404       1 trace.go:205] Trace[364897128]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/devx-keda-poc/s0-prometheus-devx-test,user-agent:kube-controller-manager/v1.22.17 (linux/amd64) kubernetes/a7736ea/system:serviceaccount:kube-system:horizontal-pod-autoscaler,audit-id:7fd1c8f2-46ca-4ef1-b67d-d598a47b00e6,client:10.251.168.205,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (12-Jul-2024 19:42:36.173) (total time: 852ms):
Trace[364897128]: ---"Listing from storage done" 852ms (19:42:37.026)
Trace[364897128]: [852.824475ms] [852.824475ms] END

This makes me think that both metrics-server pods are doing useful work even without enable-aggregator-routing=true. Additionally, I did a kubectl get leases in the keda namespace, and there's only a lease for the keda-operator, not the keda-operator-metrics-server. Does the metrics-server actually do leader election as well?

% kg leases -n keda
NAME               HOLDER                                                                AGE
operator.keda.sh   keda-operator-68c4f4fb76-c2nft_70ad8b8f-c463-4371-9190-08aebf193774   22d

We're running keda 2.8 on K8s 1.22.
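
For anyone repeating this check, the lease listing can be turned into a name-to-holder map with a few lines of Python. This is only a sketch: `lease_holders` is a hypothetical helper, and it assumes the default whitespace-separated table that `kubectl get leases` prints, as in the output above.

```python
def lease_holders(kubectl_output: str) -> dict[str, str]:
    """Map lease name -> holder from `kubectl get leases` table output."""
    lines = [ln for ln in kubectl_output.splitlines() if ln.strip()]
    holders = {}
    for line in lines[1:]:  # skip the NAME/HOLDER/AGE header row
        cols = line.split()
        if len(cols) >= 2:
            holders[cols[0]] = cols[1]
    return holders


sample = """\
NAME               HOLDER                                                                AGE
operator.keda.sh   keda-operator-68c4f4fb76-c2nft_70ad8b8f-c463-4371-9190-08aebf193774   22d
"""
holders = lease_holders(sample)
# True only if some lease is held by a pod with "metrics" in its name.
metrics_server_has_lease = any("metrics" in h for h in holders.values())
print(holders, metrics_server_has_lease)
```

With the output pasted above, only the operator shows up as a lease holder.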

JorTurFer · 2024-07-22T14:56:29Z

The metrics server doesn't have leader election; all the instances are functional. It's the control plane that calls a single instance, depending on the CLI flag. I don't have information about how different providers configure this, so maybe it's enabled in your clusters.
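
A rough way to picture the difference the flag makes is the toy model below. It is not how the API server is implemented: the `route` function and pod names are hypothetical, the "pinned" branch assumes a single reused connection as described earlier in the thread, and round-robin is used in the other branch only to keep the example deterministic.

```python
from collections import Counter
from itertools import cycle


def route(requests: int, backends: list[str], aggregator_routing: bool) -> Counter:
    """Count which backend serves each request in a toy model."""
    served = Counter()
    if aggregator_routing:
        # Assumption: requests are spread across the endpoint IPs.
        picker = cycle(backends)
        for _ in range(requests):
            served[next(picker)] += 1
    else:
        # Assumption: one long-lived connection is reused, so every request
        # lands on whichever backend that connection was opened to.
        served[backends[0]] += requests
    return served


pods = ["metrics-server-a", "metrics-server-b"]
pinned = route(10, pods, aggregator_routing=False)
spread = route(10, pods, aggregator_routing=True)
print(pinned, spread)
```

In both cases every replica is able to serve; what changes is only which replica the caller ends up talking to.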

lantingchiang · 2024-07-24T18:15:14Z

Thanks @JorTurFer, that makes a lot of sense. I believe https://keda.sh/docs/2.14/operate/cluster/#configure-leader-election needs to be updated then, since it suggests configuring leader election parameters for the metrics-server, which doesn't use leader election at all.

JorTurFer · 2024-07-24T19:44:52Z

> Thanks @JorTurFer, that makes a lot of sense. I believe keda.sh/docs/2.14/operate/cluster#configure-leader-election needs to be updated then, since it suggests configuring leader election parameters for the metrics-server, which doesn't use leader election at all.

You're totally right, those leader election envs don't apply to the metrics server. Are you willing to open a PR with the change?

lantingchiang · 2024-07-24T19:46:46Z

Yup!

tomkerkhove · 2024-08-12T07:17:57Z

I think we are good to close, no?

This was referenced Jul 24, 2024

remove metrics server leader election env vars #1434

Merged

remove unused leader election parameters of metrics adapter kedacore/keda#5986

Merged

tomkerkhove transferred this issue from kedacore/keda Aug 12, 2024

JorTurFer closed this as completed Aug 12, 2024
