KEDA documentation around number of replicas supported for metrics server is contradictory #1450

Closed
lantingchiang opened this issue Jul 11, 2024 · 10 comments

Comments

@lantingchiang
Contributor

Under https://keda.sh/docs/2.14/operate/cluster/#high-availability, there is a table like so:
[Screenshot: the High Availability table from the linked documentation page]

For the "Metrics Server" deployment, the "Support Replicas" column has the value "1", which reads as though only one replica is supported. However, the "Note" column says "You can run multiple replicas of our metrics sever", which indicates that multiple replicas are supported.

The docs should state exactly how many replicas are supported, and the wording of the "Support Replicas" column should perhaps be updated.

@tomkerkhove
Member

> "You can run multiple replicas of our metrics sever", which indicates that multiple replicas are supported.

It is not because you can, that it is supported :)

Metric server only has 1 replica that will really do anything, if you add more they are just in standby mode until the other one goes down. But @zroubalik / @JorTurFer can verify my thoughts

@zroubalik
Member

> > "You can run multiple replicas of our metrics sever", which indicates that multiple replicas are supported.
>
> It is not because you can, that it is supported :)
>
> Metric server only has 1 replica that will really do anything, if you add more they are just in standby mode until the other one goes down. But @zroubalik / @JorTurFer can verify my thoughts

Correct, unless you add the CLI flag mentioned in the screenshot.

@lantingchiang
Contributor Author

I see, so it's essentially the same situation as the operator, where only one replica does work and the other is a standby? In that case, why would the "Support Replicas" value for the operator be 2?

@JorTurFer
Member

As for the operator, having more replicas improves HA, but they won't balance the load. If you have 2 replicas of each component, only one of each will do work: the operator runs with leader election, and the metrics server is called by the control plane reusing the same socket. But in case of a disruption where you lose a node and the new pods can't be scheduled, the second replica (already scheduled, up and running) can take control and keep the system running.
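
The active/standby behaviour described for the operator can be pictured with a toy model. This is only a sketch under assumptions: `Lease`, `Replica`, and `active_replica` are hypothetical names invented for illustration, and the logic is a simplification, not KEDA's or Kubernetes' actual leader election code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """Toy stand-in for a coordination lease (hypothetical, not the Kubernetes object)."""
    holder: Optional[str] = None


@dataclass
class Replica:
    name: str
    healthy: bool = True

    def try_acquire(self, lease: Lease, live: set[str]) -> bool:
        """Take the lease if it is free or its current holder is no longer alive."""
        if not self.healthy:
            return False
        if lease.holder is None or lease.holder not in live:
            lease.holder = self.name
        return lease.holder == self.name


def active_replica(replicas: list[Replica], lease: Lease) -> Optional[str]:
    """Run one election round and return the replica that does the work, if any."""
    live = {r.name for r in replicas if r.healthy}
    for r in replicas:
        r.try_acquire(lease, live)
    return lease.holder if lease.holder in live else None


lease = Lease()
replicas = [Replica("operator-a"), Replica("operator-b")]

first = active_replica(replicas, lease)   # both healthy: one leader, one standby
replicas[0].healthy = False               # simulate losing the leader's node
second = active_replica(replicas, lease)  # the standby takes over the lease
print(first, second)
```

The second replica adds no throughput while the first is healthy; its value is that it is already running when the leader disappears.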

@lantingchiang
Contributor Author

Thanks all! I actually tried deploying two metrics-server pods without setting enable-aggregator-routing=true. I observed that both metrics-server pods have a log line like the following:

I0712 19:42:37.026404       1 trace.go:205] Trace[364897128]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/devx-keda-poc/s0-prometheus-devx-test,user-agent:kube-controller-manager/v1.22.17 (linux/amd64) kubernetes/a7736ea/system:serviceaccount:kube-system:horizontal-pod-autoscaler,audit-id:7fd1c8f2-46ca-4ef1-b67d-d598a47b00e6,client:10.251.168.205,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (12-Jul-2024 19:42:36.173) (total time: 852ms):
Trace[364897128]: ---"Listing from storage done" 852ms (19:42:37.026)
Trace[364897128]: [852.824475ms] [852.824475ms] END

This makes me think that both metrics-server pods are doing useful work even without enable-aggregator-routing=true. Additionally, I did a kubectl get leases in the keda namespace, and there's only a lease for the keda-operator, not the keda-operator-metrics-server. Does the metrics-server actually do leader election as well?

% kg leases -n keda
NAME               HOLDER                                                                AGE
operator.keda.sh   keda-operator-68c4f4fb76-c2nft_70ad8b8f-c463-4371-9190-08aebf193774   22d

We're running KEDA 2.8 on Kubernetes 1.22.
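
The lease check above can also be scripted. A minimal sketch, assuming the default table output of `kubectl get leases` as pasted in this comment; `lease_names` is a hypothetical helper, and matching on the substring "metrics" is only an assumption about what a metrics-server lease would be called if one existed.

```python
def lease_names(kubectl_output: str) -> list[str]:
    """Return the NAME column from `kubectl get leases` table output."""
    lines = [ln for ln in kubectl_output.splitlines() if ln.strip()]
    # Skip the header row (NAME HOLDER AGE); the first field of each row is the lease name.
    return [ln.split()[0] for ln in lines[1:]]


# Output copied from the comment above.
sample = """\
NAME               HOLDER                                                                AGE
operator.keda.sh   keda-operator-68c4f4fb76-c2nft_70ad8b8f-c463-4371-9190-08aebf193774   22d
"""

names = lease_names(sample)
# Assumption: a metrics-server lease, if present, would have "metrics" in its name.
has_metrics_server_lease = any("metrics" in n for n in names)
print(names, has_metrics_server_lease)
```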

@JorTurFer
Member

The metrics server doesn't have leader election; all the instances are functional. It's the control plane that calls a single instance, depending on the CLI flag. I don't have information about how different providers configure this, so maybe it's enabled in your clusters.
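
To illustrate the difference the flag can make, here is a simplified simulation. It assumes that without `enable-aggregator-routing` the control plane reuses one connection to the Service, so the backend is fixed when the connection opens, and that with the flag an endpoint is chosen per request. Both are modelling assumptions for illustration; the function names and pod names are hypothetical, and this is not how the API server is implemented.

```python
import random
from collections import Counter


def route_via_cluster_ip(pods, requests, rng):
    """Model a single reused connection: the backend is picked once, at connect time."""
    backend = rng.choice(pods)
    return Counter(backend for _ in range(requests))


def route_via_endpoints(pods, requests, rng):
    """Model aggregator routing: an endpoint is picked for each request."""
    return Counter(rng.choice(pods) for _ in range(requests))


rng = random.Random(0)  # fixed seed so the run is repeatable
pods = ["metrics-server-0", "metrics-server-1"]  # hypothetical pod names

pinned = route_via_cluster_ip(pods, 100, rng)
spread = route_via_endpoints(pods, 100, rng)
print("pods serving requests (reused connection):", len(pinned))
print("pods serving requests (endpoint routing):", len(spread))
```

In this model every replica is functional in both cases; the only thing that changes is how many of them the caller actually reaches.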

@lantingchiang
Contributor Author

Thanks @JorTurFer, that makes a lot of sense. I believe https://keda.sh/docs/2.14/operate/cluster/#configure-leader-election needs to be updated then, since it suggests configuring leader election parameters for the metrics-server, which doesn't use leader election at all.

@JorTurFer
Member

> Thanks @JorTurFer, that makes a lot of sense. I believe keda.sh/docs/2.14/operate/cluster#configure-leader-election needs to be updated then, since it suggests configuring leader election parameters for the metrics-server, which doesn't use leader election at all.

You're totally right, those leader election environment variables don't apply to the metrics server. Are you willing to open a PR with the change?

@lantingchiang
Contributor Author

Yup!

@tomkerkhove
Member

I think we are good to close, no?
