
Monitoring of kube-scheduler and kube-controller-manager #3759

Open
pschichtel opened this issue Nov 24, 2023 · 8 comments
Labels
enhancement New feature or request

Comments

@pschichtel
Contributor

Is your feature request related to a problem? Please describe.

kube-prometheus-stack is a pretty popular monitoring setup, and it comes with a nice set of default alert rules covering a good portion of the things worth monitoring around Kubernetes' state. Two of these rules alert when either kube-scheduler or kube-controller-manager is not up.

The chart assumes by default that kube-scheduler and kube-controller-manager run as pods in the cluster and can be discovered via services created by the chart. That is not the case with k0s (and other distributions), where those processes run directly on the nodes. There are two workarounds for this:

  1. Manually list the node IPs in the chart configuration, which generates appropriate endpoints for the services.
  2. Set up a DaemonSet with an appropriate nodeSelector that starts pause containers in hostNetwork mode, carrying the labels expected by the service.
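A hypothetical sketch of the second workaround, assuming the chart's default selector label `component: kube-scheduler` (image, names, and the nodeSelector are illustrative and need adjusting to the actual cluster):

```yaml
# Pause pods that merely carry the labels the chart's Service selects on.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-scheduler-monitoring-shim
  namespace: kube-system
spec:
  selector:
    matchLabels:
      component: kube-scheduler   # selector assumed from the chart's defaults
  template:
    metadata:
      labels:
        component: kube-scheduler
    spec:
      hostNetwork: true           # pod IP == node IP, so Endpoints point at the node
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""   # adjust to where the processes run
      tolerations:
        - operator: Exists
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
```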

I opted for the second approach because I don't like hardcoding IP addresses unless I absolutely have to.

Prometheus now picks up the IPs as targets, but it can't fetch metrics, because the network ports are bound to 127.0.0.1 only.

It would be possible to deploy simple proxy containers that make the ports available on 0.0.0.0, but that exposes them to the wider network, which is probably not desirable in many cases.
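The loopback-only binding is the crux of the problem. A minimal Python sketch of the behaviour (the OS-picked port stands in for the scheduler's real 10259, which is bound with `--bind-address=127.0.0.1`):

```python
import socket

# Bind a listener to loopback only, as kube-scheduler does with
# --bind-address=127.0.0.1 (port 0 lets the OS pick a free port).
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
host, port = srv.getsockname()

# A connection over the loopback interface succeeds...
c = socket.create_connection(("127.0.0.1", port), timeout=1)
print("loopback connect: ok")
c.close()

# ...but the socket is not reachable via any non-loopback interface,
# which is why an in-cluster Prometheus cannot scrape it.
print("bound to:", host)
srv.close()
```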

This issue discusses the problem (and a certificate problem): prometheus-operator/kube-prometheus#718

Describe the solution you would like

I want a way for prometheus running inside the cluster to access kube-scheduler and kube-controller-manager each running outside the cluster. This should be done in a way that doesn't expose the ports to the outside network, at least not without explicitly opting in.

Describe alternatives you've considered

No response

Additional context

No response

@pschichtel pschichtel added the enhancement New feature or request label Nov 24, 2023
@pschichtel
Contributor Author

One possible workaround I came up with:

  1. Deploy a DaemonSet to all controller nodes that bridges the ports 127.0.0.1:10257 and 127.0.0.1:10259 from the host network into a Unix domain socket each.
  2. Deploy an additional DaemonSet for each domain socket that mounts the socket and bridges it to the matching port number on the pod IP.
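The first bridging leg could be sketched with socat roughly like this (image, socket path, and label names are illustrative assumptions, not a tested setup):

```yaml
# Leg 1: hostNetwork DaemonSet on the controller nodes, forwarding the
# loopback-only scheduler port into a Unix socket on a hostPath volume.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: scheduler-metrics-leg1
  namespace: kube-system
spec:
  selector:
    matchLabels: {app: scheduler-metrics-leg1}
  template:
    metadata:
      labels: {app: scheduler-metrics-leg1}
    spec:
      hostNetwork: true
      containers:
        - name: socat
          image: alpine/socat
          args: ["UNIX-LISTEN:/sockets/scheduler.sock,fork", "TCP:127.0.0.1:10259"]
          volumeMounts: [{name: sockets, mountPath: /sockets}]
      volumes:
        - name: sockets
          hostPath: {path: /run/scheduler-metrics, type: DirectoryOrCreate}
# Leg 2 would be a second DaemonSet without hostNetwork, mounting the same
# hostPath and running socat the other way round:
#   args: ["TCP-LISTEN:10259,fork,reuseaddr", "UNIX-CONNECT:/sockets/scheduler.sock"]
```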

I wonder if this is something that could be done with k0s's NLLB (node-local load balancing) infrastructure.

@jnummelin
Member

I believe this use case is covered by this: https://docs.k0sproject.io/v1.28.4+k0s.0/system-monitoring/

@pschichtel
Contributor Author

Interesting, I'll have a look at that next week or so. Will changes to k0sctl's installFlags be applied after the initial installation?
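For reference, the linked docs enable the built-in metrics scraping via a controller flag; in k0sctl terms that would presumably look something like this (host entry is an illustrative excerpt, the flag name is taken from the k0s system-monitoring docs):

```yaml
# k0sctl.yaml excerpt: enable the k0s metrics scraper on the controllers.
spec:
  hosts:
    - role: controller
      installFlags:
        - --enable-metrics-scraper
```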

@twz123
Member

twz123 commented Nov 24, 2023

@pschichtel
Contributor Author

I have this running now. Are there any concerns/risks in enabling it, given that it's not on by default? And am I correct in assuming there are not already ready-made ServiceMonitor and PrometheusRule resources available for this setup?
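For illustration, a minimal ServiceMonitor for the scraped metrics might look like the following; the service name, namespace, labels, and port name are hypothetical, not a ready-made resource shipped with k0s:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: k0s-metrics          # hypothetical name
  namespace: k0s-system      # adjust to where the scraper's service lives
spec:
  selector:
    matchLabels:
      app: k0s-pushgateway   # assumed label on the metrics service
  endpoints:
    - port: http             # assumed port name
      interval: 30s
```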

@jnummelin
Member

Are there any concerns/risks in enabling it given it's not on by default?

IMO no. We didn't want to enable it by default since it kind of implies one needs Prometheus, which not all k0s clusters will have.

Am I correct in assuming there are not already ready-made ServiceMonitor and PrometheusRule resources available for this setup?

Not that I know at least.

@pschichtel
Contributor Author

Am I correct in assuming there are not already ready-made ServiceMonitor and PrometheusRule resources available for this setup?

Not that I know at least.

Would there be a place to contribute them somewhere in k0s' context, in case kube-prometheus-stack doesn't accept distribution-specific stuff?

@twz123
Member

twz123 commented Dec 11, 2023

Would there be a place to contribute them somewhere in k0s' context in case kube-prometheus-stack doesn't accept distribution-specific stuff?

Sounds like you already have a (sort of) working kube-prometheus setup on top of k0s. It would be awesome if you'd be willing to contribute a setup guide for this somewhere in the docs ❤️ That guide could include the k0s-specific custom resources as well, I suppose?
