
Provide proper certificates for kube-scheduler and kube-controller-manager #2244

Open
FrediWeber opened this issue Aug 4, 2020 · 29 comments
Labels
area/security kind/design Categorizes issue or PR as related to design. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done.

Comments

@FrediWeber

FEATURE REQUEST

Versions

kubeadm version (use kubeadm version): 1.18.6

Environment:

  • Kubernetes version (use kubectl version): 1.18.6
  • Cloud provider or hardware configuration: Bare-Metal
  • OS (e.g. from /etc/os-release): Debian 10
  • Kernel (e.g. uname -a): 4.19.0-9
  • Others:

What happened?

Kubeadm disables the "insecure" ports of kube-scheduler and kube-controller-manager by setting the --port=0 flag. Therefore metrics have to be scraped over TLS. This is fine, but Kubeadm doesn't seem to manage the certificates of kube-scheduler and kube-controller-manager. These components - if no certificate is provided - will create a self-signed certificate to serve requests. One could just disable certificate verification, but that would somewhat defeat the purpose of using TLS.

What you expected to happen?

Kubeadm should create and manage certificates for the "secure" port of kube-scheduler and kube-controller-manager. These certificates should be signed by the CA that is created by Kubeadm.

How to reproduce it (as minimally and precisely as possible)?

  1. Create a cluster with Kubeadm
  2. Access the "secure" port (10257 or 10259)
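For example, the self-signed serving certificate can be inspected directly on a control-plane node (a rough sketch; port 10259 is kube-scheduler, 10257 is kube-controller-manager):

# Print issuer and subject of the certificate served on the kube-scheduler "secure" port.
# With a default Kubeadm setup this shows a certificate issued by an ephemeral, self-generated CA.
openssl s_client -connect 127.0.0.1:10259 </dev/null 2>/dev/null | openssl x509 -noout -issuer -subject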
@neolit123
Member

@FrediWeber thank you for logging the ticket.
you have a valid observation that we do not sign the serving certificate and key for the components in question.

we had a long discussion with a user on why we are not signing these for kubeadm and you can read more about this here:
kubernetes/kubernetes#80063

IIUC, one undesired side effect is that if we start doing that our HTTPS probes will fail, as the Pod Probe API does not support signed certificates for HTTPS (only self-signed):
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#probe-v1-core
kubernetes/kubernetes#18226 (comment)

we could work around that using a "command" probe that is cert/key aware, but this is difficult as the component images are "distroless" (no shell, no tools). so maybe one day we can support that if core k8s supports it properly.
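for reference, the probe kubeadm generates for these static pods is roughly of this shape (a sketch; the exact values come from the generated manifests under /etc/kubernetes/manifests):

livenessProbe:
  httpGet:
    host: 127.0.0.1
    path: /healthz
    port: 10259    # 10257 for kube-controller-manager
    scheme: HTTPS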

@neolit123 neolit123 added area/security priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. kind/feature Categorizes issue or PR as related to a new feature. labels Aug 4, 2020
@FrediWeber
Author

@neolit123 Thank you very much for your fast response and the clarifications.

If i understand it correctly, the issue kubernetes/kubernetes#80063 is more about mapping the whole PKI dir into the container, external PKIs and shorter renewal intervals.

I read a little bit about health checks with HTTPS in kubernetes/kubernetes#18226 and https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/.
If I understand everything correctly, the Kubernetes docs state that the certificate is not checked at all for the health check, so it shouldn't matter if the certificate is self-signed or if it's properly signed with the already present CA.

If scheme field is set to HTTPS, the kubelet sends an HTTPS request skipping the certificate verification.

I don't see any security implications in just mapping the signed certificates and corresponding keys for kube-scheduler and kube-controller-manager.

The only downside would be that the certificate rotation would have to be managed. On the other hand, this is already the case for other certificates AFAIK.

@neolit123
Member

If I understand everything correctly, the Kubernetes docs state that the certificate is not checked at all for the health check, so it shouldn't matter if the certificate is self-signed or if it's properly signed with the already present CA.

i'm not so sure about this and i haven't tried it. my understanding is that if the server no longer has self-signed certificates this means that it would reject any client connections on HTTPS that do not pass authentication.
e.g. curl -k... would no longer work?

If i understand it correctly, the issue kubernetes/kubernetes#80063 is more about mapping the whole PKI dir into the container, external PKIs and shorter renewal intervals.

that is true. however, the discussion there was also about the fact that today users can customize their kube-scheduler and KCM deployments via kubeadm to enable the usage of the custom signed serving certificates for these components if they want their metrics and health checks to be accessible over "true" TLS (i.e. pass the flags and mount the certificates using extraArgs, extraVolumes under ClusterConfiguration).
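as a rough sketch of that existing workaround (file names and paths below are made up; the flags are the standard secure-serving flags of these components):

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
scheduler:
  extraArgs:
    # hypothetical certificate/key, signed by a CA the user controls
    tls-cert-file: /etc/kubernetes/pki/kube-scheduler-serving.crt
    tls-private-key-file: /etc/kubernetes/pki/kube-scheduler-serving.key
  extraVolumes:
  - name: scheduler-serving-cert
    hostPath: /etc/kubernetes/pki
    mountPath: /etc/kubernetes/pki
    readOnly: true
    pathType: DirectoryOrCreate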

with the requirement of kubeadm managing these extra certificates for renewal, i'm leaning towards -1 initially, but i would like to get feedback from others too.

cc @randomvariable @fabriziopandini @detiber

@neolit123 neolit123 added the kind/design Categorizes issue or PR as related to design. label Aug 4, 2020
@FrediWeber
Author

i'm not so sure about this and i haven't tried it. my understanding is that if the server no longer has self-signed certificates this means that it would reject any client connections on HTTPS that do not pass authentication.
e.g. curl -k... would no longer work?

So do you mean the kube-controller-manager and kube-scheduler would no longer accept the health check requests because they do not pass a client certificate or any other authentication?
Or do you mean the "client side" of the Kubernetes health check would not connect because the certificate is not self signed? I'm not sure about the first case but if the docs are correct, the second case should not happen.

Please also keep in mind that the current certificate is not really self-signed either. Kube-controller-manager and kube-scheduler seem to create an internal, temporary CA on startup and sign the certificate with their own CA.

You are absolutely right about the existing possibility to mount certificates and set the options with the extraArgs.

The thing that has changed is that Kubeadm by default deactivates the insecure port with --port=0. I'm aware that this is deprecated in the upstream components (kube-scheduler and kube-controller-manager) anyway, but I think that Kubeadm should configure these components properly, especially when Kubeadm already "manages" a CA from which these certificates could relatively easily be signed.

Another approach would be to let these two components handle their front-facing certificates on their own, like the kubelet does.
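(For comparison, the kubelet analogue of that idea is its serving-certificate bootstrap, which requests a serving certificate through the certificates.k8s.io API instead of self-signing. Shown only as an illustration, since kube-scheduler and kube-controller-manager have no equivalent today:)

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# When enabled, the kubelet requests its serving certificate via a CSR instead of generating a self-signed one.
# The resulting CSRs still have to be approved and signed by something in the cluster.
serverTLSBootstrap: true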

@detiber
Member

detiber commented Aug 4, 2020

Long term, I would love to see the ability to leverage the certificates API to do automated request and renewal of serving certificates for kube-scheduler and kube-controller-manager similar to the work that is being done to enable this support for the kubelet (https://github.com/kubernetes/enhancements/blob/master/keps/sig-auth/20190607-certificates-api.md)

@neolit123
Member

neolit123 commented Aug 4, 2020

So do you mean the kube-controller-manager and kube-scheduler would no longer accept the health check requests because they do not pass a client certificate or any other authentication?

that was my understanding. then again, we do serve the kube-apiserver on HTTPS and its probe does not have/pass certificates, so perhaps it would just work.

I'm aware that this is deprecated in the upstream components (kube-scheduler and kube-controller-manager) anyway, but I think that Kubeadm should configure these components properly, especially when Kubeadm already "manages" a CA from which these certificates could relatively easily be signed.

but again the problem is that this is yet another set of certificates that kubeadm has to manage during renewal, and must consider during our "copy certificates" functionality for HA support. it is not a strict requirement and kubeadm already supports it for users that want to do that using extraArgs. we have a similar case for the kubelet serving certificate which is "self-signed".

i'd say, at minimum it would be worthy of an enhancement proposal (KEP):
https://github.com/kubernetes/enhancements/tree/master/keps

@neolit123
Member

neolit123 commented Aug 4, 2020

Long term, I would love to see the ability to leverage the certificates API to do automated request and renewal of serving certificates for kube-scheduler and kube-controller-manager similar to the work that is being done to enable this support for the kubelet (https://github.com/kubernetes/enhancements/blob/master/keps/sig-auth/20190607-certificates-api.md)

i tried to follow the latest iterations of the CSR API closely, but i have not seen discussions around CSRs for the serving certificates of these components via the KCM CA. my guess would be that there might be some sort of a blocker for doing that, given a lot of planning went into the v1 of the API.

@FrediWeber
Author

i'd say, at minimum it would be worthy of an enhancement proposal (KEP):
https://github.com/kubernetes/enhancements/tree/master/keps

Would it be okay with you if I started the process?

@neolit123
Member

neolit123 commented Aug 5, 2020

for a feature that is already possible via the kubeadm config/API, the benefits need to justify the maintenance complexity.
after all, kubeadm's goal is to create a "minimal viable cluster" by default.

to me it always seems better to first collect some support (+1s) on the idea before proceeding with the KEP...
the KEP process can be quite involved and my estimate is that at this stage the KEP will not pass. so, it is probably better to flesh out the idea more in a discussion here.

@FrediWeber
Author

What if kube-scheduler and kube-controller-manager managed their front-facing server certificates with the certificates.k8s.io API? There would need to be a new controller to automatically sign the CSRs.
As a fallback, these components could still use their self-created CA.

  1. Check if the flags for certificates are set - if yes, use them and start the component
  2. If no front-facing certificate flags are set, try to generate a CSR and have it signed by the corresponding controller
  3. If step 2 fails or is disabled, proceed in the same way as today (generate own CA etc.)

Kubeadm wouldn't have to do anything, and there would still be the possibility to provide custom certificates if needed.
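To make the idea concrete, the request such a component could submit might look roughly like this (a sketch only; the signerName is hypothetical, as no built-in signer for these serving certificates exists today):

apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: kube-scheduler-serving-node-1              # example name
spec:
  request: <base64-encoded PEM CSR with the node's DNS names and IPs as SANs>
  signerName: example.com/control-plane-serving    # hypothetical signer handled by the proposed controller
  usages:
  - digital signature
  - key encipherment
  - server auth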

@neolit123
Member

this could work, but i guess we will have to own the source code and container image for this new controller.

BTW, does the kube-scheduler even support /metrics?

for 1.18.0 it just reports:

no kind is registered for the type v1.Status in scheme

KCM on the other hand reports what i expected to see:

  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/metrics\"",
  "reason": "Forbidden",
  "details": {

@FrediWeber
Author

I just double-checked it.
The problem seems to be that the scheduler does not provide a clean error message if the request is not properly authenticated.
If you authenticate against the /metrics endpoint with a token of an authorized service account, the metrics are provided.
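Such a check can be done roughly like this (a sketch; the namespace and secret name are examples, and -k is needed because the serving certificate is still self-signed):

# Extract the bearer token of an authorized service account (names are examples)
TOKEN=$(kubectl -n kube-system get secret my-metrics-sa-token -o jsonpath='{.data.token}' | base64 -d)
# Query the kube-scheduler metrics endpoint on a control-plane node
curl -k -H "Authorization: Bearer ${TOKEN}" https://127.0.0.1:10259/metrics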

@neolit123
Member

The problem seems to be that the scheduler does not provide a clean error message if the request is not properly authenticated.

would you care to log a ticket for that in kubernetes/kubernetes and tag it with /sig scheduling instrumentation?
i could not find an existing one by searching in the repository.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 10, 2020
@neolit123
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 10, 2020
@neolit123 neolit123 modified the milestones: v1.20, v1.21 Dec 2, 2020
@ksa-real

ksa-real commented Jan 19, 2021

I was trying to set up metrics scraping with Prometheus and am pretty confused about the current state of things. Is there a recommended way to monitor kube-scheduler (S) and kube-controller-manager (CM) metrics?

prometheus-operator/kube-prometheus#718

Prometheus is running on a node different from the master node(s) and is expected to scrape S/CM metrics from multiple master nodes. Two issues:

  • Both S and CM bind to 127.0.0.1 by default, which makes it impossible for Prometheus to access the metrics. Binding to 0.0.0.0 (the currently recommended workaround) is too broad, as it may bind to external IP addresses and expose the metrics to the internet. The etcd and apiserver approach of --advertise-client-url with the node IP and probably 127.0.0.1 seems like a better idea.
  • Moving to HTTPS, it seems reasonable to have TLS certificates on this endpoint that a client (Prometheus) can trust. As a workaround it is possible to set insecureSkipVerify: true in the Prometheus scrape config, but obviously it is better not to cut corners. The X509v3 Subject Alternative Name should be similar to the etcd one: DNS:localhost, DNS:node-1, IP Address:<node IP, e.g. 10.1.1.1>, IP Address:127.0.0.1, IP Address:0:0:0:0:0:0:0:1

Authentication already happens via service bearer token.

I can create another issue, but I'd first prefer to get some feedback, as the current issue is related.
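A per-node serving certificate with such SANs could be produced from the kubeadm CA roughly like this (a sketch with example names and IPs; renewal would still have to be handled separately):

# Generate a key and CSR for the kube-controller-manager serving certificate (example names)
openssl req -new -newkey rsa:2048 -nodes \
  -keyout kcm-serving.key -out kcm-serving.csr -subj "/CN=kube-controller-manager"
# Sign it with the kubeadm cluster CA, adding this node's names and IPs as SANs
openssl x509 -req -in kcm-serving.csr \
  -CA /etc/kubernetes/pki/ca.crt -CAkey /etc/kubernetes/pki/ca.key -CAcreateserial \
  -days 365 -out kcm-serving.crt \
  -extfile <(printf "subjectAltName=DNS:localhost,DNS:node-1,IP:10.1.1.1,IP:127.0.0.1,IP:0:0:0:0:0:0:0:1")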

@neolit123
Member

neolit123 commented Jan 19, 2021

I was trying to set up metrics scraping with Prometheus and am pretty confused about the current state of things. Is there a recommended way to monitor kube-scheduler (S) and kube-controller-manager (CM) metrics?

you can sign certificates for a user that is authorized to access the /metrics endpoints.
e.g.

rules:
- nonResourceURLs: ["/metrics"]
  verbs: ["get", "post"]

https://kubernetes.io/docs/reference/access-authn-authz/rbac/

creating certificates:
https://kubernetes.io/docs/concepts/cluster-administration/certificates/

you can then feed such to a TLS client that tries to access the endpoint.
can be verified locally with curl too.
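a minimal sketch of what that can look like (all names below are made up; the client certificate would be signed by the cluster CA with a CN matching the subject in the binding):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: metrics-reader              # example name
rules:
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metrics-reader-binding      # example name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: metrics-reader
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: metrics-client              # must match the CN of the signed client certificate

and checked locally with e.g. (-k because the serving certificate is self-signed):

curl -k --cert metrics-client.crt --key metrics-client.key https://127.0.0.1:10259/metrics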

alternatively for the legacy behavior of insecure metrics you can grant the user system:anonymous /metrics access.
not really recommended.

EDIT:

Authentication already happens via service bearer token.

sorry, missed that part. in that case there is likely a lack of authz.

@ksa-real

ksa-real commented Jan 20, 2021

I think you didn't understand me. To access the metrics endpoint 3 things must happen:

  1. Port must be accessible.
  2. Assuming this is https, the client must trust the server (or opt to not care).
  3. The server must trust the client (or opt to not care).

You are talking about (3), but this is the only part that works. Parts (1) and (2) are broken.

  1. Prometheus is executed on some node (probably non-master); the pod IP is e.g. 10.2.1.4. Even if it runs on a master, it must scrape the other master nodes as well, and it discovers S and CM via the corresponding K8s services, so it gets node IP addresses (e.g. 10.1.1.1, 10.1.1.2 ...) and not 127.0.0.1. With this, 10.1.1.1:10257 and 10.1.1.1:10259 are inaccessible from anywhere, including 10.2.1.4. As I said, one workaround is to add the following to the kubeadm config:
controllerManager:
  extraArgs:
    bind-address: 0.0.0.0
scheduler:
  extraArgs:
    bind-address: 0.0.0.0

And then propagate to configs via

kubeadm init phase control-plane scheduler --config kubeadm.yml
kubeadm init phase control-plane controller-manager --config kubeadm.yml
# then restart kubelet

That does not work well in my case. I have an internet-facing interface (e.g. 80.1.1.1, 80.1.1.2 ...) on the nodes. Binding to 0.0.0.0 also binds to 80.1.1.x, and the metrics become available over the internet. I could stop using kubeadm and manually fix the manifests to bind to 10.1.1.x (a different value on each node), but that most likely is going to break components talking to S/CM, because AFAIU it is not possible to bind to both 127.0.0.1 and 10.1.1.x, and because of (2).

  2. For a client to trust a server, the server must supply a server TLS certificate signed by a CA trusted by the client, with no discrepancies. Not sure if Prometheus checks the CA, but it certainly checks the IP Address entries in the X509v3 Subject Alternative Name part of the certificate. With --bind-address=127.0.0.1 Prometheus gives dial tcp 10.1.1.1:10257: connect: connection refused. With --bind-address=0.0.0.0 Prometheus reports x509: certificate is valid for 127.0.0.1, not 10.1.1.1.

  3. This part works. When I add insecureSkipVerify: true to the Prometheus scrape config, it successfully scrapes the metrics. The scrape config:

- job_name: monitoring/main-kube-prometheus-stack-kube-scheduler/0
  scheme: https
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: false
... 

The service account token already has the get metrics permission in its role:

rules:
- nonResourceURLs: ["/metrics", "/metrics/cadvisor"]
  verbs: ["get"]

@ksa-real

My current workaround is to apply the firewall rules on every node BEFORE applying the above steps. Replace 10.1.1.1 with the node IP:

cat <<EOF >/etc/local.d/k8s-firewall.start
iptables -A INPUT -p tcp -d 127.0.0.1,10.1.1.1 -m multiport --dports 10257,10259 -j ACCEPT
iptables -A INPUT -p tcp -m multiport --dports 10257,10259 -j DROP
EOF
chmod +x /etc/local.d/k8s-firewall.start
/etc/local.d/k8s-firewall.start

@ksa-real

The issue is NOT specific to Prometheus. This is specific to kubeadm. Given the way kubeadm sets up the scheduler and controller-manager, unless the metrics collector is deployed on every master node, any pull-based central metrics scraper

  • cannot read metrics from these components
  • cannot trust the certificates provided by these components

What is the point in providing a metrics endpoint if it cannot be accessed? I guess, kubeadm doesn't do it on purpose, right?

@neolit123
Member

I guess, kubeadm doesn't do it on purpose, right?

yes, because it feels like an extension and not something that all users would need.

the discussion above had the following:

... today users can customize their kube-scheduler and KCM deployments via kubeadm to enable the usage of the custom signed serving certificates for these components if they want their metrics and health checks to be accessible over "true" TLS (i.e. pass the flags and mount the certificates using extraArgs, extraVolumes under ClusterConfiguration).

#2244 (comment)

so technically you should be able to pass extra flags to the components and set them up.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 9, 2021
@fabriziopandini
Member

/remove-lifecycle stale.

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 9, 2021
@fabriziopandini
Member

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jun 10, 2021
@neolit123 neolit123 modified the milestones: v1.22, v1.23 Jul 5, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 3, 2021
@neolit123 neolit123 removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 21, 2021
@neolit123 neolit123 modified the milestones: v1.23, v1.24 Nov 23, 2021
@neolit123 neolit123 modified the milestones: v1.24, Next Jan 11, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 11, 2022
@neolit123 neolit123 added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 11, 2022
@centromere

/remove-lifecycle stale
