KSM is unable to authenticate to cluster in 1.3.x+ releases #543
Comments
@ehashman can you post the full logs of the kube-state-metrics pod? I am expecting something like: ... In addition, would you mind posting the deployment manifest and the RBAC-related manifests?
Logs:
Command from the manifest:
Re: full manifests, let me see if I can just get you the diff. They are not very different from upstream, particularly the RBAC stuff.
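For context, the RBAC manifests themselves were never posted in this thread. The upstream kube-state-metrics setup binds a ClusterRole to the kube-state-metrics ServiceAccount in kube-system, roughly as in the sketch below; this is the standard upstream layout, not the reporter's actual files.

# Sketch of the upstream-style binding. The ClusterRole named here is the one
# shipped with the upstream manifests (list/watch on the resources KSM exports).
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system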
The only deviations of our Kubernetes manifests from upstream are some small changes to the deployment: we don't use the nanny Pod and instead set resource requests explicitly, we specify the API server explicitly, and we mount our internal certificates inside the KSM Pod.
diff --git a/kubernetes/kube-state-metrics-deployment.yaml b/kubernetes/kube-state-metrics-deployment.yaml
index 28a119f..7a00a66 100644
--- a/kubernetes/kube-state-metrics-deployment.yaml
+++ b/kubernetes/kube-state-metrics-deployment.yaml
@@ -1,59 +1,61 @@
 apiVersion: apps/v1beta2
-# Kubernetes versions after 1.9.0 should use apps/v1
-# Kubernetes versions before 1.8.0 should use apps/v1beta1 or extensions/v1beta1
 kind: Deployment
 metadata:
-  name: kube-state-metrics
+  name: ksm-1.4.0
   namespace: kube-system
 spec:
   selector:
     matchLabels:
-      k8s-app: kube-state-metrics
+      k8s-app: ksm-1.4.0
   replicas: 1
   template:
     metadata:
       labels:
-        k8s-app: kube-state-metrics
+        k8s-app: ksm-1.4.0
     spec:
       serviceAccountName: kube-state-metrics
       containers:
       - name: kube-state-metrics
-        image: quay.io/coreos/kube-state-metrics:v1.4.0
+        image: <internal image server>/kube-state-metrics:v1.4.0
         ports:
         - name: http-metrics
           containerPort: 8080
         - name: telemetry
           containerPort: 8081
-        readinessProbe:
+        livenessProbe:
           httpGet:
             path: /healthz
             port: 8080
           initialDelaySeconds: 5
           timeoutSeconds: 5
-      - name: addon-resizer
-        image: k8s.gcr.io/addon-resizer:1.7
+        readinessProbe:
+          httpGet:
+            path: /metrics
+            port: 8080
+          initialDelaySeconds: 15
+          timeoutSeconds: 15
+        command:
+        - /kube-state-metrics
+        - --apiserver=https://<master FQDN>
+        - --port=8080
+        - --telemetry-port=8081
         resources:
-          limits:
-            cpu: 100m
-            memory: 30Mi
           requests:
-            cpu: 100m
-            memory: 30Mi
-        env:
-          - name: MY_POD_NAME
-            valueFrom:
-              fieldRef:
-                fieldPath: metadata.name
-          - name: MY_POD_NAMESPACE
-            valueFrom:
-              fieldRef:
-                fieldPath: metadata.namespace
-        command:
-          - /pod_nanny
-          - --container=kube-state-metrics
-          - --cpu=100m
-          - --extra-cpu=1m
-          - --memory=100Mi
-          - --extra-memory=2Mi
-          - --threshold=5
-          - --deployment=kube-state-metrics
+            cpu: 3
+            memory: 8Gi
+          limits:
+            cpu: 3
+            memory: 8Gi
+        volumeMounts:
+        - mountPath: /etc/ssl/certs
+          name: ca-certificates
+          readOnly: true
+        - mountPath: /etc/ssl/internal/certs/ca-certificates.crt
+          name: cert
+      volumes:
+      - name: ca-certificates
+        hostPath:
+          path: /etc/ssl/certs
+      - name: cert
+        hostPath:
+          path: /etc/ssl/certs/IT_Security.pem
@ehashman backwards compatibility was sadly broken in #371: you can no longer use in-cluster config AND specify the API server URL. We are also affected by this, as we use the same configuration (in-cluster + API server URL) and are unable to upgrade past 1.2 at the moment. @andyxning, given your comment in #371 (comment), is it possible we could re-introduce this as a valid configuration? I would imagine that we are not the only two users affected by this.
Ahhh, very interesting. In that case, I wonder if I can hack around this by overriding the master discovery environment variables (we need to specify the API server by FQDN because our certs don't contain IP SANs and fail to verify). Will let you know how that goes.
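For context, client-go's in-cluster configuration discovers the API server from the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables, so overriding them in the container spec could look roughly like the sketch below. The FQDN placeholder and port value are assumptions; the thread does not show the exact values the commenters set.

        # Sketch only: point in-cluster discovery at the API server's FQDN so that
        # certificate verification matches the cert's DNS SANs. Values are placeholders.
        env:
        - name: KUBERNETES_SERVICE_HOST
          value: "<master FQDN>"
        - name: KUBERNETES_SERVICE_PORT
          value: "443"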
@ehashman great suggestion, overriding the ...
Confirmed that this worked for me as well, and I've successfully upgraded our KSM to 1.4.0 in production. Question: I checked the release notes when trying to figure this out and didn't see any mention of the breaking change in #371 that caused this issue. Could we retroactively add a note about it with a link to the workaround?
@ehashman right, this should have been a ...
Happy to tackle this, PR incoming.
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
When I run kube-state-metrics on any release 1.3.0+, it launches, but when it attempts to scrape any resources, I get the error:
However, all the relevant ClusterRoleBindings are set up correctly for the cluster. I am not sure why KSM is unable to authenticate.
The correct token is definitely loaded inside the KSM Pod and I've successfully used it to authenticate as the ServiceAccount against the API server with a curl command (it matches "system:serviceaccount:kube-system:kube-state-metrics" in this case). For some reason, it seems that KSM is ignoring the token?
What you expected to happen:
KSM successfully authenticates with the API server and scrapes the cluster. This does work correctly in the 1.2.0 release (but I believe that certificate verification might not be working properly in that release, as I didn't have to load in our custom CA certs to get KSM 1.2.0 to work correctly).
How to reproduce it (as minimally and precisely as possible):
I'm not really sure how to minimally reproduce this; it affects all our clusters (prod + QA) but there's nothing particularly special about them. We have RBAC enabled.
Environment:
Kubernetes version (use kubectl version): 1.8.7