Digital Ocean implementation of K8s does not allow metrics-server to function #150
Comments
Additional troubleshooting:
Looks like metrics-server itself is functioning fine ^ Endpoints & pod IPs below:
|
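For anyone wanting to reproduce that check, the endpoints and pod IPs for metrics-server can usually be listed along these lines (namespace and label selector assume the upstream metrics-server manifests):

```sh
# Endpoints backing the metrics-server service
kubectl -n kube-system get endpoints metrics-server

# metrics-server pod(s) together with their pod IPs
kubectl -n kube-system get pods -l k8s-app=metrics-server -o wide
```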
Hey @andrewsykim - wondering if you'd be able to let me know if the above is a Digital Ocean-specific issue, or if I am doing something wrong. If it is just me I will go away and figure out why, but I think it might be a Digital Ocean thing. Thanks a lot |
Hey @benjamin-maynard, sorry for the delay! Let me dig further and get back to you :) |
Hey @andrewsykim - Was just wondering if you managed to dig anything up on this? I've performed loads of debugging since and can't see anything wrong with my implementation. |
Linking another related question: https://www.digitalocean.com/community/questions/cannot-install-heapster-to-cluster-due-to-kubelets-not-allowing-to-get-metrics-on-port-10255 |
@cbenhagen Looks like there are potentially a couple of issues at play! Tried to get an update from Digital Ocean Support too; they've said they've managed to replicate it in some instances, but have also got it to work. Sadly no ETA 😢. Have asked them how they managed to get it working, as it's a pretty important component to be missing. Hopefully @andrewsykim can help too. Don't want to move to GKE! |
Sorry folks, we're working through this; it's been a bit busy with KubeCon coming up. Hoping to have more details soon (by next week maybe?) :) |
@andrewsykim can you reproduce the issue? |
We've also run into the same problem:
The logs from metrics-server:
This is after following the recommendations from some other reports online, which suggest running with the additional flags
|
The following works for me now, in values.yaml:

```yaml
args:
  - --logtostderr
  - --kubelet-preferred-address-types=InternalIP
  - --kubelet-insecure-tls
```
|
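If those values are being applied through Helm (the mention of values.yaml above suggests so), the corresponding install/upgrade might look roughly like this; the chart reference below assumes the old stable/metrics-server chart and is only illustrative:

```sh
helm upgrade --install metrics-server stable/metrics-server \
  --namespace kube-system \
  -f values.yaml
```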
Adding these flags to the metrics-server deployment file works (a sketch of what that looks like is below).
|
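In manifest form, those flags end up on the metrics-server container; here is a minimal sketch of the relevant part of the Deployment (image tag and field values are placeholders, not the exact deployment from this thread):

```yaml
# Excerpt of the pod template inside the metrics-server Deployment
spec:
  template:
    spec:
      containers:
        - name: metrics-server
          image: k8s.gcr.io/metrics-server-amd64:v0.3.1  # placeholder tag
          args:
            - --logtostderr
            - --kubelet-preferred-address-types=InternalIP
            - --kubelet-insecure-tls
```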
Hello everyone, I am having the same issue with Metricbeat (related to the ELK stack): `error making http request: Get http://localhost:10255/stats/summary: dial tcp 127.0.0.1:10255: getsockopt: connection refused`. Has anyone got a clear answer from DO about it? |
@andrewsykim any updates? |
I just ran tests with all versions of DOKS currently supported (1.11.5-do.2, 1.12.3-do.2, and 1.13.1-do.2 as of this writing). In each case, I was able to read metrics properly. Here's what I did (mostly summarizing what was mentioned before in this issue):
The waiting part in the last step is important: It takes 1-2 minutes for metrics to show up. FWIW, I created my Kubernetes clusters in the FRA1 region. Is anyone not able to reproduce a successful setup with my steps outlined above? |
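For reference, the usual way to verify that metrics are flowing after the waiting period is:

```sh
# Both should return CPU/memory figures once metrics-server is scraping successfully
kubectl top nodes
kubectl top pods --all-namespaces
```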
@timoreimann I was running 1.12.3-do.1; I've upgraded to 1.13.1-do.2 and now it works as expected. |
I can confirm this as well. Deleted my original problematic cluster and re-created from scratch. Metrics are working as they should (with the modification to the metrics-server commands). |
Thanks for the feedback, appreciated. 👍 @andrewsykim looks like we can close this issue. |
awesome, thanks everyone! |
Hey @andrewsykim, I'm running 1.13.1-do.2 and seeing this issue for Prometheus scraping kubelet metrics; should I open a new issue? |
Hey @arussellsaw, we're also experiencing the same problem with Prometheus. We abandoned setting up Prometheus temporarily until the issue is resolved. The issue came from the kubelet config; Prometheus required the mode set as
The error we received was:
@arussellsaw @LewisTheobald have you possibly tried the steps I outlined / pointed at above? They have proven to be working, at least for me. |
@timoreimann I get a healthy output from |
@timoreimann OK, so it's just prometheus-operator-kubelet metrics that aren't working now; the error response I get is |
Facing exactly the same issue. From the Prometheus targets (http://localhost:9090/targets) I can see all is green except for monitoring/prometheus-operator-kubelet/0 and monitoring/prometheus-operator-kubelet/1, which fail with the error:
And monitoring/prometheus-operator-node-exporter/0 is also red because of |
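One way to check whether the kubelets themselves are serving metrics, independent of the Prometheus scrape configuration, is to go through the API server proxy (the node name below is a placeholder):

```sh
# Fetch the kubelet's /metrics endpoint for one node via the API server proxy
kubectl get --raw "/api/v1/nodes/<NODE_NAME>/proxy/metrics" | head
```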
Experiencing the exact same issue |
@andrewsykim should we reopen this issue, or perhaps create a new one? |
@arussellsaw Andrew is not working on DO's CCM anymore; I'm taking over. I have a hunch it's not related to CCM (anymore) but rather a configuration or DOKS issue, so I'd say let's refrain from opening a new issue for now. I will keep this on my radar and ask internally what the current status on the matter is; will circle back as soon as I have an answer. Thanks. |
Would whoever opens the new issue mind linking it to this one, please? (Well... what's the point of opening a new issue to report the same issue? 🤷♂️) |
Experiencing the same issue, can we reopen this? (prometheus metrics) |
@timoreimann can you explain why metrics-server needs to be run with the address type set to IP and in insecure mode? Insecure mode surely isn't something we want running in production clusters? I'm hazy on why metrics-server doesn't work out of the box. |
@maxwedwards I agree with you that running in insecure mode is probably not a good idea for production loads. My goal at the time was to validate which parts of the metrics-server integration worked and which didn't. Previous DOKS image versions had issues that got fixed at some point, but it's possible there's still something missing that prevents metrics-server from functioning properly in a desirable configuration. I haven't had time to further investigate the problem myself. However, I'm not convinced it is something that CCM is responsible for or can help fix. That's why I'm hesitant to reopen the issue and give the (false) impression that CCM maintainers could work on a fix. (If anyone has a lead that CCM is, in fact, part of the problem or solution, I'd be more than happy to hear and talk about it.) I encourage affected users to submit a question at DO's Q&A platform and/or file a support ticket at DO so that we can get more people to look at the problem. |
@timoreimann my naive understanding of why it works is because dns using the node's name isn't working on the cluster. We switch to IP addressing to get round the fact that DNS is not working correctly but then the certificates on the cluster are set up with hostnames and not with ip addresses so we then need to switch off checking tls certs. Is DNS within the cluster not part of CCM? A cluster with basic metrics on CPU and Memory usage, set up correctly, has to be part of minimum offering on a hosted k8s service? Imagine an Ubuntu image where top or ps doesn't work without adding insecure hacks? I want to run my business on this! Not getting at you personally, just think it shouldn't be glossed over. |
@maxwedwards totally agree that basic metrics collection in a secure way is a must-have. Apologies if this came across as "we don't care about this" - we genuinely do. What I was trying to express is that CCM is likely not the place where the fix should (maybe even can) happen: the project's primary purpose is to implement the cloud provider interface that is defined by upstream Kubernetes. I checked again but don't see a way to hook into host name resolutions. We could presumably hack it into the project, but it might not be the right place to do so. Regardless, I have filed an internal ticket to track progress on the matter and did some initial investigations myself that confirm your findings. Will keep you posted in this issue since it has become the place to go to for most users that ran into the problem. By the way, we also have something else in the pipeline at DO to improve on the observability front which we intend to release not too far in the future. It only touches partially on the subject discussed here though; proper metrics-server integration is still needed to support features built on top of it (like autoscaling). Sorry again in case I have sent the wrong message. Appreciate the feedback! |
Here is the solution for it: you have to use --kubelet-insecure-tls and --kubelet-preferred-address-types=InternalIP for the connection refused issue. Here is my YAML for the metrics server (metrics-server-deployment.yaml): apiVersion: v1
|
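For completeness, an imperative way to add the same two flags to an already-deployed metrics-server is a JSON patch along these lines (it assumes metrics-server is the first container in the pod spec and already has an args array):

```sh
kubectl -n kube-system patch deployment metrics-server --type=json -p='[
  {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-preferred-address-types=InternalIP"},
  {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}
]'
```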
@ssprasad100 hey! Can you add your YAML as a gist? |
Hi @timoreimann, this issue has been closed but I believe it is still not fixed, as the solution proposed is to NOT use TLS. Thanks |
@Simwar and others: we just opened up a new repository to track more general feature requests and bug reports related to DOKS (but not specific to any other of our repos, like this one). I created digitalocean/DOKS#2 to address the issue around metrics-server not supported with TLS on DOKS. Please continue discussions on the new issue. Thanks! |
Facing the same issue, but only for pods, on a newly created 1.19.3 cluster... tried every single possibility mentioned above but still no success. Any other suggestions? |
@m-usmanayub sounds like you are affected by kubernetes/kubernetes#94281. We're in the process of shipping a 1.19 update that comes with Docker 19.03 where the problem is apparently fixed. |
Sounds great. Thanks for the update and reference link |
What version will this be @timoreimann ? 1.19.4-do.1 ? |
@kyranb it should be 1.19.3-do.2. I'll post again once the release is out. |
1.19.3-do.2 was just released and should fix the problem. Please report back if that's not the case. Sorry for the inconveniences! |
@timoreimann just started the upgrade and can confirm that this is now fixed |
@WyriHaximus thanks for confirming! 💙 |
@timoreimann you're welcome, checked my cluster's status page a few minutes before you posted and started the upgrade. Was kinda hoping to beat you to it, but more glad that this is fixed 🎉 |
I had the same problem. Here is how I solved it, using the metrics-server chart from the Bitnami repository:

```sh
helm repo add bitnami https://charts.bitnami.com/bitnami
helm template metrics-server bitnami/metrics-server --values metrics-server.yaml -n kube-system
```

```yaml
# metrics-server.yaml
apiService:
  create: true # this solves the permission problem
extraArgs:
  kubelet-preferred-address-types: InternalIP
```
|
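Since helm template only renders manifests to stdout, the output still has to be applied; one way to do that with the same chart and values file as above:

```sh
helm template metrics-server bitnami/metrics-server \
  --values metrics-server.yaml -n kube-system \
  | kubectl apply -n kube-system -f -
```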
Hi, this is still biting me on 1.22. I have to change the endpoints to be |
Hello,
I raised an issue for this last night, but now that I've had more time to sleep, I wanted to raise it again and provide some more information.
I currently have a Digital Ocean Managed Kubernetes Cluster. I have some applications deployed and running on it.
I have configured Horizontal Pod Autoscaling for one of my deployments, but when running the
kubectl get hpa
command, I noticed the following in my output (in the targets column). I identified this was because I did not have either Heapster or metrics-server running on my cluster, so I went to install metrics-server as per the instructions on https://github.com/kubernetes-incubator/metrics-server
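For context, a CPU-based HPA of the kind described above might look roughly like this (the deployment name and thresholds are made up for illustration):

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app              # hypothetical deployment name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70   # needs working resource metrics to evaluate
```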
metrics-server successfully installs, and is running in the kube-system namespace:
However, I am still getting no metrics.
Running
kubectl get apiservice v1beta1.metrics.k8s.io -o yaml
reveals the following, with the latter part of the output being of interest:
message: 'no response from https://10.245.219.253:443: Get https://10.245.219.253:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)'
I believe the above error message means that the kube-apiserver cannot speak to the metrics-server service. I believe this is due to the specifics of how the Digital Ocean Kubernetes master works.
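The same failure can be read directly off the APIService object's Available condition, for example:

```sh
# Show why the metrics API is (or is not) considered available
kubectl get apiservice v1beta1.metrics.k8s.io \
  -o jsonpath='{.status.conditions[?(@.type=="Available")].message}'
```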
I've performed some other general validation:
Service is configured:
metrics-server is up and running:
Another customer has reported similar things: https://www.digitalocean.com/community/questions/cannot-get-kubernetes-horizonal-pod-autoscaler-or-metrics-server-working