Startup failure #278
Happens to me too, on EKS 1.14.9. The pods enter a ...
Any update on this? Have you been able to find the reason for the crash or a workaround?
@ghostsquad I switched to sops.
I couldn't reproduce the error on minikube with Kubernetes 1.14.9 for either 3.0.0 or 3.1.0. Does it only happen for Kubernetes 1.14.9 on Amazon EKS? Are you able to curl your cluster API endpoint for the path ...
Me too, facing this issue... any input on solving this is appreciated.
I'll see if I can reproduce it in a fresh EKS cluster.
Created a new EKS cluster with Kubernetes 1.14.9 and platform version eks.9.
@Flydiverny I'd love to help but I need some more details.
Only thing I can think of for now is that the initialization of the kube client is somehow off, but the kubernetes-client error doesn't really give much to go on; I was hoping to reproduce this myself. Do you have any limitations where external-secrets is running, mismatching Kubernetes versions, or anything else that might be worth evaluating to try to reproduce the issue?
@Efrat19, @Flydiverny, there are two issues...
@krao-test got that, I'll take a look on my side too.
I can work around it in our use case (we're not using external-secrets, just stumbled over this ticket) by applying this fix to the kubernetes-client library. Not sure how safe this is. Would still be nice to know the exact circumstances that cause this problem, and then maybe provide a better solution.
@krao-test There's no ...
@keweilu I was able to curl my cluster API endpoint for the paths /openapi/v2 and /swagger.json.
Same problem here on EKS 1.14.9 with platform versions eks.2 and eks.8, tested with external-secrets 2.2.0 up to 3.1.0.
I found the last log line of an instance which had started successfully a few days ago and was failing at some point. Now it's not starting anymore, with the same error mentioned above.
I found that this started happening after upgrading Linkerd from 2.6.1 to 2.7.0. Downgrading Linkerd fixes the issue. I tried this on a new cluster with 1.14.8/eks.9. Maybe this will help as a workaround or for finding a fix.
That's it, you are right @dmduggan!
Hmmm... that might be it, we also upgraded Linkerd to 2.7.0 recently, but the pod is not running with a proxy, strange... I'll dig a little deeper on this.
@mycrEEpy @idobry @dmduggan I dug into this a bit. I enabled logging on the EKS apiserver, and I don't see any errors related to this in the logs. I took a look through some of the changes between Linkerd 2.6.1 and 2.7.0 and nothing jumped out at me as to why the request would be handled differently. I tested with both Linkerd 2.6.0 and 2.7.1.
Let me know if any of you are interested in debugging this together and we can set up a video call.
Strange behavior. I stood up two EKS clusters, both version 1.15/eks.1; one has been around a few days longer. The older cluster is having issues starting KES. In the brand new one (as of today), KES came right up... Started yet another cluster, and KES couldn't start. Can't seem to find rhyme or reason.
I'm not a Node.js developer but I managed to get a stack trace of the problem:
Seems like it's crashing at the deprecation check of the stream API in swagger-client.js:102:49 while iterating over the endpoints from /openapi/v2. Is this helpful to anyone here?
Might be helpful to have the spec from your cluster. The error is thrown from here: https://github.com/dougwilson/nodejs-depd/blob/master/index.js#L414-L416, so this happens while the spec is being added. Even better if we could see which endpoint causes the issue, see what goes wrong there, and propose an update to kubernetes-client. @silasbw Any thoughts on what might be causing this?
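To narrow down which endpoint trips the client, a small standalone sketch like this can scan a downloaded /openapi/v2 document for operations that have no operationId, the field whose absence was eventually identified as the trigger later in this thread. The helper name and the sample spec below are invented for illustration; this is not kubernetes-client code.

```javascript
// Scan an OpenAPI v2 spec object for path operations missing operationId.
const HTTP_METHODS = ['get', 'put', 'post', 'delete', 'options', 'head', 'patch'];

function findOpsWithoutOperationId(spec) {
  const offenders = [];
  for (const [path, item] of Object.entries(spec.paths || {})) {
    for (const method of HTTP_METHODS) {
      // An operation object with no operationId is valid OpenAPI,
      // but it is what the failing client chokes on.
      if (item[method] && !item[method].operationId) {
        offenders.push(`${method.toUpperCase()} ${path}`);
      }
    }
  }
  return offenders;
}

// Sample spec shaped like an aggregated apiserver response; the
// linkerd-style path is illustrative, not copied from a real cluster.
const spec = {
  paths: {
    '/apis/tap.linkerd.io/v1alpha1/': { get: {} },            // no operationId
    '/api/v1/pods': { get: { operationId: 'listCoreV1Pod' } } // fine
  }
};

console.log(findOpsWithoutOperationId(spec));
// -> [ 'GET /apis/tap.linkerd.io/v1alpha1/' ]
```

Feeding it a spec saved from `/openapi/v2` should point straight at the first offending endpoint.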
@Flydiverny here is the spec: https://gist.github.com/mycrEEpy/aa64cb5e9303870401f3029aa89cb056 -- I'll see if I can find out which endpoint causes the issue.
@Flydiverny it starts failing on ...
Also running into this issue after installing in a namespace with Linkerd 2.7.0. |
@Flydiverny Having the same issue here as well. I can confirm that uninstalling Linkerd, allowing the service to start, then reinstalling Linkerd works, so if you could dig in when you have some time, that would be amazing!
Dug into this issue a bit as well and seem to have discovered the cause, and I have a fix. The issue appears to be in the handling of URLs in silasbw/swagger-fluent, a dependency of godaddy/kubernetes-client. It starts on the endpoint that @mycrEEpy mentioned, the first Linkerd endpoint found in the array of endpoints. Here's a patch that seems to fix the issue, but I'd like someone else to test it before I put in a PR to swagger-fluent. I'm not a JS dev and am not very familiar with these individual repos, so please recommend any changes. 👍🏻
cc: @silasbw
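The defensive idea behind that kind of patch can be sketched as follows. This is an illustration of the approach only, not the actual swagger-fluent diff: instead of dereferencing operationId unconditionally, fall back to a name derived from the HTTP method and path when the field is absent (the fallback naming scheme here is made up).

```javascript
// Return a usable client method name even when the spec omits operationId.
function operationName(op, method, path) {
  if (op.operationId) return op.operationId;
  // Fallback: slugify the path and prefix the HTTP method,
  // e.g. 'get_apis_tap_linkerd_io_v1alpha1'. Hypothetical scheme.
  const slug = path.replace(/[^a-zA-Z0-9]+/g, '_').replace(/^_|_$/g, '');
  return `${method}_${slug}`;
}

console.log(operationName({ operationId: 'listCoreV1Pod' }, 'get', '/api/v1/pods'));
// -> listCoreV1Pod
console.log(operationName({}, 'get', '/apis/tap.linkerd.io/v1alpha1/'));
// -> get_apis_tap_linkerd_io_v1alpha1
```

A generator built on top of this never crashes on a spec that legally omits operationId; it just produces less pretty method names for those endpoints.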
Also, you can easily reproduce this issue locally with minikube.
To get back to a working state, I downgraded Linkerd to 2.6.1, then deleted and reinstalled the three apiservices: v1alpha1.tap.linkerd.io, v1alpha1.linkerd.io, v1alpha2.linkerd.io.
KES version: 3.2.0
One of my colleagues dug into this and found that we can address it by adding an operationID field. We've got a pull request in progress: linkerd/linkerd2#4245
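For illustration, that kind of fix in an OpenAPI v2 document has roughly this shape: each operation object served by the apiserver gains an operationId. The fragment below is invented for this example and is not taken from the Linkerd pull request.

```json
{
  "paths": {
    "/apis/tap.linkerd.io/v1alpha1/": {
      "get": {
        "operationId": "getTapLinkerdIoV1alpha1APIResources",
        "description": "get available resources"
      }
    }
  }
}
```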
@cpretzer is there a workaround in the meantime? Or do people experiencing this issue have to wait for the next edge/stable Linkerd release?
@wmorgan the current known workaround is a downgrade to Linkerd 2.6.1.
@mycrEEpy yeah, I hate that workaround :)
I would also prefer something different :/
I'm slightly confused from reading through the comments here. Would this affect me if I have Linkerd deployed in my cluster, but external-secrets is not part of the mesh? If so, can someone explain why this would be the case?
@muenchdo Linkerd registers an apiservice without an operationId set, which is perfectly valid, but swagger-fluent, a dependency of external-secrets, can't handle this case.
We ran into the same issue here; Linkerd 2.7.0 is the culprit.
Exactly @muenchdo, I just discovered that external-secrets did not have Linkerd injected, and it was still crashing after the downgrade.
Just to clarify a couple of things here:
Sorry for not getting this change into Linkerd 2.7.1. I am writing this comment as penance.
This issue should finally no longer occur on master! :) Following the reproduction instructions in #278 (comment), it now starts as expected when I try to reproduce! 🎉
Not certain exactly what changed between 3.2 and 3.3, but for anyone else who runs into this: even though most folks in the comment thread note an environment with AWS and/or Linkerd, I had the identical error on bare metal with no Linkerd, and bumping to 3.3 resolved it (running kube 1.17, set up with kubeadm on metal).
Changes related to this are all upstream, in silasbw/swagger-fluent#69 and the follow-up silasbw/swagger-fluent#72 -- see also the KES v3.2 -> v3.3 changelog (https://github.com/godaddy/kubernetes-external-secrets/blob/master/CHANGELOG.md#330-2020-05-01).
With the new release out I'm closing this! Please update if you still see issues with version 3.3.0 or higher.
@matoszz the root issue was registering an ApiService in Kubernetes without defining an operationId; Linkerd was just a common application which did exactly that. If you have other applications with the same behaviour, you get the same error in KES, no matter where your Kubernetes cluster is running.
We've experienced this issue as well after upgrading https://github.com/zalando-incubator/kube-metrics-adapter to the latest image. We don't have ... Any ideas?
@Flydiverny looks like #576.
I am stuck. Please help.
Happens on 3.0.0 and 3.1.0
Using Kubernetes 1.14.9 on Amazon EKS