[BUG] akv2k8s-ca configmap disappears after some hours never to come back #147

Closed · glenbot opened this issue Dec 4, 2020 · 11 comments
Labels: bug (Something isn't working)

@glenbot

glenbot commented Dec 4, 2020

Controller, version: spvest/azure-keyvault-controller:1.1.0
Env-Injector (webhook), version: spvest/azure-keyvault-webhook:1.1.10
Currently I'm on Azure AKS and the nodes are on K8s version v1.19.0.

The akv2k8s-ca configmap gets installed into enabled namespaces just fine the first time (helm install), but after some random amount of time it disappears. The env injector logs show:

time="2020-12-03T21:16:23Z" level=info msg="Log level set to 'info'"
W1203 21:16:23.870265       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2020-12-03T21:16:23Z" level=info msg="Creating event broadcaster"
time="2020-12-03T21:16:23Z" level=info msg="Setting up event handlers"
time="2020-12-03T21:16:23Z" level=info msg="Starting CA Bundle Injector controller"
time="2020-12-03T21:16:23Z" level=info msg="Waiting for informer caches to sync"
time="2020-12-03T21:16:23Z" level=info msg="Secret 'akv2k8s-envinjector-webhook-tls' monitored by CA Bundle Injector added. Adding to queue."
time="2020-12-03T21:16:23Z" level=info msg="Namespace 'ccs-decisio' labeled 'enabled' will be monitored by CA Bundle Injector. Adding to queue."
time="2020-12-03T21:16:23Z" level=info msg="Starting workers"
time="2020-12-03T21:16:23Z" level=info msg="Started workers"
time="2020-12-03T21:16:23Z" level=info msg="looping all labelled namespaces looking for config map 'akv2k8s-ca' to update"
time="2020-12-03T21:16:23Z" level=info msg="Successfully synced CA Bundle from updated secret 'akv2k8s/akv2k8s-envinjector-webhook-tls' to all enabled namespaces"
time="2020-12-03T21:16:23Z" level=error msg="error syncing 'ccs-decisio': configmaps \"akv2k8s-ca\" already exists, requeuing"
time="2020-12-03T21:16:24Z" level=info msg="Successfully synced CA Bundle to new namespace 'ccs-decisio'"
W1203 22:43:50.936718       1 reflector.go:302] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:98: watch of *v1.Secret ended with: too old resource version: 5158613 (5164416)
W1204 13:07:32.213100       1 reflector.go:302] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:98: watch of *v1.Namespace ended with: very short watch: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:98: Unexpected watch close - watch lasted less than a second and no items received
W1204 13:07:32.213123       1 reflector.go:302] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:98: watch of *v1.ConfigMap ended with: very short watch: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:98: Unexpected watch close - watch lasted less than a second and no items received
W1204 13:07:32.213174       1 reflector.go:302] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:98: watch of *v1.Secret ended with: very short watch: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:98: Unexpected watch close - watch lasted less than a second and no items received
E1204 13:07:33.215727       1 reflector.go:125] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:98: Failed to list *v1.Namespace: Get https://10.0.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused
E1204 13:07:33.215760       1 reflector.go:125] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:98: Failed to list *v1.Secret: Get https://10.0.0.1:443/api/v1/namespaces/akv2k8s/secrets?limit=500&resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused
E1204 13:07:33.215986       1 reflector.go:125] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:98: Failed to list *v1.ConfigMap: Get https://10.0.0.1:443/api/v1/configmaps?limit=500&resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused
E1204 13:07:34.218775       1 reflector.go:125] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:98: Failed to list *v1.Namespace: Get https://10.0.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused
E1204 13:07:34.218924       1 reflector.go:125] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:98: Failed to list *v1.Secret: Get https://10.0.0.1:443/api/v1/namespaces/akv2k8s/secrets?limit=500&resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused
E1204 13:07:34.221743       1 reflector.go:125] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:98: Failed to list *v1.ConfigMap: Get https://10.0.0.1:443/api/v1/configmaps?limit=500&resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused

Attempts to restart the akv2k8s containers sometimes make it recreate the CA cert, but most of the time they don't. Running k delete po -n akv2k8s $(k get po --no-headers -n akv2k8s | awk '{print $1}') to restart everything works, and the CA injector logs look fine:

$ k logs -f akv2k8s-envinjector-ca-bundle-d645ffc9d-ntcvq
time="2020-12-04T13:29:57Z" level=info msg="Log level set to 'info'"
W1204 13:29:57.357216       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2020-12-04T13:29:57Z" level=info msg="Creating event broadcaster"
time="2020-12-04T13:29:57Z" level=info msg="Setting up event handlers"
time="2020-12-04T13:29:57Z" level=info msg="Starting CA Bundle Injector controller"
time="2020-12-04T13:29:57Z" level=info msg="Waiting for informer caches to sync"
time="2020-12-04T13:29:57Z" level=info msg="Namespace 'ccs-decisio' labeled 'enabled' will be monitored by CA Bundle Injector. Adding to queue."
time="2020-12-04T13:29:57Z" level=info msg="Secret 'akv2k8s-envinjector-webhook-tls' monitored by CA Bundle Injector added. Adding to queue."
time="2020-12-04T13:29:57Z" level=info msg="Starting workers"
time="2020-12-04T13:29:57Z" level=info msg="Started workers"
time="2020-12-04T13:29:57Z" level=info msg="looping all labelled namespaces looking for config map 'akv2k8s-ca' to update"
time="2020-12-04T13:29:57Z" level=info msg="Successfully synced CA Bundle from updated secret 'akv2k8s/akv2k8s-envinjector-webhook-tls' to all enabled namespaces"
time="2020-12-04T13:29:57Z" level=error msg="error syncing 'ccs-decisio': configmaps \"akv2k8s-ca\" already exists, requeuing"
time="2020-12-04T13:29:57Z" level=info msg="Successfully synced CA Bundle to new namespace 'ccs-decisio'"

As you can see, it says Successfully synced CA Bundle to new namespace 'ccs-decisio', which tells me it synced.

Alas:

$ k get cm -n ccs-decisio | grep -i akv2k8s
$ echo $?
1

It's not coming back. It seems the env injector gets itself into a state where it deletes the configmap and never recreates it. Restarting the pods fixes it some of the time; a complete delete and re-install of the Helm chart will fix it.

Here I am deleting the injector pod live and watching the configmap in my namespace:

$ k get cm --watch
NAME                                                    DATA   AGE
akv2k8s-ca                                              1      0s
akv2k8s-ca                                              1      1s
^C

$ k get cm -n ccs-decisio | grep -i akv2k8s
$ echo $?
1

It's there, and then it's gone.

glenbot added the bug label on Dec 4, 2020
@kremed1

kremed1 commented Dec 9, 2020

Thanks for the report. I thought I was doing something wrong when exactly the same behavior happened in our environment.

@dglynn

dglynn commented Jan 18, 2021

We are hitting this in one of our QA AKS clusters today during a deploy. Five out of our six clusters are OK, but the last one fails with this:

Warning Failed 12m (x7 over 13m) kubelet, aks-default-82623051-vmss000000 Error: configmap "akv2k8s-ca" not found

We are unable to deploy any new resources there. It is running AKS v1.19.1; the others are using a combination of AKS v1.17.13 and v1.18.10.

We are using:
app_version: 1.1.6
chart_version: 1.1.7

Some more info on this: tainting our module.akv2k8s-env-injector.helm_release.env_injector Terraform resource and redeploying fixed this issue.
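
For reference, that amounts to something like this (a sketch; the resource address is ours from above, and apply flags will vary by setup):

# Mark the env injector Helm release for recreation on the next apply
terraform taint module.akv2k8s-env-injector.helm_release.env_injector
# Recreate it; this reinstalls the chart and regenerates the CA configmap
terraform apply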

@sid-motorq

Any updates? This has been happening in my AKS cluster.

@glenbot
Author

glenbot commented Jan 27, 2021

> Any updates? This has been happening in my AKS cluster.

What I have done temporarily until this gets fixed is export the CA configmap with kubectl get cm <cmname> -o yaml --export > file when it shows up, then delete it and manually recreate it from the same file. After this it stays forever.
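
Roughly, the steps look like this (a sketch with placeholder names; note that --export was removed in kubectl 1.18, so on newer clients you may need to strip cluster-specific metadata such as resourceVersion, uid, and ownerReferences from the file yourself):

# Save the configmap while it exists; <namespace> is the enabled namespace
kubectl get cm akv2k8s-ca -n <namespace> -o yaml --export > akv2k8s-ca.yaml
# Delete the injector-managed copy and recreate it by hand from the file;
# the manually created configmap then stays put
kubectl delete cm akv2k8s-ca -n <namespace>
kubectl apply -f akv2k8s-ca.yaml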

@jeromesubs

jeromesubs commented Jan 27, 2021

> Any updates? This has been happening in my AKS cluster.
>
> What I have done temporarily until this gets fixed is export the CA configmap with kubectl get cm <cmname> -o yaml --export > file when it shows up, then delete it and manually recreate it from the same file. After this it stays forever.

Thanks!
We have the same issue, with AKS version 1.19.6

I'll try your fix, hoping it works for me as well!

@torresdal
Collaborator

The whole CA configmap sync mechanism is removed in 1.2-beta (soon to be released) and solved much more elegantly. This should probably be gone in 1.2, but I'll wait for confirmation before closing.

@torresdal
Collaborator

As mentioned, this is fixed in 1.2, and we have confirmed it. Closing.

@sanjaypradeep

Recently upgraded to v1.20.05 too.

What am I facing?

  1. After upgrading, I see "akv2k8s-ca file not found", so I tried restarting the cluster twice. The issue still persists.

Can you please help me overcome this issue?

Impact: all previously working pods are in "CreateConfigError" state.

Any leads, much appreciated.

@lefkatis-hellenic

Hello
I just had the same issue. I updated 2 out of my 4 clusters from 1.19.9 to 1.19.11 and this issue occurred.

I am using chart version 1.1.28 with no values file; I am using the default values.
helm upgrade --namespace akv2k8s --install --wait akv2k8s spv-charts/akv2k8s --version 1.1.28

I will continue with the update of the remaining 2 clusters from 1.18.10 to 1.19.11.

What versions should I use in order not to have this issue?
Should I downgrade to bypass this issue?
What chart version should I use?
Should I create a values file?

@dglynn

dglynn commented Sep 6, 2021

> As mentioned, this is fixed in 1.2, and we have confirmed it. Closing.

@torresdal just to confirm: is this fixed in app_version 1.2 or chart_version 1.2?

@itsabhipaul

Just upgraded AKS nodes to 1.20.x and all my pods are pending with the configmap "akv2k8s-ca" not found error.
Can anyone share the steps to uninstall my existing akv2k8s and install the latest, to resolve this issue? Thanks.
