[BUG] starting container process caused "exec: \"/azure-keyvault/azure-keyvault-env\": stat /azure-keyvault/azure-keyvault-env: no such file or directory" #42
Comments
Hi @howardjones - I have a theory about what's happening, but I believe it should only be a problem if a container crashes on first run. Could you try the deployment described in this tutorial? https://akv2k8s.io/tutorials/env-injection/1-secret/ Specifically image v2.0.1, with args and env specified:

```yaml
containers:
  - name: akv2k8s-env-test
    image: spvest/akv2k8s-env-test:2.0.1
    args: ["TEST_SECRET"]
    env:
      - name: TEST_SECRET
        value: "secret-inject@azurekeyvault" # ref to akvs
```
@howardjones My theory is that when a Pod starts up, the init-container executes, copying the env-injector executable and credentials to a shared volume. When the original program starts up, these sensitive files are deleted (because they are sensitive). However, if the pod crashes and tries to restart, the init-container will not run again (by design), and the sensitive files, which are now needed to start the container, are no longer there. To fix a crashed container, the pod needs to be deleted, which will make the init-container run again. This is of course not the way we would like it and we are looking into options - have any?
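To illustrate the moving parts, the injected pattern looks roughly like this - a simplified sketch, not the exact manifest our webhook produces (the init image tag and commands are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: akv2k8s-env-test
spec:
  volumes:
    - name: azure-keyvault-env
      emptyDir:
        medium: Memory            # shared in-memory volume for the injector binary
  initContainers:
    - name: copy-azure-keyvault-env
      image: spvest/azure-keyvault-env:example   # hypothetical tag
      # Runs once per Pod (not per container restart), copying the injector binary
      command: ["sh", "-c", "cp /usr/local/bin/azure-keyvault-env /azure-keyvault/"]
      volumeMounts:
        - name: azure-keyvault-env
          mountPath: /azure-keyvault/
  containers:
    - name: akv2k8s-env-test
      image: spvest/akv2k8s-env-test:2.0.1
      # The webhook rewrites the entrypoint to run through the copied binary.
      # If that binary was deleted after first start, a container restart fails
      # with: exec: "/azure-keyvault/azure-keyvault-env": no such file or directory
      command: ["/azure-keyvault/azure-keyvault-env"]
      args: ["TEST_SECRET"]
      volumeMounts:
        - name: azure-keyvault-env
          mountPath: /azure-keyvault/
```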
To temporarily work around this issue, you could try setting the env var
Hey, just encountered this issue. Tried the env variable suggested by @torresdal but still getting the same error. Here's a screenshot. I can't say for certain whether the main container is crashing on boot, because the only info I have available is this particular error, but FWIW it is most likely failing on boot.
Pinning the version to 2.0.1 has successfully started the test container with the right AKV value.
Just tested with spvest/akv2k8s-env-test:2.0.1, to make sure everything was working at least from an injector point of view. Got the same error.
@hfgbarrigas which container image version is the init-container running?
There must be something off here. Both of you, @hfgbarrigas and @howardjones, are running the same scenario - one fails, the other succeeds...
I need to dig deep and find a good solution for this, plus a test harness. Sorry about the inconvenience. Will update here as soon as I can.
Indeed, if my test deployment had worked from a secret perspective I would've been satisfied. At least the error is consistent. Looking forward to it, and thank you for the prompt feedback.
Hello @torresdal, with version 1.0.2-beta1:
Agreed. With
Thanks @howardjones & @hfgbarrigas - the only difference between the two versions is that the files will only be deleted IF your application starts successfully. Should your app crash for some reason after that, though, the pod will not be able to recover. This is not acceptable for this project and we will keep working to find a solution.
@torresdal what's the official workaround for this currently? Not sure how to deal with pods that can never recover without manual intervention.
We have a few ideas of how to solve this now, but it will require more time - so I see no other solution right now than to disable the functionality for deleting sensitive files in a patch (on the way - will be 1.0.2). As for the long-term solution, we're looking into how we can exchange AKV auth tokens securely with our webhook, instead of storing creds on an in-memory disk. Other suggestions are welcome!
Can anyone verify that the issues described here are solved by no longer deleting sensitive files? Version 1.0.2-beta.2 => https://github.com/SparebankenVest/azure-key-vault-to-kubernetes/releases/tag/1.0.2-beta.2 Will release 1.0.2 as soon as I have confirmation.
@torresdal All good from my side using 1.0.2-beta.1 and 1.0.2-beta.2. Sensitive files will persist throughout the pod's lifetime in the in-memory volume, right?
@hfgbarrigas yes - that is right.
The official release is now out: https://github.com/SparebankenVest/azure-key-vault-to-kubernetes/releases/tag/1.0.2 Currently working on a better and more secure solution for handling default credentials.
Two fixes are implemented to prevent the issues identified in this thread. A detailed description is in Release 1.1.0-beta.4. Any help verifying before final release would be greatly appreciated!
I'll test it asap. Regarding the implementation, can you shed a bit more light on the "gets an oauth token to access AKV from an endpoint that is protected via client certificate" part? Thank you.
I could not get things working on Friday, but it wasn't clear what the issue was. No errors in either akv2k8s component, but my pod didn't get its secrets. Should be able to look into it more tomorrow.
@hfgbarrigas @howardjones fixed several issues during the weekend. You should now point images to 1.1.0-beta.28 and helm chart version 1.1.0-beta.12.
Update: I will create a new official Beta release shortly, with an updated Helm chart.
@hfgbarrigas regarding "gets an oauth token to access AKV from an endpoint that is protected via client certificate": We have implemented an auth-service that will issue Azure JWT tokens valid for AKV resources only. The service uses credentials given to the auth-service (default is AKS credentials). The main difference is:
Hi @torresdal. My test application is working happily with 1.1.0-beta.28 and helm chart 1.1.0-beta.12 as described above. I have not yet tested any failure (pod restart) scenarios though.
Hmm, I have never encountered this problem with a cluster that ran env-injector version 0.1.15 for multiple months, but now that I have upgraded 2 clusters to version 1.0.2 I have had multiple incidents of CrashLoopBackOff, and it seems to happen when Kured restarts nodes for OS updates. Could this be due to the fact that there does not seem to be any readiness probe for the env-injector in v1.0.2? Some pods would then try to contact a non-ready env-injector, fail, and be stuck in a restart loop without properly injected credentials.
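Something along these lines on the env-injector Deployment is what I would have expected - just a sketch on my part; the path, port and scheme are guesses, not the project's documented endpoint:

```yaml
# Hypothetical readiness probe for the env-injector webhook container.
# /healthz, port 443 and HTTPS are assumptions, not confirmed values.
readinessProbe:
  httpGet:
    path: /healthz
    port: 443
    scheme: HTTPS
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
```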
Did some more testing on a 2-node cluster with version 1.0.2 and around 12 pods using the env-injector for their secrets. If I drain one of the nodes, there is a fairly high likelihood that 1 or 2 of the pods will start with their secrets not injected, fail, and then enter a CrashLoopBackOff cycle. I then went back to version 0.1.4 of the env-injector chart (app version 0.1.15), did the same thing a couple of times, and so far I cannot get pods into a CrashLoopBackOff cycle - they all get their secrets injected properly. Finally, I did a quick test with 1.1.0-beta.28 and helm chart 1.1.0-beta.12, but am having some problems getting secret injection to work at all with that version. Will look into it more tomorrow.
Thanks for taking the time to do thorough testing @jemag. We're in the middle of fixing and testing this issue and aim to have a working version in beta tomorrow. I would advise you not to spend time on the current beta until the new one is out, as we have also seen issues similar to yours. This part of the env-injector turned out to be more error-prone than we had hoped, but the good news is we feel confident we have a much more stable implementation now. I'll update here as soon as the beta is out. Thanks!
Is there a more current RC, or is 1.1.0-beta.28 still my best bet? (starting up a new cluster)
With the latest GitHub code, I'm hitting an error which looks like webhook_auth_service being blank. I'm using the last published Helm charts though. Is there something new that I need to set for the new auth service?
@howardjones sorry for the long wait. We had to pause dev on this for a bit (we have a bank to run also 😄), but are back on this now. Give us a few days to clean up a few things and get a new release out. Thanks.
This is now implemented and working correctly as far as we've managed to test in our clusters. Any help verifying this would be great: #115
Testing by multiple parties shows this is working as expected. Closing.
Note: Make sure to check out known issues (https://github.com/sparebankenvest/azure-key-vault-to-kubernetes#known-issues) before submitting
Describe the bug
Using environment injection with either the supplied test image or my own results in the container not starting, apparently because the expected volume mounts are not actually created.
AKS with Kubernetes 1.17.0
To Reproduce
Steps to reproduce the behavior:
Helm installation as per manual.
Expected behavior
SP_SECRET environment variable defined with value from AKV, inside a running container.
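For reference, the env var in my deployment follows the documented injection pattern - a minimal sketch, where the AzureKeyVaultSecret name is a placeholder rather than my real one:

```yaml
# "my-akvs" is a hypothetical AzureKeyVaultSecret resource name:
env:
  - name: SP_SECRET
    value: "my-akvs@azurekeyvault" # env-injector resolves this at container startup
```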
Logs
The env-injector pod seems to think it did its work:
But in the pod description:
The init container does not show any error state (exit code 0). Both containers have the /azure-keyvault/ mount listed.
The logs for the init container simply say that the file was copied:
Additional context
This did work yesterday! Nothing has changed in the AKS cluster. I have redeployed the akv2k8s and application deployments a few times to be sure.