Kubernetes PreStop hook failing to deregister ACLs and service instances #540
Comments
This also happens in an AWS EKS cluster.
Hi! Are either of you able to confirm whether the issue still persists with the latest releases (Consul 1.10, consul-k8s 0.26, consul-helm 0.32)? We've made some pretty major changes to how services are managed. If you're able to try, please do have a look at the changelog first!
Upon further investigation, we think service de-registration will be fixed in consul-helm 0.32.0 with Consul 1.10, but we are still leaking the ACL tokens.
So I did a little more checking and it seems that the
@pedrohdz you're right. We do have something set up in our backlog to get the connect injector to also delete the token once the service has been de-registered. Thanks so much for bringing this up.
@thisisnotashwin
@pedrohdz I hear ya, bud. I don't want to give specific time estimates, but we will try to get this prioritized for our next release.
…istered (#571)

Fixes #540

* Modify the endpoints controller to delete ACL tokens for each service instance that it deregisters.
* Remove the TLS+ACLs table tests from the endpoints controller tests. These tests checked that the endpoints controller works with a client configured with TLS and ACLs. I thought they were not necessary because no code in the controller behaves differently when the Consul client is configured with either of those, so there's no way these tests could fail. The tests for the new ACL logic are still there, but they only test the logic that was added and configure the test agent to accommodate it.
* Create a test package under helper and move the GenerateServerCerts function there from subcommand/common, because it was used outside of subcommand.
* Create a helper test function to set up auth methods, and refactor the existing connect-init command tests to use it.
* Minor editing fixes to comments, etc.
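The first bullet of the fix pairs every deregistration with a token deletion. The sketch below illustrates that ordering with a purely hypothetical in-memory stand-in (FakeConsul, deregister_pod and the pod-to-token mapping are all assumptions for illustration, not the consul-k8s code or the Consul API): each time an instance owned by a pod is deregistered, the ACL tokens issued to that pod are removed as well, so nothing is left behind.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeConsul:
    """Hypothetical in-memory stand-in for a Consul agent; NOT the real client API."""
    services: Dict[str, str] = field(default_factory=dict)  # service instance ID -> owning pod
    tokens: Dict[str, str] = field(default_factory=dict)    # ACL token accessor ID -> owning pod

    def deregister(self, service_id: str) -> None:
        self.services.pop(service_id, None)

    def delete_token(self, accessor_id: str) -> None:
        self.tokens.pop(accessor_id, None)


def deregister_pod(consul: FakeConsul, pod: str) -> List[str]:
    """Deregister every service instance owned by `pod` and, for each one,
    delete the ACL tokens issued to that pod. Returns the deregistered IDs."""
    deregistered = []
    for service_id, owner in list(consul.services.items()):
        if owner != pod:
            continue
        consul.deregister(service_id)
        deregistered.append(service_id)
        # The step that was missing before the fix: without it, the pod's
        # login token outlives its service instances.
        for accessor_id, token_owner in list(consul.tokens.items()):
            if token_owner == pod:
                consul.delete_token(accessor_id)
    return deregistered


# Hypothetical example state: one app instance plus its sidecar, and an unrelated pod.
consul = FakeConsul(
    services={"static-server-1": "static-server-1",
              "static-server-1-sidecar-proxy": "static-server-1",
              "web-1": "web-1"},
    tokens={"token-a": "static-server-1", "token-b": "web-1"},
)
print(deregister_pod(consul, "static-server-1"))
print(consul.services, consul.tokens)
```

Deleting tokens inside the deregistration loop, rather than in a separate sweep, keeps the two operations tied to the same pod, which is the property the leak in this issue lacked.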
Overview of the Issue
Consul Connect Inject-created ACLs are not being deleted after a Kubernetes pod is deleted. It seems that the issue is that the envoy-sidecar preStop hook is failing. As far as I can tell, neither consul services deregister nor consul logout is executed.

The service instance in Consul does eventually get deleted: the consul-connect-injector-webhook pod identifies the missing pod and deletes its service instance, but the ACL remains.

Watching the Kubernetes events with

kubectl get events --all-namespaces -w

shows:

This is a bit of a side note... If terminationGracePeriodSeconds is not long enough on the pod, you get a 137 exit code.

K8s event error if terminationGracePeriodSeconds not set
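The 137 follows the usual convention that a process terminated by signal N reports status 128 + N, so 137 means SIGKILL (signal 9). Kubernetes sends SIGKILL once terminationGracePeriodSeconds runs out, and the time spent in a preStop hook counts against that period, so a hook that is still running when the grace period expires ends this way. The decoder below is a small illustrative helper (describe_exit_code is a hypothetical name, not part of any Kubernetes tooling).

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a container exit code using the convention that a
    process killed by signal N reports status 128 + N."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "an unknown signal"
        return f"killed by {name} (signal {signum})"
    return f"exited with status {code}"


for code in (0, 1, 137, 143):
    print(code, "->", describe_exit_code(code))
```

Signal names come from the platform's signal module, so the output above assumes a Linux or macOS host.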
Exec'ing into the envoy-sidecar container and running the preStop commands manually seems to work just fine: both the service instance and the ACL are deleted, though you start getting a bunch of other errors about

agent.client: RPC failed to server: method=Catalog.Deregister server=10.31.236.71:8300 error="rpc error making call: ACL not found"

which I won't get into here.

The preStop hook in question is:

Reproduction Steps
I created a test cluster based on the Secure Consul and Registered Services on Kubernetes tutorial and reproduced the issue there to keep things as simple as possible.
Steps to reproduce this issue:
Create your gossip key:
kubectl create secret generic consul-gossip-encryption-key --from-literal=key=$(consul keygen)

Create a cluster via Helm with:
helm install -f helm-consul-h0-values.yaml demo-0 hashicorp/consul --wait

Create the namespace:
kubectl create namespace pedher-server-delete

Deploy the test service with:
kubectl --namespace=pedher-server-delete apply -f static-server.yaml

Watch the events on the K8s cluster in a separate terminal:
kubectl get events --all-namespaces -w

Delete the pod, then wait for the error to pop up in the K8s events (it might take a minute):
kubectl --namespace=pedher-server-delete delete pods -l 'app=static-server'

Consul info for both Client and Server
This output is suspect...
Client info
Server info
Operating system and Environment details
Running on Azure AKS.
Log Fragments
Connect injector logs via
kubectl logs -l 'app=consul,component=connect-injector' -f

Consul server logs via
kubectl logs -l 'app=consul,component=server' -f
Nothing in the logs.

Consul client logs via
kubectl logs -l 'app=consul,component=client' -f

Kubernetes events
kubectl get events --all-namespaces -w
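To pull only the hook failures out of the events stream, a small filter over the JSON form of the events can help. This is a sketch under two assumptions: that the failures carry the kubelet's FailedPreStopHook event reason (the original event output is not preserved in this report), and that the input is the one-shot output of kubectl get events --all-namespaces -o json, which wraps events in an items list (the -w watch form is not handled). failed_prestop_events is a hypothetical helper name.

```python
import json
from typing import List, Tuple


def failed_prestop_events(events_json: str) -> List[Tuple[str, str, str]]:
    """Return (namespace, pod, message) for every FailedPreStopHook event
    in the JSON list printed by `kubectl get events --all-namespaces -o json`."""
    hits = []
    for event in json.loads(events_json).get("items", []):
        if event.get("reason") != "FailedPreStopHook":
            continue  # skip unrelated events such as Killing or Scheduled
        obj = event.get("involvedObject", {})
        hits.append((obj.get("namespace", ""), obj.get("name", ""), event.get("message", "")))
    return hits
```

For example, save the kubectl output to a file, read it into a string, and print each returned tuple to see which pods' hooks failed and why.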