
Controller reports as ready even though it is not able to connect to EC2 instance metadata #548

Closed
invidian opened this issue Aug 26, 2020 · 5 comments · Fixed by #751
Labels
kind/bug Categorizes issue or PR as related to a bug.

@invidian
Member

invidian commented Aug 26, 2020

/kind bug

What happened?

When the controller pod is not able to connect to the EC2 instance metadata endpoint (for example, when it is blocked by a NetworkPolicy), the deployment still reports the container as ready for some time; the pod then crashes with the following error:

I0826 15:01:28.649833       1 driver.go:62] Driver: ebs.csi.aws.com Version: v0.5.0
panic: EC2 instance metadata is not available

goroutine 1 [running]:
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.newControllerService(0xc000191380, 0xc00001e960, 0x0, 0x16)
        /go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/controller.go:76 +0x103
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.NewDriver(0xc000115f70, 0x3, 0x3, 0xc000062900, 0xd4d780, 0xc000191260)
        /go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/driver.go:82 +0x3d9
main.main()
        /go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/cmd/main.go:31 +0x117

What you expected to happen?

The ebs-plugin container in the controller pod should wait until it can reach the EC2 instance metadata service before reporting readiness to Kubernetes.
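
One way to achieve this (a minimal sketch, not the driver's actual code) is to retry the metadata lookup with a bounded backoff at startup instead of panicking on the first failure. cloud.NewMetadata and cloud.MetadataService come from the driver's pkg/cloud package; the retry count and interval are illustrative assumptions:

package main

import (
	"fmt"
	"time"

	"github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/cloud"
)

// newMetadataWithRetry polls the EC2 instance metadata endpoint until it
// responds or the attempts are exhausted, returning an error instead of
// panicking immediately. Attempts and interval are illustrative values.
func newMetadataWithRetry() (cloud.MetadataService, error) {
	const (
		attempts = 10
		interval = 3 * time.Second
	)
	var lastErr error
	for i := 0; i < attempts; i++ {
		metadata, err := cloud.NewMetadata()
		if err == nil {
			return metadata, nil
		}
		lastErr = err
		time.Sleep(interval)
	}
	return nil, fmt.Errorf("EC2 instance metadata is not available after %d attempts: %v", attempts, lastErr)
}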

How to reproduce it (as minimally and precisely as possible)?

With Calico as the CNI, create the following GlobalNetworkPolicy to block access to the EC2 instance metadata endpoint:

apiVersion: crd.projectcalico.org/v1
kind: GlobalNetworkPolicy
metadata:
  name: block-metadata-access
spec:
  egress:
  - action: Allow
    destination:
      notNets:
      - 169.254.169.254/32
  selector: ""
  types:
  - Egress

Then, deploy the AWS EBS CSI driver as usual.
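
To confirm the policy is in effect before deploying the driver, you can check that the metadata endpoint is unreachable from a test pod (the pod name and image below are arbitrary):

kubectl run metadata-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl --max-time 5 http://169.254.169.254/latest/meta-data/instance-id

With the policy applied, the request should time out rather than return an instance ID.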

Anything else we need to know?:

It seems the deployment is currently missing readiness probes altogether, so adding them is also needed to resolve this.

Environment

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.5", GitCommit:"e6503f8d8f769ace2f338794c914a96fc335df0f", GitTreeState:"archive", BuildDate:"2020-07-01T16:28:46Z", GoVersion:"go1.14.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:51:04Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
  • Driver version: 0.5.0
  • Chart version: 0.4.0
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Aug 26, 2020
invidian added a commit to kinvolk/lokomotive that referenced this issue Aug 26, 2020
After we created the aws-ebs-csi-driver component, we added a patch to
Lokomotive which deploys a GlobalNetworkPolicy blocking access to EC2
instance metadata by default for all pods, which ended up breaking the
component's functionality.

The issue was not spotted before, as the component does not have
readiness probes defined, which has been reported upstream:
kubernetes-sigs/aws-ebs-csi-driver#548

This commit fixes the component's functionality by adding a
NetworkPolicy object selecting the controller pods, which allows all
egress traffic for them and thus bypasses the GlobalNetworkPolicy.

Closes #864

Signed-off-by: Mateusz Gozdek <[email protected]>
@wongma7
Contributor

wongma7 commented Oct 16, 2020

There is a livenessprobe sidecar (https://github.com/kubernetes-csi/livenessprobe) already deployed: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/deploy/kubernetes/base/controller.yaml#L60. I am not sure whether it can double as a readiness probe, but that would probably solve this issue.
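
For reference, a readiness probe on the ebs-plugin container could point at the same healthz endpoint the liveness probe uses. A minimal sketch, assuming the healthz port name from the manifest linked above and illustrative thresholds; whether /healthz actually fails while metadata is unreachable would need to be verified:

readinessProbe:
  httpGet:
    path: /healthz
    port: healthz
  initialDelaySeconds: 10
  timeoutSeconds: 3
  periodSeconds: 10
  failureThreshold: 5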

@AndyXiangLi
Contributor

@invidian I'm not able to reproduce this issue with the latest driver version, v0.8.1. I'm using Calico as the CNI, and when the metadata service is blocked and the driver is deployed as usual, the ebs-plugin container never becomes ready.
Are you able to try with the latest version on your end?

@invidian
Member Author

@AndyXiangLi yes, I can reproduce it using v0.8.1. Do note that the container is in the Ready state briefly after creation; only then does it crash and go into the CrashLoopBackOff state.

@invidian
Member Author

invidian commented Feb 2, 2021

Running:

helm upgrade --install aws-ebs-csi-driver --namespace kube-system --wait --atomic --set enableVolumeScheduling=true --set enableVolumeResizing=true --set enableVolumeSnapshot=true aws-ebs-csi-driver/aws-ebs-csi-driver

does reproduce the issue. I would expect Helm to never converge.

$ kgpo
+ kubectl get pods
NAME                                       READY   STATUS             RESTARTS   AGE
calico-kube-controllers-855c8775f9-xd8zm   1/1     Running            0          7h12m
calico-node-hqlt7                          1/1     Running            0          7h12m
calico-node-x79mg                          1/1     Running            1          7h12m
coredns-7d799bc4c8-9pk25                   1/1     Running            0          7h12m
ebs-csi-controller-87d4b79bd-4bnh4         5/6     CrashLoopBackOff   2          69s
ebs-csi-controller-87d4b79bd-svc44         5/6     CrashLoopBackOff   2          69s
ebs-csi-node-5h8bn                         3/3     Running            0          69s
ebs-csi-node-8vss7                         3/3     Running            0          69s
ebs-snapshot-controller-0                  1/1     Running            0          69s

@vdhanan
Contributor

vdhanan commented Feb 8, 2021

/assign
