Container hcloud-csi-driver hangs on init #143

Closed

nnedkov opened this issue Aug 26, 2020 · 9 comments

nnedkov commented Aug 26, 2020

Kubernetes Version: v1.18.3

After I deploy the CSI driver with the command:
kubectl apply -f https://raw.githubusercontent.com/hetznercloud/csi-driver/v1.4.0/deploy/kubernetes/hcloud-csi.yml
the container hcloud-csi-driver in both StatefulSet/hcloud-csi-controller and DaemonSet/hcloud-csi-node gets stuck in a GET HTTP request to the hcloud API (most probably endpoint: https://api.hetzner.cloud/v1/servers/xxx).

The logs I get are:
level=debug ts=2020-08-26T10:33:55.096609739Z msg="getting instance id from metadata service"
level=debug ts=2020-08-26T10:33:55.098214893Z msg="fetching server"
and then after hanging for around 20 seconds the containers restart.

The bizarre part is that this occurs only on the pods scheduled on my worker nodes. The pod on the master node continues execution normally. Any ideas are highly appreciated.

PS: Big shoutout for the growing Hetzner support for Kubernetes. Really nice work!


nnexai commented Aug 29, 2020

Same for me. I think this was working 2 or 3 weeks ago.

@nazarkulyk

Check that all your cluster nodes (and pods) have access to the Hetzner API host.
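
One quick way to check that from inside the cluster (a rough sketch, not from this thread; the pod name is a placeholder and any image that ships curl will do) is a throwaway pod that curls the API. A 401 response without a token still proves connectivity; a timeout points at a network or firewall problem:

apiVersion: v1
kind: Pod
metadata:
  name: hcloud-api-check            # hypothetical name, only for this check
spec:
  restartPolicy: Never
  containers:
    - name: check
      image: curlimages/curl:latest   # any image with curl works
      # -v prints the HTTP exchange to the pod logs; expect a 401 if the API is reachable
      command: ["curl", "-sv", "--max-time", "10", "https://api.hetzner.cloud/v1/servers"]

Apply it, then read the result with kubectl logs hcloud-api-check. To match the situation above, pin it to one of the affected worker nodes via spec.nodeName.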


ibotty commented Sep 28, 2020

See https://docs.openshift.com/container-platform/4.5/networking/understanding-networking.html

The pod network does not have access to link-local addresses (which is a good thing, I'd say). Reading the source (hcloud-csi/cmd/driver/main.go), it should be possible to set the queried information via environment variables instead. Unfortunately, I have no good idea how to do that for DaemonSets (hcloud-csi-node).


ibotty commented Sep 28, 2020

As indicated in the bug report, it works when using the host network. I had to change the metrics and healthz ports so they don't conflict with each other, though.

--- a/hcloud-csi/deploy/kubernetes/hcloud-csi.yml
+++ b/hcloud-csi/deploy/kubernetes/hcloud-csi.yml
@@ -153,7 +153,7 @@ spec:
             - name: CSI_ENDPOINT
               value: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
             - name: METRICS_ENDPOINT
-              value: 0.0.0.0:9189
+              value: 0.0.0.0:19189
             - name: HCLOUD_TOKEN
               valueFrom:
                 secretKeyRef:
@@ -163,10 +163,10 @@ spec:
             - name: socket-dir
               mountPath: /var/lib/csi/sockets/pluginproxy/
           ports:
-            - containerPort: 9189
+            - containerPort: 19189
               name: metrics
             - name: healthz
-              containerPort: 9808
+              containerPort: 19808
               protocol: TCP
           livenessProbe:
             failureThreshold: 5
@@ -186,12 +186,14 @@ spec:
           image: quay.io/k8scsi/livenessprobe:v1.1.0
           args:
             - --csi-address=/var/lib/csi/sockets/pluginproxy/csi.sock
+            - --health-port=19808
           volumeMounts:
             - mountPath: /var/lib/csi/sockets/pluginproxy/
               name: socket-dir
       volumes:
         - name: socket-dir
           emptyDir: {}
+      hostNetwork: true
 ---
 kind: DaemonSet
 apiVersion: apps/v1
@@ -299,6 +301,7 @@ spec:
           hostPath:
             path: /dev
             type: Directory
+      hostNetwork: true
 ---
 apiVersion: v1
 kind: Service


ibotty commented Sep 28, 2020

There should be a way to not use host networking on these platforms though.

@LKaemmerling (Member)

As I wrote on #152, you should never use the host network, as this might open a "window" for an attacker.

The problem is that there might be something wrong with your network configuration or firewall.


ibotty commented Oct 19, 2020

Please reopen.

Of course these pods should not ever run with host networking; that's not what this issue is about. The problem is that in some Kubernetes distributions (OpenShift/OKD), pods without host networking cannot connect to link-local addresses (e.g. 169.254.169.254). That needs a solution.

Reading the source, it looks possible to use the downward API to expose spec.nodeName as the env variable KUBE_NODE_NAME in the DaemonSet; see the sketch below. I will try to test that this week.

https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/#the-downward-api
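
For reference, a minimal sketch of that downward API wiring for the hcloud-csi-driver container in the DaemonSet (everything around the env entry is elided; only KUBE_NODE_NAME and spec.nodeName come from the comment above):

      containers:
        - name: hcloud-csi-driver
          env:
            # hand the node name to the driver instead of the link-local metadata lookup
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.nodeName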

@LKaemmerling (Member)

@ibotty you can update to the latest version; there, getting the node name from spec.nodeName is the default behaviour.


ibotty commented Oct 19, 2020

Oh, great! Thank you.
