Container hcloud-csi-driver hangs on init #143

Closed

nnedkov opened this issue Aug 26, 2020 · 9 comments

nnedkov commented Aug 26, 2020

Kubernetes Version: v1.18.3

After I deploy the CSI driver with the command:
kubectl apply -f https://raw.githubusercontent.com/hetznercloud/csi-driver/v1.4.0/deploy/kubernetes/hcloud-csi.yml
the container hcloud-csi-driver in both StatefulSet/hcloud-csi-controller and DaemonSet/hcloud-csi-node gets stuck in a GET HTTP request to the hcloud API (most probably endpoint: https://api.hetzner.cloud/v1/servers/xxx).

The logs I get are:
level=debug ts=2020-08-26T10:33:55.096609739Z msg="getting instance id from metadata service"
level=debug ts=2020-08-26T10:33:55.098214893Z msg="fetching server"
and then after hanging for around 20 seconds the containers restart.

The bizarre part is that this occurs only on the pods scheduled on my worker nodes. The pod on the master node continues execution normally. Any ideas are highly appreciated.

PS: Big shoutout for the growing Hetzner support for Kubernetes. Really nice work!


nnexai commented Aug 29, 2020

Same for me. I think this was working 2 or 3 weeks ago.

@nazarkulyk

Check that all your cluster nodes (and pods) have access to the Hetzner API host.
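
One quick way to check that from inside the cluster (a rough sketch, not from this thread; the pod name is a placeholder and any image that ships curl will do) is a throwaway pod that curls the API. A 401 response without a token still proves connectivity; a timeout points at a network or firewall problem:

apiVersion: v1
kind: Pod
metadata:
  name: hcloud-api-check            # hypothetical name, only for this check
spec:
  restartPolicy: Never
  containers:
    - name: check
      image: curlimages/curl:latest   # any image with curl works
      # -v prints the HTTP exchange to the pod logs; expect a 401 if the API is reachable
      command: ["curl", "-sv", "--max-time", "10", "https://api.hetzner.cloud/v1/servers"]

Apply it, then read the result with kubectl logs hcloud-api-check. To match the situation above, pin it to one of the affected worker nodes via spec.nodeName.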


ibotty commented Sep 28, 2020

See https://docs.openshift.com/container-platform/4.5/networking/understanding-networking.html

The pod network does not have access to link-local addresses (which is a good thing, I'd say). Reading the source (hcloud-csi/cmd/driver/main.go), it should be possible to set the queried information via environment variables instead. Unfortunately, I have no good idea how to do that for DaemonSets (hcloud-csi-node).


ibotty commented Sep 28, 2020

As indicated in the bug report, it works when using the host network. I had to change the metrics and healthz ports so they don't conflict with each other, though.

--- a/hcloud-csi/deploy/kubernetes/hcloud-csi.yml
+++ b/hcloud-csi/deploy/kubernetes/hcloud-csi.yml
@@ -153,7 +153,7 @@ spec:
             - name: CSI_ENDPOINT
               value: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
             - name: METRICS_ENDPOINT
-              value: 0.0.0.0:9189
+              value: 0.0.0.0:19189
             - name: HCLOUD_TOKEN
               valueFrom:
                 secretKeyRef:
@@ -163,10 +163,10 @@ spec:
             - name: socket-dir
               mountPath: /var/lib/csi/sockets/pluginproxy/
           ports:
-            - containerPort: 9189
+            - containerPort: 19189
               name: metrics
             - name: healthz
-              containerPort: 9808
+              containerPort: 19808
               protocol: TCP
           livenessProbe:
             failureThreshold: 5
@@ -186,12 +186,14 @@ spec:
           image: quay.io/k8scsi/livenessprobe:v1.1.0
           args:
             - --csi-address=/var/lib/csi/sockets/pluginproxy/csi.sock
+            - --health-port=19808
           volumeMounts:
             - mountPath: /var/lib/csi/sockets/pluginproxy/
               name: socket-dir
       volumes:
         - name: socket-dir
           emptyDir: {}
+      hostNetwork: true
 ---
 kind: DaemonSet
 apiVersion: apps/v1
@@ -299,6 +301,7 @@ spec:
           hostPath:
             path: /dev
             type: Directory
+      hostNetwork: true
 ---
 apiVersion: v1
 kind: Service


ibotty commented Sep 28, 2020

There should be a way to not use host networking on these platforms though.

@LKaemmerling (Member)

As I wrote on #152, you should never use the host network, as this might open a "window" for an attacker.

The problem is that there might be something wrong with your network configuration or firewall.


ibotty commented Oct 19, 2020

Please reopen.

Of course these pods should not ever run with host networking; that's not what this issue is about. The problem is that in some Kubernetes distributions (OpenShift/OKD), pods without host networking cannot connect to link-local addresses (e.g. 169.254.169.254). That needs a solution.

Reading the source, it looks possible to use the downward API to expose spec.nodeName as the env variable KUBE_NODE_NAME in the DaemonSet; see the sketch below. I will try to test that this week.

https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/#the-downward-api
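
For reference, a minimal sketch of that downward API wiring for the hcloud-csi-driver container in the DaemonSet (everything around the env entry is elided; only KUBE_NODE_NAME and spec.nodeName come from the comment above):

      containers:
        - name: hcloud-csi-driver
          env:
            # hand the node name to the driver instead of the link-local metadata lookup
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.nodeName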

@LKaemmerling (Member)

@ibotty you can update to the latest version; there, getting the node name from spec.nodeName is the default behaviour.


ibotty commented Oct 19, 2020

Oh, great! Thank you.
