docs: update csi-node troubleshooting (#1130)
timfeirg authored Sep 30, 2024
1 parent 6a6216e commit 792180b
Showing 2 changed files with 28 additions and 8 deletions.
15 changes: 12 additions & 3 deletions docs/en/administration/troubleshooting-cases.md
@@ -19,9 +19,18 @@ Above error message shows that the CSI Driver named `csi.juicefs.com` isn't foun
If you used `mount pod` mode, follow these steps to troubleshoot:

* Run `kubectl get csidrivers.storage.k8s.io` and check whether `csi.juicefs.com` is actually missing. If it is, the CSI Driver isn't installed at all; head to [Installation](../getting_started.md).
* If `csi.juicefs.com` already exists in the above `csidrivers` list, the CSI Driver is installed and the problem is with CSI Node.
* [Check if CSI Node is working correctly](./troubleshooting.md#check-csi-node).
* There should be a CSI Node pod on the exact Kubernetes node where the application pod is running. If a [scheduling strategy](../guide/resource-optimization.md#csi-node-node-selector) has been configured for the CSI Node DaemonSet, or the node itself is [tainted](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration), CSI Node may be missing on such worker nodes.
* If `csi.juicefs.com` already exists in the above `csidrivers` list, the CSI Driver is installed and the problem is with CSI Node. Check its status:
* Before troubleshooting, see [check CSI Node](./troubleshooting.md#check-csi-node) for a list of helpful commands;
* A CSI Node pod is expected on the node where the application pod is running. If a [scheduling strategy](../guide/resource-optimization.md#csi-node-node-selector) has been configured for the CSI Node DaemonSet, or the node itself is [tainted](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration), CSI Node may be missing on some worker nodes, causing the "driver not found" issue;
* If CSI Node is actually running, look for errors in its logs:

```shell
# The juicefs-plugin container handles the actual CSI Driver work; if it cannot access the Kubernetes API, the mount pod cannot be created
kubectl logs -n kube-system juicefs-csi-node-xxx juicefs-plugin --tail 100

# The node-driver-registrar container is in charge of registering the csidriver; any registration error will show in its logs
kubectl logs -n kube-system juicefs-csi-node-xxx node-driver-registrar --tail 100
```
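The `juicefs-csi-node-xxx` placeholder above has to be replaced with the CSI Node pod that runs on the same node as the application pod. A minimal sketch of that lookup follows. The function name `csi_node_pod_for` is hypothetical, and the `app=juicefs-csi-node` label is an assumption about how the DaemonSet pods are labelled in your installation; adjust both the label and the `kube-system` namespace if yours differ.

```shell
#!/bin/sh
# Hypothetical helper (not part of JuiceFS): print the CSI Node pod that runs
# on the node hosting a given application pod.
# Assumption: CSI Node pods live in kube-system and carry the label
# app=juicefs-csi-node.
csi_node_pod_for() {
  app_pod="$1"
  app_ns="${2:-default}"

  # Node on which the application pod is scheduled
  node="$(kubectl get pod "$app_pod" -n "$app_ns" -o jsonpath='{.spec.nodeName}')" || return 1
  if [ -z "$node" ]; then
    echo "pod $app_pod is not scheduled to a node yet" >&2
    return 1
  fi

  # CSI Node pod on that same node; empty output means none is scheduled there
  kubectl get pods -n kube-system -l app=juicefs-csi-node \
    --field-selector "spec.nodeName=$node" -o name
}
```

The result is printed in `pod/<name>` form, which `kubectl logs` accepts directly, for example `kubectl logs -n kube-system "$(csi_node_pod_for my-app)" -c juicefs-plugin --tail 100`, where `my-app` stands in for your application pod. Empty output suggests that no CSI Node pod is scheduled on that node.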

If you used `sidecar` mode, check whether the namespace in which the application pod is running has the `juicefs.com/enable-injection=true` label:

21 changes: 16 additions & 5 deletions docs/zh_cn/administration/troubleshooting-cases.md
@@ -14,14 +14,25 @@ sidebar_position: 7
kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name csi.juicefs.com not found in the list of registered CSI drivers
```

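The error message above indicates that the driver named `csi.juicefs.com` was not found. First confirm whether you are using mount pod mode or sidecar mode.
The error message above indicates that the driver named `csi.juicefs.com` was not found. First confirm whether you are using Mount Pod mode or Sidecar mode.

If you are using mount pod mode, follow these steps to troubleshoot: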

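* Run `kubectl get csidrivers.storage.k8s.io`. If the output indeed does not contain `csi.juicefs.com`, the CSI Driver is not installed; revisit [Install JuiceFS CSI Driver](../getting_started.md)
* If `csi.juicefs.com` is present in the `csidrivers` list above, the CSI Driver is installed and the problem lies with CSI Node.
* [Check whether CSI Node is working correctly](./troubleshooting.md#check-csi-node)
* Check whether CSI Node is running normally on the node hosting the application Pod. If a [scheduling strategy](../guide/resource-optimization.md#csi-node-node-selector) has been configured for the CSI Node DaemonSet, or the node itself has [taints](https://kubernetes.io/zh-cn/docs/concepts/scheduling-eviction/taint-and-toleration), the CSI Node container may be missing.
* Run `kubectl get csidrivers.storage.k8s.io`. If the output indeed does not contain `csi.juicefs.com`, the CSI Driver is not installed; carefully review [Install JuiceFS CSI Driver](../getting_started.md)
* If `csi.juicefs.com` is present in the `csidrivers` list above, the CSI Driver is installed and the problem lies with CSI Node. Check whether CSI Node is working correctly:
* Before troubleshooting, skim [Check CSI Node](./troubleshooting.md#check-csi-node); its code examples include some handy commands for reference;
* Look at the node hosting the application Pod and check whether CSI Node is running normally on it. If a [scheduling strategy](../guide/resource-optimization.md#csi-node-node-selector) has been configured for the CSI Node DaemonSet, or the node itself has [taints](https://kubernetes.io/zh-cn/docs/concepts/scheduling-eviction/taint-and-toleration), the CSI Node container may be missing, which causes this error;
* If CSI Node on the affected node is running normally (in the Running state), verify that none of its containers shows obvious errors in its logs, for example: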

```shell
# The juicefs-plugin container does the actual work of the CSI Driver; if it fails to access the Kubernetes API, the Mount Pod cannot be created
kubectl logs -n kube-system juicefs-csi-node-xxx juicefs-plugin --tail 100

# The node-driver-registrar container registers the csidriver; if registration fails, this container reports an error
kubectl logs -n kube-system juicefs-csi-node-xxx node-driver-registrar --tail 100
```

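* If none of the checks above is conclusive, Kubernetes itself is likely at fault. Try restarting kubelet or rebooting the system; if the problem persists, seek help from your Kubernetes administrator or service provider.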
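The scheduling strategy and taint check from the list above can be sketched as a small report for one node. The function name `csi_node_schedule_report` is hypothetical, and the DaemonSet name `juicefs-csi-node` in `kube-system` is an assumption inferred from the pod names used in the log commands; change it if your installation differs.

```shell
#!/bin/sh
# Hypothetical helper (not part of JuiceFS): show the two settings that most
# often keep a CSI Node pod off a node.
# Assumption: the DaemonSet is named juicefs-csi-node and lives in kube-system.
csi_node_schedule_report() {
  node="$1"

  # Taint keys on the node; empty output means the node is untainted
  taints="$(kubectl get node "$node" -o jsonpath='{.spec.taints[*].key}')" || return 1

  # nodeSelector configured on the CSI Node DaemonSet, if any
  selector="$(kubectl get daemonset juicefs-csi-node -n kube-system \
    -o jsonpath='{.spec.template.spec.nodeSelector}')" || return 1

  echo "taints: ${taints:-none}"
  echo "nodeSelector: ${selector:-none}"
}
```

If the report lists a taint that the DaemonSet does not tolerate, or a `nodeSelector` that the node's labels do not satisfy, that would explain a missing CSI Node pod on the node.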

If you are using sidecar mode, confirm that the corresponding namespace carries the label required by the JuiceFS sidecar (`juicefs.com/enable-injection=true`):

