After enabling HA mode on JindoRuntime, the dataset stays NotBound and the log reports "node RaftPeerImpl:127.0.1.1:8103:0 can't do pre_vote" #119

Open
heipepper opened this issue Feb 7, 2024 · 1 comment

Environment versions

  • fluid: 1.0.0
  • JindoFSx cluster: 6.2.0

Configuration

apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: hadoop
spec:
  mounts:
    - mountPoint: oss://k8sfluid/
      options:
        fs.oss.endpoint: ap-southeast-1.oss-dls.aliyuncs.com
      encryptOptions:
        - name: fs.oss.accessKeyId
          valueFrom:
            secretKeyRef:
              name: oss-demo
              key: fs.oss.accessKeyId
        - name: fs.oss.accessKeySecret
          valueFrom:
            secretKeyRef:
              name: oss-demo
              key: fs.oss.accessKeySecret
      name: hadoop
      path: /
  accessModes:
    - ReadWriteMany
  placement: "Shared"

apiVersion: data.fluid.io/v1alpha1
kind: JindoRuntime
metadata:
  name: hadoop
spec:
  replicas: 3
  tieredstore:
    levels:
      - mediumtype: HDD
        path: /data/fluid/hadoop3
        quota: 100Gi
        high: "0.9"
        low: "0.8"
  master:
    replicas: 3
  fuse:
    cleanPolicy: OnDemand

Symptoms

All 3 master pods are Running, but the dataset stays in the NotBound state.

root@k8s-master-9c885a0836:/home# kubectl get po |grep hadoop
hadoop-jindofs-master-0                        1/1     Running   0          155m
hadoop-jindofs-master-1                        1/1     Running   0          155m
hadoop-jindofs-master-2                        1/1     Running   0          155m

root@k8s-master-9c885a0836:/home# kubectl get dataset hadoop
NAME     UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE      AGE
hadoop                                                                  NotBound   155m

Relevant logs

I0207 07:32:39.586294 1 server.cpp:1161] Check out http://localhost.localdomain:18521 in web browser.
I0207 07:32:39.586310 1 JfsxNsMainImpl.cpp:30] Finish start RPC Server at 0.0.0.0:18521
I0207 07:32:39.586311 1 JfsxMainBase.cpp:48] JfsxMainBase run return value True
I0207 07:32:39.586334 1 JfsxMainBase.cpp:107] ioService is being to run
I0207 07:32:44.756633 181 node.cpp:1413] node RaftPeerImpl:127.0.1.1:8103:0 term 1 start pre_vote
W0207 07:32:44.756645 181 node.cpp:1423] node RaftPeerImpl:127.0.1.1:8103:0 can't do pre_vote as it is not in 10.70.73.32:18846:0,10.70.16.55:18846:0,10.70.73.244:18846:0
I0207 07:32:50.414792 177 node.cpp:1413] node RaftPeerImpl:127.0.1.1:8103:0 term 1 start pre_vote
W0207 07:32:50.414805 177 node.cpp:1423] node RaftPeerImpl:127.0.1.1:8103:0 can't do pre_vote as it is not in 10.70.73.32:18846:0,10.70.16.55:18846:0,10.70.73.244:18846:0
I0207 07:32:56.063956 182 node.cpp:1413] node RaftPeerImpl:127.0.1.1:8103:0 term 1 start pre_vote
W0207 07:32:56.063968 182 node.cpp:1423] node RaftPeerImpl:127.0.1.1:8103:0 can't do pre_vote as it is not in 10.70.73.32:18846:0,10.70.16.55:18846:0,10.70.73.244:18846:0
I0207 07:33:01.178079 181 node.cpp:1413] node RaftPeerImpl:127.0.1.1:8103:0 term 1 start pre_vote
W0207 07:33:01.178084 181 node.cpp:1423] node RaftPeerImpl:127.0.1.1:8103:0 can't do pre_vote as it is not in 10.70.73.32:18846:0,10.70.16.55:18846:0,10.70.73.244:18846:0
I0207 07:33:06.951189 177 node.cpp:1413] node RaftPeerImpl:127.0.1.1:8103:0 term 1 start pre_vote
W0207 07:33:06.951194 177 node.cpp:1423] node RaftPeerImpl:127.0.1.1:8103:0 can't do pre_vote as it is not in 10.70.73.32:18846:0,10.70.16.55:18846:0,10.70.73.244:18846:0
I0207 07:33:12.061303 182 node.cpp:1413] node RaftPeerImpl:127.0.1.1:8103:0 term 1 start pre_vote
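
The repeated warning above already carries the key evidence: the address the node uses for itself versus the configured peer list. As a minimal sketch, the following Python parses one such warning line and reports whether the node's own peer entry is in that list and whether its IP is a loopback address. The regular expression is derived only from the log lines shown in this issue, the `address:port:index` reading of each peer entry is an assumption, and the function names are hypothetical.

```python
import ipaddress
import re

# Pattern derived from the warning lines in this issue (an assumption about the
# general format, not taken from the JindoFS source).
WARN_RE = re.compile(
    r"node (?P<group>[^:\s]+):(?P<peer>\S+) can't do pre_vote "
    r"as it is not in (?P<conf>\S+)"
)


def parse_pre_vote_warning(line):
    """Return (own_peer, configured_peers) for a pre_vote warning, else None."""
    match = WARN_RE.search(line)
    if match is None:
        return None
    return match.group("peer"), match.group("conf").split(",")


def diagnose(line):
    """Summarize why the node is rejected, based on a single warning line."""
    parsed = parse_pre_vote_warning(line)
    if parsed is None:
        return None
    own_peer, peers = parsed
    # Assumes an IPv4 peer entry of the form address:port:index.
    own_ip = own_peer.split(":")[0]
    return {
        "own_peer": own_peer,
        "in_configuration": own_peer in peers,
        "own_ip_is_loopback": ipaddress.ip_address(own_ip).is_loopback,
    }
```

For the warning logged at 07:32:44, `diagnose` reports `own_peer` as `127.0.1.1:8103:0`, `in_configuration` as `False`, and `own_ip_is_loopback` as `True`, which matches the symptom: the node identifies itself by a loopback address that none of the three configured peers use.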

heipepper commented Feb 7, 2024

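After reading the logs I had one question: why is the IP 127.0.1.1, and where does it come from?
After investigating, I found that the jindofs cluster uses hostNetwork mode by default. In this mode the host's /etc/hosts file is written into the pod, and that /etc/hosts happens to contain an entry that resolves to 127.0.1.1.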

root@k8s-master-9c885a0836:/home# cat /etc/hosts
127.0.0.1   localhost
127.0.1.1   localhost.localdomain
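
To check other nodes for the same condition, a small script can scan hosts-file text for names that map to a loopback address. This is only a sketch: `loopback_aliases` and `risky_names` are hypothetical helper names, and treating every non-`localhost` name on a loopback address as suspicious is an assumption based on the diagnosis in this comment, not on documented JindoFS behavior.

```python
import ipaddress


def loopback_aliases(hosts_text):
    """Map each hostname in hosts-file text to its address when it is loopback."""
    result = {}
    for raw in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # Skip lines whose first field is not a plain IP address.
        if addr.is_loopback:
            for name in fields[1:]:
                # Keep the first mapping found for each name.
                result.setdefault(name, str(addr))
    return result


def risky_names(hosts_text):
    """Names other than 'localhost' that resolve to a loopback address."""
    return {
        name: addr
        for name, addr in loopback_aliases(hosts_text).items()
        if name != "localhost"
    }
```

Run against the two lines shown above, `risky_names` returns `{"localhost.localdomain": "127.0.1.1"}`, the same hostname that appears in the master's startup log.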

After changing the JindoRuntime network mode to ContainerNetwork, the HA cluster works normally again.

apiVersion: data.fluid.io/v1alpha1
kind: JindoRuntime
metadata:
  name: hadoop
spec:
  replicas: 3
  networkmode: ContainerNetwork
  tieredstore:
    levels:
      - mediumtype: HDD
        path: /data/fluid/hadoop3
        quota: 100Gi
        high: "0.9"
        low: "0.8"
  master:
    replicas: 3
  fuse:
    cleanPolicy: OnDemand
