Adding a new machine to a HugeGraph cluster keeps failing #2571

hexuxu110 · 2024-07-08T06:17:15Z

Problem Type

server status (startup/runtime error)

Before submit

I have confirmed and searched that there are no similar problems in the existing issues and FAQ.

Environment

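Environment: HugeGraph 1.0.0 + Java 11
Backend storage: RocksDB
Cluster: 1 master, 6 followers
Adding one more machine to the cluster keeps failing.

On the new machine (new-machine-ip):
first run bin/init-store.sh
then run bin/start-hugegraph.sh

Then, on the master node, run bin/raft-tools.sh -a g_rocksdb new-machine-ip:port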

Your Question

Hi all, a question: adding a new machine to the cluster fails.

Error on the master node:
2024-07-08 13:18:29.849 [Append-Entries-Thread-Send6] [WARN] c.a.s.j.c.Replicator - Fail to issue RPC to new-machine-ip:port, consecutiveErrorTimes=1, error=Status[EINTERNAL<1004>: RPC exception:Invoke timeout when invoke with callback.The address is new-machine-ip:port]
2024-07-08 13:18:29.849 [JRaft-Rpc-Closure-Executor-122] [WARN] o.a.h.b.s.r.RaftNode - Replicator meet error: Status[EINTERNAL<1004>: RPC exception:Invoke timeout when invoke with callback.The address is new-machine-ip:port]

Error on the follower node:
2024-07-08 13:44:33.708 [task-scheduler-1] [WARN] o.a.h.t.ServerInfoManager - ServerInfo is missing: server-14
2024-07-08 13:44:33.710 [task-scheduler-1] [INFO] o.a.h.t.ServerInfoManager - Init server info: HugeServerInfo(server-14){server_load=0, server_role=WORKER, id=server-14, label=~server, server_update_time=Mon Jul 08 13:44:33 CST 2024, server_max_load=10000}
2024-07-08 13:44:34.711 [task-scheduler-1] [WARN] o.a.h.t.ServerInfoManager - ServerInfo is missing: server-14
2024-07-08 13:44:34.713 [task-scheduler-1] [INFO] o.a.h.t.ServerInfoManager - Init server info: HugeServerInfo(server-14){server_load=0, server_role=WORKER, id=server-14, label=~server, server_update_time=Mon Jul 08 13:44:34 CST 2024, server_max_load=10000}

Vertex/Edge example

No response

Schema [VertexLabel, EdgeLabel, IndexLabel]

No response

dosubot · 2024-07-08T06:25:42Z

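First, we recommend the latest distributed storage in version 1.5, which has dedicated maintainers who can answer questions; see #2265 (the old storage is frozen and no longer updated or maintained).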

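To add a new machine to the HugeGraph cluster, make sure of the following:

- Service availability: the HugeGraph service must be running and reachable on the specified host and port. Try connecting with curl or a browser.
- Host details: the hostname must be correct and reachable. Check with ping or other network diagnostic tools.
- Protocol and port: use the correct protocol (HTTP/HTTPS) and port. The default protocol is HTTP, but it can be configured as HTTPS if needed.
- Truststore: if HTTPS is used, the truststore file and password must be configured correctly.
- Username and password: if authentication is enabled, the username and password must be specified correctly.
- Raft tool: run raft-tools.sh on the leader of the Raft group, using the correct command format to add the new node.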

Steps

Start the HugeGraph service on the new machine:
```
bin/init-store.sh
bin/start-hugegraph.sh
```

On the leader node, add the new node with raft-tools.sh:

```
bin/raft-tools.sh --add-peer g_rocksdb new-machine-ip:port
```

Common problems and solutions

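Authentication:
- Problem: if HUGEGRAPH_USERNAME and HUGEGRAPH_PASSWORD are not set, the script cannot authenticate.
- Solution: set the environment variables HUGEGRAPH_USERNAME and HUGEGRAPH_PASSWORD to the correct credentials.
URL configuration:
- Problem: if HUGEGRAPH_URL is not set, the script does not know where to send requests.
- Solution: set the HUGEGRAPH_URL environment variable; otherwise the script tries to read the URL from rest-server.properties.
Leader requirement:
- Problem: the add-peer command must be run on the leader of the Raft group.
- Solution: run raft-tools.sh on the leader of the Raft group.
Endpoint format:
- Problem: the endpoint may be in the wrong format.
- Solution: pass the endpoint in the format the HugeGraph server expects, i.e. the node's Raft address as `ip:port` (the same form the Replicator log above uses), not an HTTP URL.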
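The first two problems come down to where raft-tools.sh gets its target URL and credentials. The following is a minimal Java sketch of that lookup, not the script itself. It assumes three things: the URL key in rest-server.properties is `restserver.url`, the credentials are sent as HTTP Basic auth, and the class name `RaftToolsEnv` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Properties;

// Hypothetical helper, not part of HugeGraph: mirrors the URL and
// credential lookups described in the list above.
final class RaftToolsEnv {

    private RaftToolsEnv() {
    }

    // Prefer HUGEGRAPH_URL; otherwise fall back to the URL in
    // rest-server.properties (assumed key: restserver.url).
    static String resolveUrl(String envUrl, Properties restServerProps) {
        if (envUrl != null && !envUrl.isBlank()) {
            return envUrl.trim();
        }
        String url = restServerProps.getProperty("restserver.url");
        if (url == null || url.isBlank()) {
            throw new IllegalStateException(
                    "Neither HUGEGRAPH_URL nor restserver.url is set");
        }
        return url.trim();
    }

    // Builds an HTTP Basic "Authorization" header value from
    // HUGEGRAPH_USERNAME / HUGEGRAPH_PASSWORD, or returns null when the
    // credentials are missing (the request would go out unauthenticated).
    static String basicAuth(String user, String password) {
        if (user == null || user.isEmpty() || password == null) {
            return null;
        }
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }
}
```

Typical use would be `RaftToolsEnv.resolveUrl(System.getenv("HUGEGRAPH_URL"), props)` and `RaftToolsEnv.basicAuth(System.getenv("HUGEGRAPH_USERNAME"), System.getenv("HUGEGRAPH_PASSWORD"))`. If either call returns something unexpected, the script would probably be sending its request to the wrong place or without credentials.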

Example command

```
./raft-tools.sh --add-peer g_rocksdb new-machine-ip:port
```

This command adds the endpoint new-machine-ip:port to the Raft group g_rocksdb.
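In the question, JRaft's Replicator log addresses peers as plain `ip:port`. A quick format check before calling raft-tools.sh can therefore rule out passing an HTTP URL by mistake. This is a simplified, hypothetical validator and not HugeGraph's or JRaft's own parser. It deliberately accepts only `host:port`, so extra fields or IPv6 literals are out of scope.

```java
// Hypothetical validator, not HugeGraph's own parser: accepts only the
// plain "host:port" form and rejects URLs such as "http://host:8080".
final class RaftEndpoint {

    final String host;
    final int port;

    private RaftEndpoint(String host, int port) {
        this.host = host;
        this.port = port;
    }

    static RaftEndpoint parse(String endpoint) {
        if (endpoint == null || endpoint.contains("://")) {
            throw new IllegalArgumentException("Expected host:port, got: " + endpoint);
        }
        int colon = endpoint.lastIndexOf(':');
        if (colon <= 0 || colon == endpoint.length() - 1) {
            throw new IllegalArgumentException("Expected host:port, got: " + endpoint);
        }
        String host = endpoint.substring(0, colon);
        if (host.contains(":")) {
            // Extra fields or IPv6 literals are out of scope for this sketch.
            throw new IllegalArgumentException("Expected host:port, got: " + endpoint);
        }
        int port;
        try {
            port = Integer.parseInt(endpoint.substring(colon + 1));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Invalid port in: " + endpoint, e);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port out of range in: " + endpoint);
        }
        return new RaftEndpoint(host, port);
    }

    // Convenience wrapper: true if parse() would succeed.
    static boolean isValid(String endpoint) {
        try {
            parse(endpoint);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```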

Please check service availability and host details to resolve the issue [1][2][3].

To continue talking to Dosu, mention @dosu.

hexuxu110 · 2024-07-08T07:46:47Z

This did not answer my question.

dosubot · 2024-07-08T07:57:19Z

To add a new machine to the HugeGraph cluster successfully, make sure of the following:

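Update the configuration:
- The new machine's configuration file must contain the required settings, such as raft.group_peers and the other role-election options. These settings should be consistent with the existing cluster configuration.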
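One way to check the "consistent with the existing cluster" requirement is to diff the new node's `raft.group_peers` value against an existing node's. The sketch below is a hypothetical helper. It assumes, as the reply suggests, that the two values should list the same `host:port` peers, and it treats the value as comma-separated with order and whitespace ignored.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: compares two raft.group_peers values
// (comma-separated host:port lists) while ignoring order and whitespace.
final class GroupPeersCheck {

    private GroupPeersCheck() {
    }

    static Set<String> parse(String groupPeers) {
        return Arrays.stream(groupPeers.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // Endpoints present in one list but not the other; empty means consistent.
    static Set<String> mismatches(String newNode, String existingNode) {
        Set<String> a = parse(newNode);
        Set<String> b = parse(existingNode);
        Set<String> result = new LinkedHashSet<>(a);
        result.addAll(b);
        Set<String> common = new LinkedHashSet<>(a);
        common.retainAll(b);
        result.removeAll(common);
        return result;
    }
}
```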

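Load the graphs:

Use the loadGraphs method to load the graphs on the new machine. It takes a map from graph name to configuration file path and loads each graph, logging an error for any graph that fails to load.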

```
public void loadGraphs(Map<String, String> graphConfs) {
    for (Map.Entry<String, String> conf : graphConfs.entrySet()) {
        String name = conf.getKey();
        String graphConfPath = conf.getValue();
        HugeFactory.checkGraphName(name, "rest-server.properties");
        try {
            this.loadGraph(name, graphConfPath);
        } catch (Throwable e) {
            LOG.error("Graph '{}' can't be loaded: '{}'", name, graphConfPath, e);
        }
    }
}
```

Start the RPC server:

Make sure the RPC server is started on the new machine. It is essential for communication between the nodes of the cluster.

```
private void startRpcServer() {
    if (!this.rpcServer.enabled()) {
        LOG.info("RpcServer is not enabled, skip starting rpc service");
        return;
    }

    RpcProviderConfig serverConfig = this.rpcServer.config();

    // Start auth rpc service if authenticator enabled
    if (this.authenticator != null) {
        serverConfig.addService(AuthManager.class, this.authenticator.authManager());
    }

    // Start graph rpc service if RPC_REMOTE_URL enabled
    if (this.rpcClient.enabled()) {
        RpcConsumerConfig clientConfig = this.rpcClient.config();

        for (Graph graph : this.graphs.values()) {
            HugeGraph hugegraph = (HugeGraph) graph;
            hugegraph.registerRpcServices(serverConfig, clientConfig);
        }
    }

    try {
        this.rpcServer.exportAll();
    } catch (Throwable e) {
        this.rpcServer.destroy();
        throw e;
    }
}
```

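Initialize the role state machine:

If role election is supported, initialize the role state machine to handle role assignment in the cluster.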

```
private void initRoleStateMachine() {
    E.checkArgument(this.roleStateMachine == null, "Repeated initialization of role state worker");
    this.globalNodeRoleInfo.supportElection(true);
    this.roleStateMachine = this.authenticator().graph().roleElectionStateMachine();
    StandardRoleListener listener = new StandardRoleListener(TaskManager.instance(), this.globalNodeRoleInfo);
    this.roleStateMachine.start(listener);
}
```

Server startup:

Call the serverStarted method to finish server initialization so that the new machine is recognized as part of the cluster.

```
private void serverStarted(HugeConfig config) {
    String id = config.get(ServerOptions.SERVER_ID);
    String role = config.get(ServerOptions.SERVER_ROLE);
    E.checkArgument(StringUtils.isNotEmpty(id), "The server name can't be null or empty");
    E.checkArgument(StringUtils.isNotEmpty(role), "The server role can't be null or empty");

    NodeRole nodeRole = NodeRole.valueOf(role.toUpperCase());
    boolean supportRoleElection = !nodeRole.computer() && this.supportRoleElection();
    if (supportRoleElection) {
        // Init any server as Worker role, then do role election
        nodeRole = NodeRole.WORKER;
    }

    this.globalNodeRoleInfo.initNodeId(IdGenerator.of(id));
    this.globalNodeRoleInfo.initNodeRole(nodeRole);

    for (String graph : this.graphs()) {
        HugeGraph hugegraph = this.graph(graph);
        assert hugegraph != null;
        hugegraph.serverStarted(this.globalNodeRoleInfo);
    }

    if (supportRoleElection) {
        this.initRoleStateMachine();
    }
}
```
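serverStarted rejects an empty server.id or server.role and parses the role by upper-casing it into NodeRole. Before starting a new node, a small pre-flight check over every node's pair of values can catch typos early. It may also be worth comparing these values against the `server-14` id seen in the follower log. This is a hypothetical sketch with two assumptions: role names are MASTER, WORKER and COMPUTER (WORKER and a computer role appear in the code above), and each node needs a distinct server.id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical pre-flight check, not part of HugeGraph.
final class ServerIdCheck {

    // Assumed role names, mirroring HugeGraph's NodeRole enum.
    private static final Set<String> ROLES = Set.of("MASTER", "WORKER", "COMPUTER");

    private ServerIdCheck() {
    }

    // Each element is {server.id, server.role} for one node; returns problems found.
    static List<String> check(List<String[]> idAndRolePerNode) {
        List<String> problems = new ArrayList<>();
        Map<String, Integer> seen = new HashMap<>();
        for (int i = 0; i < idAndRolePerNode.size(); i++) {
            String id = idAndRolePerNode.get(i)[0];
            String role = idAndRolePerNode.get(i)[1];
            if (id == null || id.isEmpty()) {
                problems.add("node " + i + ": server.id is empty");
            } else {
                Integer first = seen.putIfAbsent(id, i);
                if (first != null) {
                    problems.add("node " + i + ": server.id '" + id
                            + "' duplicates node " + first);
                }
            }
            if (role == null || !ROLES.contains(role.toUpperCase(Locale.ROOT))) {
                problems.add("node " + i + ": unknown server.role '" + role + "'");
            }
        }
        return problems;
    }
}
```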

Deploying with Docker (optional):

If you prefer Docker, the following command quickly starts a HugeGraph server backed by RocksDB:
```
docker run -itd --name=graph -p 8080:8080 hugegraph/hugegraph
```

For more complex setups, such as using Cassandra as the backend storage, you can use docker-compose to manage the server and Cassandra. Here is an example docker-compose.yml:

```
version: "3"
services:
  graph:
    image: hugegraph/hugegraph
    container_name: cas-server
    ports:
      - 8080:8080
    environment:
      hugegraph.backend: cassandra
      hugegraph.serializer: cassandra
      hugegraph.cassandra.host: cas-cassandra
      hugegraph.cassandra.port: 9042
    networks:
      - ca-network
    depends_on:
      - cassandra
    healthcheck:
      test: ["CMD", "bin/gremlin-console.sh", "--" ,"-e", "scripts/remote-connect.groovy"]
      interval: 10s
      timeout: 30s
      retries: 3

  cassandra:
    image: cassandra:4
    container_name: cas-cassandra
    ports:
      - 7000:7000
      - 9042:9042
    security_opt:
      - seccomp:unconfined
    networks:
      - ca-network
    healthcheck:
      test: ["CMD", "cqlsh", "--execute", "describe keyspaces;"]
      interval: 10s
      timeout: 30s
      retries: 5

networks:
  ca-network:

volumes:
  hugegraph-data:
```

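Following these steps, you should be able to add the new machine to the HugeGraph cluster. Make sure all configuration is consistent and that the new machine is properly integrated into the cluster's role-election and RPC mechanisms [1][2].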

github-actions · 2024-07-23T21:04:35Z

Due to the lack of activity, the current issue is marked as stale and will be closed after 20 days, any update will remove the stale label

JackyYangPassion · 2024-08-14T04:25:17Z

@hexuxu110
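Judging from the error on the master node, log replication to the new node timed out.

1. Is there a network problem between the master and follower nodes?
2. Is the follower node in a degraded (sub-healthy) state? Please share the configuration files of the master and follower nodes, plus more logs from the follower node; you can enable DEBUG logging.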
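For the first question, one quick test is whether the master can open a plain TCP connection to the new node's Raft endpoint at all. This minimal hypothetical sketch covers only that case. A successful connect rules out a closed port or a firewall drop, but not a slow or unhealthy follower. A false result covers connection refused, unknown host and connect timeout alike.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic, not part of HugeGraph: checks whether a plain
// TCP connection to the follower's Raft/RPC endpoint can be opened.
final class PortProbe {

    private PortProbe() {
    }

    // True if a TCP connection to host:port is established within timeoutMs;
    // connection refused, unknown host and connect timeout all return false.
    static boolean reachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

Run it from the master against the same `ip:port` that appears in the Replicator warning, for example `PortProbe.reachable("new-machine-ip", port, 3000)`.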

github-actions · 2024-08-30T21:04:03Z

Due to the lack of activity, the current issue is marked as stale and will be closed after 20 days, any update will remove the stale label

dosubot bot added the question (Further information is requested) and rocksdb (RocksDB backend) labels Jul 8, 2024

github-actions bot added the inactive label Jul 23, 2024

github-actions bot closed this as completed Aug 13, 2024

JackyYangPassion reopened this Aug 14, 2024

github-actions bot removed the inactive label Aug 14, 2024

github-actions bot added the inactive label Aug 30, 2024

github-actions bot closed this as completed Sep 19, 2024
