
Using Curator to subscribe to multiple registries (ZooKeeper): discussion #7897

Closed
fsx379 opened this issue May 28, 2021 · 4 comments
Labels
type/discussion Everything related with code discussion or question

Comments


fsx379 commented May 28, 2021

Environment

  • Dubbo version: 2.6.9
  • Operating System version: all
  • Java version: 1.8

Steps to reproduce this issue

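ZooKeeper is used as the registry, with Curator as the default ZK client. When a consumer subscribes to multiple registries and one of them is unreachable because of a network problem, application startup slows down and blocks for about 1 minute per interface (with many interfaces, the delay grows accordingly). If Curator is replaced with zkclient, the problem does not occur.

The analysis of the slowdown is as follows: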

AbstractZookeeperClient.create()

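    public void create(String path, boolean ephemeral) {
        if (!ephemeral) {
            // cached: this persistent node is already known to exist
            if (persistentExistNodePath.contains(path)) {
                return;
            }
            // while ZK is unreachable this call blocks, then returns false
            if (checkExists(path)) {
                persistentExistNodePath.add(path);
                return;
            }
        }
        // make sure the parent node exists first (recursive, always persistent)
        int i = path.lastIndexOf('/');
        if (i > 0) {
            create(path.substring(0, i), false);
        }
        if (ephemeral) {
            createEphemeral(path);
        } else {
            createPersistent(path);
            persistentExistNodePath.add(path);
        }
    }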

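(1) For example, when the consumer node /dubbo/com.test.queryServer/consumers/consumer**** is created during subscription and the network is unreachable, checkExists always returns false. create() therefore recurses and calls checkExists on /dubbo/com.test.queryServer/consumers/, /dubbo/com.test.queryServer and /dubbo in turn (three calls). Only when it reaches /dubbo does createPersistent() throw an exception upward; that ends this export flow, and the rest of application startup continues.
(2) checkExists does not report an error even when it times out: the exception is swallowed, which is why it returns false.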

public boolean checkExists(String path) {
    try {
        if (client.checkExists().forPath(path) != null) {
            return true;
        }
    } catch (Exception e) {
        // swallowed: a connection timeout is indistinguishable from "node does not exist"
    }
    return false;
}

(3) The call stack of Curator's checkExists implementation is shown below:
[Screenshot: call stack of Curator's checkExists()]

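(4) client.checkExists().forPath() queries the ZK node and internally calls RetryLoop.callWithRetry. The Curator client's default connection timeout is 5 s and it retries once by default, so when the network is unreachable, checkExists() effectively performs two checks and needs about 10 s to return.
(5) Because the recursion visits /dubbo/com.test.queryServer/consumers/, /dubbo/com.test.queryServer and /dubbo, each of which triggers checkExists(), this costs at least 30 s.
(6) Registration plus subscription calls AbstractZookeeperClient.create() twice, so startup blocks for 60 s.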
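To make the arithmetic in steps (1) to (6) concrete, here is a small stdlib-only model of the recursion. It is a hypothetical simulation, not Dubbo code: `ZkDelayModel` and `startupBlockMillis` are invented names, no ZooKeeper is contacted, and each unreachable `checkExists()` is simply charged `connectionTimeoutMs * attempts` of virtual time (5000 ms and 2 attempts, per the figures above; any sleep between retries is ignored). The subscribe path `/dubbo/com.test.queryServer/providers` is also an assumed example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Dubbo code: simulates an unreachable ZooKeeper and
// accumulates the time each swallowed checkExists() timeout would cost.
class ZkDelayModel {
    final long connectionTimeoutMs;
    final int attempts;                      // initial try + retries
    long virtualMillis = 0;                  // simulated blocking time
    final List<String> checkedPaths = new ArrayList<>();

    ZkDelayModel(long connectionTimeoutMs, int attempts) {
        this.connectionTimeoutMs = connectionTimeoutMs;
        this.attempts = attempts;
    }

    // Like AbstractZookeeperClient.checkExists(): every attempt times out,
    // the exception is swallowed, and the caller just sees "false".
    boolean checkExists(String path) {
        checkedPaths.add(path);
        virtualMillis += connectionTimeoutMs * attempts;
        return false;
    }

    // Like createPersistent()/createEphemeral(): fails while unreachable.
    void createNode(String path) {
        throw new IllegalStateException("cannot create " + path);
    }

    // Same recursion as AbstractZookeeperClient.create(); the
    // persistentExistNodePath cache is omitted because it is only filled on success.
    void create(String path, boolean ephemeral) {
        if (!ephemeral && checkExists(path)) {
            return;
        }
        int i = path.lastIndexOf('/');
        if (i > 0) {
            create(path.substring(0, i), false);
        }
        createNode(path);
    }

    // Register (ephemeral consumer node) + subscribe (persistent category node).
    static long startupBlockMillis(long timeoutMs, int attempts,
                                   String registerPath, String subscribePath) {
        ZkDelayModel model = new ZkDelayModel(timeoutMs, attempts);
        try {
            model.create(registerPath, true);
        } catch (IllegalStateException ignored) {
            // FailbackRegistry would queue a retry here
        }
        try {
            model.create(subscribePath, false);
        } catch (IllegalStateException ignored) {
            // same for the subscription
        }
        return model.virtualMillis;
    }
}
```

With those inputs the register call costs three checks (30 s) and the subscribe call another three (30 s), which reproduces the roughly 60 s per interface reported above.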

For zkclient, this was fixed in Dubbo 2.5.7 (issue #790). Could Curator be adjusted in the same way?

Member

BurningCN commented May 28, 2021

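I tried it: after the connection to the ZK server is lost, calling checkExists takes a while before it returns.

A simple idea: keep a variable that records the connection state, update it when the connection is lost, and have methods such as create/checkExists check it first.

CuratorZookeeperClient already registers a listener, client.getConnectionStateListenable().addListener(new CuratorConnectionStateListener(url));, and the disconnected state could be recorded there.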
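A minimal sketch of that idea, assuming the flag is updated from the existing connection-state listener. To keep it compilable without Curator, `ConnState` is a hypothetical stand-in for Curator's `ConnectionState` (only the constants used here), and `ConnectionStateGuard` is an invented name. Curator's own `client.getZookeeperClient().isConnected()` could serve a similar purpose.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for Curator's ConnectionState, redeclared so the
// sketch compiles without Curator on the classpath.
enum ConnState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

// Hypothetical sketch: the listener flips a flag, and ZK operations check it
// before touching the network, so a dead registry fails fast.
class ConnectionStateGuard {
    private final AtomicBoolean connected = new AtomicBoolean(false);

    // Would be called from the listener registered via
    // client.getConnectionStateListenable().addListener(...).
    void stateChanged(ConnState newState) {
        switch (newState) {
            case CONNECTED:
            case RECONNECTED:
                connected.set(true);
                break;
            default:
                // SUSPENDED or LOST: treat the registry as unavailable
                connected.set(false);
        }
    }

    boolean isConnected() {
        return connected.get();
    }

    // Call at the start of create/delete/getChildren/checkExists.
    void ensureConnected() {
        if (!connected.get()) {
            throw new IllegalStateException("Zookeeper is not connected yet!");
        }
    }
}
```

Treating SUSPENDED as disconnected is a design choice: calls also fail fast during short network blips, which is acceptable as long as the failed operations are retried later by the registry.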

Author

fsx379 commented Aug 6, 2021

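In CuratorZookeeperClient, could create/delete/getChildren/checkExists call isConnected() before invoking the client's create/delete/read operations, and throw an exception right away if the connection has not been established, so that the operation ends immediately?
(1) Because an exception is thrown, the upper-layer FailbackRegistry catches it and periodically retries create/delete and so on (following the approach of zkClientWrapper);
(2) Reconnecting the client is left to Curator's own automatic retry.

I'm not sure what risks this change would carry.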

    @Override
    public void createPersistent(String path) {
        // fail fast; checked outside the try block so the exception is not re-wrapped
        if (!this.isConnected()) {
            throw new IllegalStateException("Zookeeper is not connected yet!");
        }
        try {
            client.create().forPath(path);
        } catch (NodeExistsException e) {
            // node already exists: nothing to do
        } catch (Exception e) {
            throw new IllegalStateException(e.getMessage(), e);
        }
    }

    public boolean checkExists(String path) {
        // must stay outside the try block: inside it, the catch below would
        // swallow the exception and checkExists() would silently return false
        if (!this.isConnected()) {
            throw new IllegalStateException("Zookeeper is not connected yet!");
        }
        try {
            if (client.checkExists().forPath(path) != null) {
                return true;
            }
        } catch (Exception e) {
        }
        return false;
    }
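The proposal relies on the upper layer retrying whatever failed. Below is a hypothetical, stdlib-only sketch of that interplay; `FailbackSketch` is not Dubbo's `FailbackRegistry`, it only imitates the remember-and-retry pattern. While disconnected, `createPersistent` throws immediately instead of waiting for a connection timeout, the failed path is queued, and a periodic `retry()` replays it once the connection is back.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of fail-fast + failback retry (not Dubbo's FailbackRegistry).
class FailbackSketch {
    volatile boolean connected = false;            // flipped by the connection listener
    final Set<String> created = new LinkedHashSet<>();
    final Set<String> failed = new LinkedHashSet<>();

    // Fail fast: no network round trip and no connection timeout when disconnected.
    void createPersistent(String path) {
        if (!connected) {
            throw new IllegalStateException("Zookeeper is not connected yet!");
        }
        created.add(path);
    }

    // Roughly what a failback registry does: catch the failure, remember it.
    void register(String path) {
        try {
            createPersistent(path);
        } catch (IllegalStateException e) {
            failed.add(path);
        }
    }

    // Called periodically by a timer in a real registry.
    void retry() {
        for (String path : new LinkedHashSet<>(failed)) {   // copy: we remove while iterating
            try {
                createPersistent(path);
                failed.remove(path);
            } catch (IllegalStateException e) {
                // still disconnected; try again on the next tick
            }
        }
    }
}
```

Startup is no longer blocked by the dead registry: `register` returns at once, and the work is deferred to `retry()`.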

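Second, during class initialization, add a checkpoint after client.start() so that create and similar operations do not start before ZooKeeper asynchronously reports that the client is connected, which would otherwise produce exception logs.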

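    public CuratorZookeeperClient(URL url) {
        super(url);
        try {
            int timeout = url.getParameter(Constants.TIMEOUT_KEY, 5000);
            // ........
            client.start();
            // added checkpoint: wait up to `timeout` ms for a first round trip to ZK
            ListenableFutureTask<Boolean> listenableFutureTask = ListenableFutureTask.create(new Callable<Boolean>() {
                @Override
                public Boolean call() {
                    try {
                        return client.checkExists().forPath("/dubbo") != null;
                    } catch (Exception e) {
                        return false;
                    }
                }
            });
            // the task must actually run; otherwise get() always waits the full timeout
            Thread checker = new Thread(listenableFutureTask, "zk-connect-check");
            checker.setDaemon(true);
            checker.start();
            try {
                listenableFutureTask.get(timeout, TimeUnit.MILLISECONDS);
            } catch (Exception e) {
                // timed out or failed: continue startup; Curator keeps reconnecting
                listenableFutureTask.cancel(true);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e.getMessage(), e);
        }
    }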
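The checkpoint only needs to wait, with an upper bound, for the first CONNECTED event. A sketch of that with a `CountDownLatch` released by the connection-state listener avoids both the helper thread and the extra read of `/dubbo`. `ConnectGate` is a hypothetical name, and wiring `onConnected()` to the listener is assumed. Curator's `CuratorFramework` also offers `blockUntilConnected(int, TimeUnit)` for this kind of bounded wait; check that the Curator version used with Dubbo 2.6.x provides it before relying on it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: wait at most timeoutMs for the first CONNECTED event,
// then let startup continue whether or not it arrived.
class ConnectGate {
    private final CountDownLatch connectedLatch = new CountDownLatch(1);

    // Call from the connection-state listener when CONNECTED is reported.
    void onConnected() {
        connectedLatch.countDown();
    }

    // Returns true if connected within the timeout, false otherwise.
    boolean awaitConnected(long timeoutMs) {
        try {
            return connectedLatch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt status
            return false;
        }
    }
}
```

In the constructor this would replace the future-based block with a single `awaitConnected(timeout)` call placed right after `client.start()`.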

fsx379 changed the title from "Using Curator to subscribe to multiple registries (ZooKeeper): problem report" to "Using Curator to subscribe to multiple registries (ZooKeeper): discussion" Aug 6, 2021
Author

fsx379 commented Aug 16, 2021

At the end of the public CuratorZookeeperClient(URL url){} constructor I have also added a checkpoint: it waits up to timeout and, if there is still no result, continues. See the code above.

CrazyHZM added the type/discussion and version/2.6.x labels Sep 26, 2021
Member

CrazyHZM commented Nov 7, 2021

Try it with the latest version; if you still have problems, you can reopen the issue.

CrazyHZM closed this as completed Nov 7, 2021

5 participants
@AlbumenJ @CrazyHZM @fsx379 @BurningCN and others