Setting up a ZooKeeper Cluster, Bootstrap the cluster fails #23

Open
xiaoguanyu opened this issue May 14, 2014 · 25 comments
@xiaoguanyu

Your password is: 123456, you should store this in a safe place, because this is the verification code used to do cleanup
Bootstrapping task 0 of zookeeper on 192.38.11.59(0)
Bootstrap task 0 of zookeeper on 192.38.11.59(0) fail: No package found on package server of zookeeper
Bootstrap task 0 of zookeeper on 192.38.11.59(0) fail: 2
Starting task 0 of zookeeper on 192.38.11.59(0)
Start task 0 of zookeeper on 192.38.11.59(0) fail: You should bootstrap the job first

@wuzesheng
Contributor

It looks like you haven't installed the zookeeper package on the tank server.

@xiaoguanyu
Author

I installed the zookeeper package on the tank server, but now it reports a new error:
Start task 0 of zookeeper on 10.38.11.59(0) fail: <Fault 60: 'ALREADY_STARTED: zookeeper--dptst--zookeeper'>
......
File "/usr/local/lib/python2.7/socket.py", line 571, in create_connection
raise err
socket.error: [Errno 111] Connection refused

@wuzesheng
Contributor

Can you post the detailed stack trace?

@xiaoguanyu
Author

2014-05-14 13:26:49 You should set a bootstrap password, it will be requried when you do cleanup
Set a password manually? (y/n) y
Please input your password:
2014-05-14 13:26:52 Your password is: 123456, you should store this in a safe place, because this is the verification code used to do cleanup
2014-05-14 13:26:52 Bootstrapping task 0 of zookeeper on 10.38.11.59(0)
2014-05-14 13:26:53 Bootstrap task 0 of zookeeper on 10.38.11.59(0) success
2014-05-14 13:26:53 Starting task 0 of zookeeper on 10.38.11.59(0)
2014-05-14 13:26:53 Start task 0 of zookeeper on 10.38.11.59(0) fail: <Fault 60: 'ALREADY_STARTED: zookeeper--dptst--zookeeper'>
Traceback (most recent call last):
File "/usr/local/test/minos/client/deploy.py", line 284, in
main()
File "/usr/local/test/minos/client/deploy.py", line 281, in main
return args.handler(args)
File "/usr/local/test/minos/client/deploy.py", line 229, in process_command_bootstrap
return deploy_tool.bootstrap(args)
File "/usr/local/test/minos/client/deploy_zookeeper.py", line 154, in bootstrap
bootstrap_job(args, hosts[host_id].ip, "zookeeper", host_id, instance_id, cleanup_token)
File "/usr/local/test/minos/client/deploy_zookeeper.py", line 136, in bootstrap_job
args.zookeeper_config.parse_generated_config_files(args, job_name, host_id, instance_id)
File "/usr/local/test/minos/client/service_config.py", line 665, in parse_generated_config_files
args, self.cluster, self.jobs, current_job, host_id, instance_id))
File "/usr/local/test/minos/client/service_config.py", line 652, in parse_generated_files
file_dict[key] = ServiceConfig.parse_item(args, cluster, jobs, current_job, host_id, instance_id, value)
File "/usr/local/test/minos/client/service_config.py", line 596, in parse_item
new_item.append(callback(args, cluster, jobs, current_job, host_id, instance_id, reg_expr[iter]))
File "/usr/local/test/minos/client/service_config.py", line 255, in get_section_attribute
return get_specific_dir(host.ip, args.service, cluster.name, section, section_instance_id, attribute)
File "/usr/local/test/minos/client/service_config.py", line 183, in get_specific_dir
return supervisor_client.get_available_data_dirs()[0]
File "/usr/local/test/minos/client/supervisor_client.py", line 26, in get_available_data_dirs
self.cluster, self.job)
File "/usr/local/lib/python2.7/xmlrpclib.py", line 1224, in call
return self.__send(self.__name, args)
File "/usr/local/lib/python2.7/xmlrpclib.py", line 1578, in __request
verbose=self.__verbose
File "/usr/local/lib/python2.7/xmlrpclib.py", line 1264, in request
return self.single_request(host, handler, request_body, verbose)
File "/usr/local/lib/python2.7/xmlrpclib.py", line 1292, in single_request
self.send_content(h, request_body)
File "/usr/local/lib/python2.7/xmlrpclib.py", line 1439, in send_content
connection.endheaders(request_body)
File "/usr/local/lib/python2.7/httplib.py", line 969, in endheaders
self._send_output(message_body)
File "/usr/local/lib/python2.7/httplib.py", line 829, in _send_output
self.send(msg)
File "/usr/local/lib/python2.7/httplib.py", line 791, in send
self.connect()
File "/usr/local/lib/python2.7/httplib.py", line 772, in connect
self.timeout, self.source_address)
File "/usr/local/lib/python2.7/socket.py", line 571, in create_connection
raise err
socket.error: [Errno 111] Connection refused

@wuzesheng
Contributor

It looks like the Minos client can't connect to your supervisord.
Can you check whether your supervisord started normally?
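As a quick sanity check (not part of Minos, just a minimal sketch), you can probe the supervisord HTTP port directly before re-running the deploy; the function name and the example host below are illustrative only:

```python
import socket

def supervisord_reachable(host, port=9001, timeout=3.0):
    """Return True if a TCP connection to supervisord's HTTP port succeeds.

    A False result matches the symptom in the traceback above:
    socket.error: [Errno 111] Connection refused.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.error:  # alias of OSError on Python 3
        return False
    sock.close()
    return True

# Example: probe the supervisord on the deploy target.
# supervisord_reachable("192.38.11.59")  # False => nothing is listening on 9001
```

If this returns False for the IP shown in the deploy log, the client and supervisord disagree about which address/port supervisord is listening on.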

@xiaoguanyu
Author

My supervisord started normally, and I can view the components' working status at http://192.169.11.59:9001.

@yehangjun
Member

From your error message, the client is trying to connect to another IP; can you check that one?

Bootstrapping task 0 of zookeeper on 192.38.11.59(0)
Bootstrap task 0 of zookeeper on 192.38.11.59(0) fail: No package found on package server of zookeeper
Bootstrap task 0 of zookeeper on 192.38.11.59(0) fail: 2
Starting task 0 of zookeeper on 192.38.11.59(0)

@xiaoguanyu
Author

Thank you yehangjun, the problem is solved: I hadn't installed the zookeeper package on the tank server.
cd minos/client
./deploy install zookeeper dptst

Bootstrapping task 0 of zookeeper on 192.38.11.59(0)
Bootstrap task 0 of zookeeper on 192.38.11.59(0) fail: No package found on package server of zookeeper
Bootstrap task 0 of zookeeper on 192.38.11.59(0) fail: 2
Starting task 0 of zookeeper on 192.38.11.59(0)

@wuzesheng
Contributor

@xiaoguanyu What is the root cause of the 'connection refused' error?

@lvzhaoxing

I ran into a similar situation:

[root@master client]# ./deploy bootstrap zookeeper dptst
2014-10-15 17:29:57 You should set a bootstrap password, it will be requried when you do cleanup
Set a password manually? (y/n) y
Please input your password: 
2014-10-15 17:30:03 Your password is: ir2014, you should store this in a safe place, because this is the verification code used to do cleanup
2014-10-15 17:30:03 Bootstrapping task 0 of zookeeper on 10.161.156.199(0)
2014-10-15 17:30:07 Bootstrap task 0 of zookeeper on 10.161.156.199(0) success
2014-10-15 17:30:07 Starting task 0 of zookeeper on 10.161.156.199(0)
2014-10-15 17:30:07 Start task 0 of zookeeper on 10.161.156.199(0) success
Traceback (most recent call last):
  File "/root/minos/client/deploy.py", line 288, in <module>
    main()
  File "/root/minos/client/deploy.py", line 285, in main
    return args.handler(args)
  File "/root/minos/client/deploy.py", line 233, in process_command_bootstrap
    return deploy_tool.bootstrap(args)
  File "/root/minos/client/deploy_zookeeper.py", line 154, in bootstrap
    bootstrap_job(args, hosts[host_id].ip, "zookeeper", host_id, instance_id, cleanup_token)
  File "/root/minos/client/deploy_zookeeper.py", line 136, in bootstrap_job
    args.zookeeper_config.parse_generated_config_files(args, job_name, host_id, instance_id)
  File "/root/minos/client/service_config.py", line 693, in parse_generated_config_files
    args, self.service, self.cluster, self.jobs, current_job, host_id, instance_id))
  File "/root/minos/client/service_config.py", line 681, in parse_generated_files
    parsing_service, current_job, host_id, instance_id, value)
  File "/root/minos/client/service_config.py", line 622, in parse_item
    current_job, host_id, instance_id, reg_expr[iter]))
  File "/root/minos/client/service_config.py", line 274, in get_section_attribute
    section, section_instance_id, attribute)
  File "/root/minos/client/service_config.py", line 195, in get_specific_dir
    return supervisor_client.get_available_data_dirs()[0]
  File "/root/minos/client/supervisor_client.py", line 26, in get_available_data_dirs
    self.cluster, self.job)
  File "/usr/local/python2.7/lib/python2.7/xmlrpclib.py", line 1224, in __call__
    return self.__send(self.__name, args)
  File "/usr/local/python2.7/lib/python2.7/xmlrpclib.py", line 1578, in __request
    verbose=self.__verbose
  File "/usr/local/python2.7/lib/python2.7/xmlrpclib.py", line 1264, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/local/python2.7/lib/python2.7/xmlrpclib.py", line 1292, in single_request
    self.send_content(h, request_body)
  File "/usr/local/python2.7/lib/python2.7/xmlrpclib.py", line 1439, in send_content
    connection.endheaders(request_body)
  File "/usr/local/python2.7/lib/python2.7/httplib.py", line 991, in endheaders
    self._send_output(message_body)
  File "/usr/local/python2.7/lib/python2.7/httplib.py", line 844, in _send_output
    self.send(msg)
  File "/usr/local/python2.7/lib/python2.7/httplib.py", line 806, in send
    self.connect()
  File "/usr/local/python2.7/lib/python2.7/httplib.py", line 787, in connect
    self.timeout, self.source_address)
  File "/usr/local/python2.7/lib/python2.7/socket.py", line 571, in create_connection
    raise err
socket.error: [Errno 111] Connection refused

@lvzhaoxing

However, my tank server already has the zookeeper package uploaded:

ID Package Name Revision No. Timestamp Checksum Download
1 zookeeper-3.4.6.tar.gz r12345 20141015-172923 2a9e53f5990dfe0965834a525fbcad226bf93474 Download

@wuzesheng
Contributor

From the look of it, your first machine bootstrapped successfully, but the second one got connection refused when connecting to supervisord. The supervisord on that machine probably didn't start; can you check?

@lvzhaoxing

I deployed 3 machines and found that port 9001 is accessible on all three, but the zookeeper process failed to start on all three. The supervisor page shows the same thing on each:

State    Description                                        Name
running  pid 21263, uptime 0:06:29                          crashmailbatch-monitor
running  pid 21262, uptime 0:06:29                          processexit-monitor
fatal    Exited too quickly (process log may have details)  zookeeper--dptst--zookeeper

PS: The page on 9001 does come up; is it still possible that supervisord isn't started?

@wuzesheng
Contributor

If the 9001 page loads, supervisord should have started successfully. What error does the failed process report? Is it also connection refused?

@lvzhaoxing

Yes, the ./deploy bootstrap zookeeper dptst command also ends with connection refused.

@wuzesheng
Contributor

On the machine where you run the client, try wget http://$host:9001 and see whether the page can be accessed normally.

@lvzhaoxing

Yes, my mistake: supervisor has to be deployed on all the machines first, and only then run ./deploy bootstrap zookeeper dptst. That works now.
But a new problem has appeared: ./deploy show zookeeper dptst reports errors.

2014-10-16 09:44:01 Showing task 0 of zookeeper on 10.161.156.199(0)
2014-10-16 09:44:01 Task 0 of zookeeper on 10.161.156.199(0) is FATAL
2014-10-16 09:44:01 Showing task 1 of zookeeper on 10.162.20.204(0)
2014-10-16 09:44:01 Task 1 of zookeeper on 10.162.20.204(0) is FATAL
2014-10-16 09:44:01 Showing task 2 of zookeeper on 10.161.131.193(0)
2014-10-16 09:44:01 Task 2 of zookeeper on 10.161.131.193(0) is FATAL

@wuzesheng
Contributor

That means zookeeper didn't come up properly. Take a look at the zookeeper log to see why it failed to start.

@lvzhaoxing

Is it the /home/root/log/zookeeper/dptst/zookeeper directory? It's completely empty:

[root@slave1 ~]# cd  /home/root/log/zookeeper/dptst/zookeeper
[root@slave1 zookeeper]# ll
total 0

@wuzesheng
Contributor

Go to /home/root/app/zookeeper/dptst/zookeeper; the stdout/ directory contains the files stdout was redirected to. Check whether there are any error messages in them.

@lvzhaoxing

Found the problem; it looks like a Java issue.
I installed Java manually under /usr/local/jdk1.7.0_67/, configured the environment variables in /etc/profile, and running java -version directly works fine.
But according to the stdout logs, zookeeper still looks for java in /usr/bin/java. Where do I configure Minos's java home?

@lvzhaoxing

Let me ask one more naive question: from the PDF and the wiki, Minos's deployment doesn't require passwordless SSH login and relies entirely on supervisor, right?

@wuzesheng
Contributor

  1. Minos reads the current user's JAVA_HOME environment variable; there is no special configuration
  2. Yes, your understanding is correct; it does not depend on SSH
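The JAVA_HOME behaviour described in point 1 can be sketched roughly like this (a hypothetical helper for illustration only; Minos's real lookup lives in its shell scripts):

```python
import os

def resolve_java(env=None):
    """Pick the java binary the way a launcher typically would:
    prefer $JAVA_HOME/bin/java, otherwise fall back to /usr/bin/java
    (the path zookeeper was observed using in this thread)."""
    if env is None:
        env = os.environ
    java_home = env.get("JAVA_HOME")
    if java_home:
        candidate = os.path.join(java_home, "bin", "java")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return "/usr/bin/java"
```

This would also explain the symlink workaround that follows: if supervisord was started without JAVA_HOME in its environment (for instance because /etc/profile is only read by login shells), the fallback path /usr/bin/java is what ends up being used.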

@lvzhaoxing

I couldn't figure it out, so I just symlinked it: ln -s $JAVA_HOME/bin/java /usr/bin/java
That more or less did the trick; zookeeper works now.

@wuzesheng
Contributor

Nice, glad you got it running.
In Minos, the java_home lookup is in start.sh; you can take a look at the source.
