etcd container fails to communicate on some ports across different subnets (Docker Swarm nodes), but works within the same subnet #10494
Comments
I assume you have verified that this is routable? Strip all of this back and you have a networking problem or, at best, latency issues, in which case the message is expected. Exec into one of these containers and see if you can connect to the other. If you can, then you might need to look at tuning etcd to deal with the latencies [1][2]. Check out your etcd_network_peer_round_trip.. latencies and election metrics. My guess is that you are seeing heavy leader elections, which destabilize the cluster. If you're going to run etcd across data centers like this, you need to understand how to tune it, as the defaults generally won't cover this use case. Focus on [1].
[1] https://github.com/etcd-io/etcd/blob/master/Documentation/tuning.md#time-parameters
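To make the tuning advice concrete, here is a minimal sketch of the rule of thumb from the time-parameters section of the tuning guide: keep the heartbeat interval around the peer round-trip time, and set the election timeout to at least 10 times that round-trip time, up to etcd's 50000 ms limit. The function name `suggest_etcd_timing` is hypothetical, and never going below the defaults (100 ms heartbeat, 1000 ms election timeout) is an assumption of this sketch, not something stated in the thread.

```python
def suggest_etcd_timing(peer_rtt_ms: float) -> dict:
    """Suggest etcd time parameters (in ms) from a measured peer round-trip time.

    Hypothetical helper: applies the tuning guide's rule of thumb, and
    (as an assumption of this sketch) never goes below etcd's defaults.
    """
    if peer_rtt_ms <= 0:
        raise ValueError("peer_rtt_ms must be positive")

    default_heartbeat_ms = 100     # etcd default for --heartbeat-interval
    default_election_ms = 1000     # etcd default for --election-timeout
    max_election_ms = 50000        # upper limit etcd accepts for the election timeout

    # Heartbeat interval: roughly one round trip between members.
    heartbeat_ms = max(default_heartbeat_ms, round(peer_rtt_ms))
    # Election timeout: at least 10x the round-trip time, capped at the limit.
    election_ms = min(max_election_ms, max(default_election_ms, 10 * heartbeat_ms))

    return {
        "heartbeat_interval_ms": heartbeat_ms,
        "election_timeout_ms": election_ms,
    }


def as_flags(timing: dict) -> str:
    """Render the suggestion as etcd command-line flags."""
    return (
        f"--heartbeat-interval={timing['heartbeat_interval_ms']} "
        f"--election-timeout={timing['election_timeout_ms']}"
    )
```

For example, `as_flags(suggest_etcd_timing(250))` yields flags for a 250 ms heartbeat and a 2500 ms election timeout. The round-trip time itself would come from ping or from the peer round-trip metric mentioned above, and every member of the cluster should be given the same values.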
Hi Sam,
Thanks for responding. I made the changes and tried three times with the values mentioned below:
--- 10.0.2.103 ping statistics ---
--- 10.0.2.99 ping statistics ---
etcd health status from a different etcd container:
etcdctl cluster-health
cluster may be unhealthy: failed to list members
/ # etcdctl cluster-health
error logs:
[user-docker@f1cloud2201 ~]$ docker logs 638ad344a6c7
adding |
Hi Sam, thanks for the support.
Glad you figured it out @samar51 |
Please read https://github.com/etcd-io/etcd/blob/master/Documentation/reporting_bugs.md.
While deploying the etcd Docker container on nodes in different subnets in different data centres, I get the error below.
2019-02-21 16:52:42.506314 I | raft: b8b747c74aaea686 is starting a new election at term 928
2019-02-21 16:52:42.506344 I | raft: b8b747c74aaea686 became candidate at term 929
2019-02-21 16:52:42.506353 I | raft: b8b747c74aaea686 received MsgVoteResp from b8b747c74aaea686 at term 929
2019-02-21 16:52:42.506361 I | raft: b8b747c74aaea686 [logterm: 1, index: 3] sent MsgVote request to b3504381e8ba3cb at term 929
2019-02-21 16:52:42.506367 I | raft: b8b747c74aaea686 [logterm: 1, index: 3] sent MsgVote request to f572fdfc5cb68406 at term 929
2019-02-21 16:52:43.158372 W | rafthttp: health check for peer b3504381e8ba3cb could not connect: dial tcp 10.0.2.81:2380: i/o timeout
2019-02-21 16:52:43.159658 W | rafthttp: health check for peer f572fdfc5cb68406 could not connect: dial tcp 10.0.2.83:2380: i/o timeout
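A log like the one above can be summarized to see how often elections restart and which peers are timing out. The following is a rough sketch that matches only the two message shapes shown in this issue; the function name `summarize_etcd_log` and the regular expressions are assumptions based on these lines, not an official etcd log format specification.

```python
import re
from collections import Counter

# Patterns derived from the log lines quoted in this issue (an assumption,
# other etcd versions may word these messages differently).
ELECTION_RE = re.compile(r"raft: (\w+) is starting a new election at term (\d+)")
TIMEOUT_RE = re.compile(
    r"rafthttp: health check for peer (\w+) could not connect: "
    r"dial tcp (\S+): i/o timeout"
)


def summarize_etcd_log(lines):
    """Count started elections and peer dial timeouts in etcd log lines."""
    elections_started = 0
    highest_term = None
    unreachable = Counter()

    for line in lines:
        match = ELECTION_RE.search(line)
        if match:
            elections_started += 1
            term = int(match.group(2))
            highest_term = term if highest_term is None else max(highest_term, term)
            continue
        match = TIMEOUT_RE.search(line)
        if match:
            # Key by the dialed address, e.g. "10.0.2.81:2380".
            unreachable[match.group(2)] += 1

    return {
        "elections_started": elections_started,
        "highest_term": highest_term,
        "unreachable_peers": dict(unreachable),
    }
```

Feeding it the output of `docker logs` for one member gives a quick picture: a term in the hundreds together with repeated timeouts on port 2380 suggests the member has never managed to reach its peers, rather than an occasional latency spike.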
docker version:
[user-docker@f1cloud2201 ~]$ docker version
Client:
Version: 17.12.0-ce
API version: 1.35
Go version: go1.9.2
Git commit: c97c6d6
Built: Wed Dec 27 20:10:14 2017
OS/Arch: linux/amd64
Server:
Engine:
Version: 17.12.0-ce
API version: 1.35 (minimum version 1.12)
Go version: go1.9.2
Git commit: c97c6d6
Built: Wed Dec 27 20:12:46 2017
OS/Arch: linux/amd64
Experimental: true
Followed all the prerequisites for the Docker Swarm port constraints.
telnet and nc are also working.
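The telnet/nc check can be repeated from inside a container with a short script. This is a minimal sketch using only the Python standard library; the helper names `tcp_port_open` and `unreachable_endpoints` are hypothetical. Note that a TCP connect says nothing about UDP ports, and Swarm overlay networking also relies on UDP (7946/udp and 4789/udp), so a passing TCP check does not prove the overlay path is healthy.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable_endpoints(hosts, ports, timeout: float = 3.0):
    """Return the (host, port) pairs that could not be reached over TCP."""
    return [
        (host, port)
        for host in hosts
        for port in ports
        if not tcp_port_open(host, port, timeout)
    ]
```

Run against the peers from the log, this would look like `unreachable_endpoints(["10.0.2.81", "10.0.2.83"], [2379, 2380])`, where 2379 and 2380 are etcd's default client and peer ports. An empty list means every TCP endpoint answered from that container's network namespace.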
etcd compose file:
version: '3'
services:
  etcd01:
    image: quay.io/coreos/etcd
    ports:
    networks:
    volumes:
    deploy:
      placement:
        constraints:
      replicas: 1
  etcd02:
    image: quay.io/coreos/etcd
    ports:
    volumes:
    networks:
    deploy:
      placement:
        constraints:
      replicas: 1
    command:
  etcd03:
    image: quay.io/coreos/etcd
    ports:
    volumes:
    networks:
    deploy:
      placement:
        constraints:
      replicas: 1
volumes:
  etcd01:
  etcd02:
  etcd03:
networks:
  dbs1:
    external: true