[gce] instances don't see each other #13459

Closed
dadoonet opened this issue Sep 10, 2015 · 11 comments
Labels: >bug, :Distributed Coordination/Snapshot/Restore, help wanted, adoptme

Comments

@dadoonet
Member

From @yaraju on July 8, 2015 12:31

I'm trying out the GCE plugin on ES 1.6.0 with plugin version 2.6.0.

I'm unable to get multicast autodiscovery to work at all.

Here is my startup script for the instances. (Includes elasticsearch.yml config)

And here is the log info when I switch discovery logging to "TRACE":
(NOTE: These logs are from a second run, for which I created a fresh GCloud project with ID escloud-1000.)

[2015-07-08 12:13:45,017][INFO ][node                     ] [Jonathan "John" Garrett] version[1.6.0], pid[8724], build[cdd3ac4/2015-06-09T13:36:34Z]
[2015-07-08 12:13:45,018][INFO ][node                     ] [Jonathan "John" Garrett] initializing ...
[2015-07-08 12:13:45,055][INFO ][plugins                  ] [Jonathan "John" Garrett] loaded [marvel, cloud-gce], sites [marvel, head]
[2015-07-08 12:13:45,113][INFO ][env                      ] [Jonathan "John" Garrett] using [1] data paths, mounts [[/ (/dev/sda1)]], net usable_space [7.8gb], net total_space [9.8gb], types [ext4]
[2015-07-08 12:13:48,138][DEBUG][discovery.zen.elect      ] [Jonathan "John" Garrett] using minimum_master_nodes [-1]
[2015-07-08 12:13:48,144][DEBUG][discovery.zen.ping.multicast] [Jonathan "John" Garrett] using group [224.2.2.4], with port [54328], ttl [3], and address [null]
[2015-07-08 12:13:48,148][DEBUG][discovery.zen.ping.unicast] [Jonathan "John" Garrett] using initial hosts [], with concurrent_connects [10]
[2015-07-08 12:13:48,149][DEBUG][discovery.gce            ] [Jonathan "John" Garrett] using ping.timeout [3s], join.timeout [1m], master_election.filter_client [true], master_election.filter_data [false]
[2015-07-08 12:13:48,151][DEBUG][discovery.zen.fd         ] [Jonathan "John" Garrett] [master] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
[2015-07-08 12:13:48,153][DEBUG][discovery.zen.fd         ] [Jonathan "John" Garrett] [node  ] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
[2015-07-08 12:13:49,196][INFO ][node                     ] [Jonathan "John" Garrett] initialized
[2015-07-08 12:13:49,196][INFO ][node                     ] [Jonathan "John" Garrett] starting ...
[2015-07-08 12:13:49,276][INFO ][transport                ] [Jonathan "John" Garrett] bound_address {inet[/0.0.0.0:9300]}, publish_address {inet[/10.240.180.235:9300]}
[2015-07-08 12:13:49,295][INFO ][discovery                ] [Jonathan "John" Garrett] dummy/jF69Ngu7Tyu4nYCcsec81g
[2015-07-08 12:13:49,299][TRACE][discovery.gce            ] [Jonathan "John" Garrett] starting to ping
[2015-07-08 12:13:49,308][TRACE][discovery.zen.ping.multicast] [Jonathan "John" Garrett] [1] sending ping request
[2015-07-08 12:13:49,312][TRACE][discovery.zen.ping.unicast] [Jonathan "John" Garrett] [1] connecting to [Jonathan "John" Garrett][jF69Ngu7Tyu4nYCcsec81g][es-node1.c.escloud-1000.internal][inet[/10.240.180.235:9300]]
[2015-07-08 12:13:49,344][TRACE][discovery.zen.ping.unicast] [Jonathan "John" Garrett] [1] connected to [Jonathan "John" Garrett][jF69Ngu7Tyu4nYCcsec81g][es-node1.c.escloud-1000.internal][inet[/10.240.180.235:9300]]
[2015-07-08 12:13:49,345][TRACE][discovery.zen.ping.unicast] [Jonathan "John" Garrett] [1] sending to [Jonathan "John" Garrett][jF69Ngu7Tyu4nYCcsec81g][es-node1.c.escloud-1000.internal][inet[/10.240.180.235:9300]]
[2015-07-08 12:13:49,372][TRACE][discovery.zen.ping.unicast] [Jonathan "John" Garrett] [1] received response from [Jonathan "John" Garrett][jF69Ngu7Tyu4nYCcsec81g][es-node1.c.escloud-1000.internal][inet[/10.240.180.235:9300]]: [ping_response{node [[Jonathan "John" Garrett][jF69Ngu7Tyu4nYCcsec81g][es-node1.c.escloud-1000.internal][inet[/10.240.180.235:9300]]], id[1], master [null], hasJoinedOnce [false], cluster_name[dummy]}, ping_response{node [[Jonathan "John" Garrett][jF69Ngu7Tyu4nYCcsec81g][es-node1.c.escloud-1000.internal][inet[/10.240.180.235:9300]]], id[2], master [null], hasJoinedOnce [false], cluster_name[dummy]}]
[2015-07-08 12:13:50,810][TRACE][discovery.zen.ping.multicast] [Jonathan "John" Garrett] [1] sending ping request
[2015-07-08 12:13:50,812][TRACE][discovery.zen.ping.unicast] [Jonathan "John" Garrett] [1] sending to [Jonathan "John" Garrett][jF69Ngu7Tyu4nYCcsec81g][es-node1.c.escloud-1000.internal][inet[/10.240.180.235:9300]]
[2015-07-08 12:13:50,819][TRACE][discovery.zen.ping.unicast] [Jonathan "John" Garrett] [1] received response from [Jonathan "John" Garrett][jF69Ngu7Tyu4nYCcsec81g][es-node1.c.escloud-1000.internal][inet[/10.240.180.235:9300]]: [ping_response{node [[Jonathan "John" Garrett][jF69Ngu7Tyu4nYCcsec81g][es-node1.c.escloud-1000.internal][inet[/10.240.180.235:9300]]], id[1], master [null], hasJoinedOnce [false], cluster_name[dummy]}, ping_response{node [[Jonathan "John" Garrett][jF69Ngu7Tyu4nYCcsec81g][es-node1.c.escloud-1000.internal][inet[/10.240.180.235:9300]]], id[3], master [null], hasJoinedOnce [false], cluster_name[dummy]}, ping_response{node [[Jonathan "John" Garrett][jF69Ngu7Tyu4nYCcsec81g][es-node1.c.escloud-1000.internal][inet[/10.240.180.235:9300]]], id[4], master [null], hasJoinedOnce [false], cluster_name[dummy]}]
[2015-07-08 12:13:52,313][TRACE][discovery.zen.ping.multicast] [Jonathan "John" Garrett] [1] sending last pings
[2015-07-08 12:13:52,314][TRACE][discovery.zen.ping.multicast] [Jonathan "John" Garrett] [1] sending ping request
[2015-07-08 12:13:52,321][TRACE][discovery.zen.ping.unicast] [Jonathan "John" Garrett] [1] sending to [Jonathan "John" Garrett][jF69Ngu7Tyu4nYCcsec81g][es-node1.c.escloud-1000.internal][inet[/10.240.180.235:9300]]
[2015-07-08 12:13:52,328][TRACE][discovery.zen.ping.unicast] [Jonathan "John" Garrett] [1] received response from [Jonathan "John" Garrett][jF69Ngu7Tyu4nYCcsec81g][es-node1.c.escloud-1000.internal][inet[/10.240.180.235:9300]]: [ping_response{node [[Jonathan "John" Garrett][jF69Ngu7Tyu4nYCcsec81g][es-node1.c.escloud-1000.internal][inet[/10.240.180.235:9300]]], id[1], master [null], hasJoinedOnce [false], cluster_name[dummy]}, ping_response{node [[Jonathan "John" Garrett][jF69Ngu7Tyu4nYCcsec81g][es-node1.c.escloud-1000.internal][inet[/10.240.180.235:9300]]], id[3], master [null], hasJoinedOnce [false], cluster_name[dummy]}, ping_response{node [[Jonathan "John" Garrett][jF69Ngu7Tyu4nYCcsec81g][es-node1.c.escloud-1000.internal][inet[/10.240.180.235:9300]]], id[5], master [null], hasJoinedOnce [false], cluster_name[dummy]}, ping_response{node [[Jonathan "John" Garrett][jF69Ngu7Tyu4nYCcsec81g][es-node1.c.escloud-1000.internal][inet[/10.240.180.235:9300]]], id[6], master [null], hasJoinedOnce [false], cluster_name[dummy]}]
[2015-07-08 12:13:53,065][TRACE][discovery.gce            ] [Jonathan "John" Garrett] full ping responses: {none}
[2015-07-08 12:13:53,066][DEBUG][discovery.gce            ] [Jonathan "John" Garrett] filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2015-07-08 12:13:53,076][INFO ][cluster.service          ] [Jonathan "John" Garrett] new_master [Jonathan "John" Garrett][jF69Ngu7Tyu4nYCcsec81g][es-node1.c.escloud-1000.internal][inet[/10.240.180.235:9300]], reason: zen-disco-join (elected_as_master)
[2015-07-08 12:13:53,090][TRACE][discovery.gce            ] [Jonathan "John" Garrett] cluster joins counter set to [1] (elected as master)
[2015-07-08 12:13:53,225][INFO ][http                     ] [Jonathan "John" Garrett] bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[/10.240.180.235:9200]}
[2015-07-08 12:13:53,229][INFO ][node                     ] [Jonathan "John" Garrett] started
[2015-07-08 12:13:53,367][INFO ][gateway                  ] [Jonathan "John" Garrett] recovered [1] indices into cluster_state

Please let me know if I can provide any additional info.

Copied from original issue: elastic/elasticsearch-cloud-gce#54

@dadoonet
Member Author

From @yaraju on July 8, 2015 12:34

Additional info:
I was running with two nodes:
es-node1
es-node2
Each of the Marvel indices expects 1 shard and 1 replica, but even though the 2nd machine came up, neither node noticed the other.
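
One way to confirm this symptom from outside the nodes is to ask each node how many members it thinks the cluster has. The following is only a sketch: it assumes the `_cluster/health` response exposes a `number_of_nodes` field and that HTTP is reachable on port 9200, and the helper names `fetch_health` and `isolated_nodes` are hypothetical, not part of Elasticsearch or the GCE plugin.

```python
import json
from urllib.request import urlopen


def fetch_health(host, port=9200, timeout=5):
    """Fetch /_cluster/health from a single node and return the parsed JSON.

    Assumption: the node serves HTTP on the given port without authentication.
    """
    url = f"http://{host}:{port}/_cluster/health"
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def isolated_nodes(health_by_host, expected_nodes):
    """Return the hosts whose own view of the cluster has too few members.

    health_by_host maps a host name to that host's /_cluster/health response.
    """
    return sorted(
        host
        for host, health in health_by_host.items()
        if health.get("number_of_nodes", 0) < expected_nodes
    )


# Example usage against live nodes (not executed here):
#   hosts = ["es-node1", "es-node2"]
#   views = {h: fetch_health(h) for h in hosts}
#   print(isolated_nodes(views, expected_nodes=len(hosts)))
```

If both `es-node1` and `es-node2` come back in the result, each node has formed its own single-node cluster, which matches what is described above.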

@dadoonet
Member Author

From @yaraju on July 8, 2015 12:44

Also, logs from 2nd node:

[2015-07-08 12:14:14,507][INFO ][node                     ] [White Tiger] version[1.6.0], pid[9627], build[cdd3ac4/2015-06-09T13:36:34Z]
[2015-07-08 12:14:14,508][INFO ][node                     ] [White Tiger] initializing ...
[2015-07-08 12:14:14,535][INFO ][plugins                  ] [White Tiger] loaded [marvel, cloud-gce], sites [marvel, head]
[2015-07-08 12:14:14,585][INFO ][env                      ] [White Tiger] using [1] data paths, mounts [[/ (/dev/sda1)]], net usable_space [7.8gb], net total_space [9.8gb], types [ext4]
[2015-07-08 12:14:16,990][DEBUG][discovery.zen.elect      ] [White Tiger] using minimum_master_nodes [-1]
[2015-07-08 12:14:16,992][DEBUG][discovery.zen.ping.multicast] [White Tiger] using group [224.2.2.4], with port [54328], ttl [3], and address [null]
[2015-07-08 12:14:16,995][DEBUG][discovery.zen.ping.unicast] [White Tiger] using initial hosts [], with concurrent_connects [10]
[2015-07-08 12:14:16,996][DEBUG][discovery.gce            ] [White Tiger] using ping.timeout [3s], join.timeout [1m], master_election.filter_client [true], master_election.filter_data [false]
[2015-07-08 12:14:16,998][DEBUG][discovery.zen.fd         ] [White Tiger] [master] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
[2015-07-08 12:14:17,000][DEBUG][discovery.zen.fd         ] [White Tiger] [node  ] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
[2015-07-08 12:14:17,832][INFO ][node                     ] [White Tiger] initialized
[2015-07-08 12:14:17,834][INFO ][node                     ] [White Tiger] starting ...
[2015-07-08 12:14:17,895][INFO ][transport                ] [White Tiger] bound_address {inet[/0.0.0.0:9300]}, publish_address {inet[/10.240.170.154:9300]}
[2015-07-08 12:14:17,909][INFO ][discovery                ] [White Tiger] dummy/kJxYRSn_RtyEaWxBMZkvXQ
[2015-07-08 12:14:17,912][TRACE][discovery.gce            ] [White Tiger] starting to ping
[2015-07-08 12:14:17,921][TRACE][discovery.zen.ping.multicast] [White Tiger] [1] sending ping request
[2015-07-08 12:14:17,923][TRACE][discovery.zen.ping.unicast] [White Tiger] [1] connecting to [White Tiger][kJxYRSn_RtyEaWxBMZkvXQ][es-node2.c.escloud-1000.internal][inet[/10.240.170.154:9300]]
[2015-07-08 12:14:17,949][TRACE][discovery.zen.ping.unicast] [White Tiger] [1] connected to [White Tiger][kJxYRSn_RtyEaWxBMZkvXQ][es-node2.c.escloud-1000.internal][inet[/10.240.170.154:9300]]
[2015-07-08 12:14:17,949][TRACE][discovery.zen.ping.unicast] [White Tiger] [1] sending to [White Tiger][kJxYRSn_RtyEaWxBMZkvXQ][es-node2.c.escloud-1000.internal][inet[/10.240.170.154:9300]]
[2015-07-08 12:14:17,973][TRACE][discovery.zen.ping.unicast] [White Tiger] [1] received response from [White Tiger][kJxYRSn_RtyEaWxBMZkvXQ][es-node2.c.escloud-1000.internal][inet[/10.240.170.154:9300]]: [ping_response{node [[White Tiger][kJxYRSn_RtyEaWxBMZkvXQ][es-node2.c.escloud-1000.internal][inet[/10.240.170.154:9300]]], id[1], master [null], hasJoinedOnce [false], cluster_name[dummy]}, ping_response{node [[White Tiger][kJxYRSn_RtyEaWxBMZkvXQ][es-node2.c.escloud-1000.internal][inet[/10.240.170.154:9300]]], id[2], master [null], hasJoinedOnce [false], cluster_name[dummy]}]
[2015-07-08 12:14:19,422][TRACE][discovery.zen.ping.multicast] [White Tiger] [1] sending ping request
[2015-07-08 12:14:19,424][TRACE][discovery.zen.ping.unicast] [White Tiger] [1] sending to [White Tiger][kJxYRSn_RtyEaWxBMZkvXQ][es-node2.c.escloud-1000.internal][inet[/10.240.170.154:9300]]
[2015-07-08 12:14:19,426][TRACE][discovery.zen.ping.unicast] [White Tiger] [1] received response from [White Tiger][kJxYRSn_RtyEaWxBMZkvXQ][es-node2.c.escloud-1000.internal][inet[/10.240.170.154:9300]]: [ping_response{node [[White Tiger][kJxYRSn_RtyEaWxBMZkvXQ][es-node2.c.escloud-1000.internal][inet[/10.240.170.154:9300]]], id[1], master [null], hasJoinedOnce [false], cluster_name[dummy]}, ping_response{node [[White Tiger][kJxYRSn_RtyEaWxBMZkvXQ][es-node2.c.escloud-1000.internal][inet[/10.240.170.154:9300]]], id[3], master [null], hasJoinedOnce [false], cluster_name[dummy]}, ping_response{node [[White Tiger][kJxYRSn_RtyEaWxBMZkvXQ][es-node2.c.escloud-1000.internal][inet[/10.240.170.154:9300]]], id[4], master [null], hasJoinedOnce [false], cluster_name[dummy]}]
[2015-07-08 12:14:20,925][TRACE][discovery.zen.ping.multicast] [White Tiger] [1] sending last pings
[2015-07-08 12:14:20,925][TRACE][discovery.zen.ping.multicast] [White Tiger] [1] sending ping request
[2015-07-08 12:14:20,927][TRACE][discovery.zen.ping.unicast] [White Tiger] [1] sending to [White Tiger][kJxYRSn_RtyEaWxBMZkvXQ][es-node2.c.escloud-1000.internal][inet[/10.240.170.154:9300]]
[2015-07-08 12:14:20,929][TRACE][discovery.zen.ping.unicast] [White Tiger] [1] received response from [White Tiger][kJxYRSn_RtyEaWxBMZkvXQ][es-node2.c.escloud-1000.internal][inet[/10.240.170.154:9300]]: [ping_response{node [[White Tiger][kJxYRSn_RtyEaWxBMZkvXQ][es-node2.c.escloud-1000.internal][inet[/10.240.170.154:9300]]], id[1], master [null], hasJoinedOnce [false], cluster_name[dummy]}, ping_response{node [[White Tiger][kJxYRSn_RtyEaWxBMZkvXQ][es-node2.c.escloud-1000.internal][inet[/10.240.170.154:9300]]], id[3], master [null], hasJoinedOnce [false], cluster_name[dummy]}, ping_response{node [[White Tiger][kJxYRSn_RtyEaWxBMZkvXQ][es-node2.c.escloud-1000.internal][inet[/10.240.170.154:9300]]], id[5], master [null], hasJoinedOnce [false], cluster_name[dummy]}, ping_response{node [[White Tiger][kJxYRSn_RtyEaWxBMZkvXQ][es-node2.c.escloud-1000.internal][inet[/10.240.170.154:9300]]], id[6], master [null], hasJoinedOnce [false], cluster_name[dummy]}]
[2015-07-08 12:14:21,676][TRACE][discovery.gce            ] [White Tiger] full ping responses: {none}
[2015-07-08 12:14:21,677][DEBUG][discovery.gce            ] [White Tiger] filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2015-07-08 12:14:21,685][INFO ][cluster.service          ] [White Tiger] new_master [White Tiger][kJxYRSn_RtyEaWxBMZkvXQ][es-node2.c.escloud-1000.internal][inet[/10.240.170.154:9300]], reason: zen-disco-join (elected_as_master)
[2015-07-08 12:14:21,695][TRACE][discovery.gce            ] [White Tiger] cluster joins counter set to [1] (elected as master)
[2015-07-08 12:14:21,805][INFO ][http                     ] [White Tiger] bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[/10.240.170.154:9200]}
[2015-07-08 12:14:21,805][INFO ][node                     ] [White Tiger] started
[2015-07-08 12:14:21,894][INFO ][gateway                  ] [White Tiger] recovered [1] indices into cluster_state
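
The pattern in both logs is the same: the only unicast target is the node's own publish address, the full ping responses are `{none}`, and the node then elects itself master. A small script can pull that summary out of a pasted log. This is a rough sketch that assumes the Elasticsearch 1.x log layout shown above; the function names `unwrap` and `summarize` are made up for illustration, and the regular expressions are tuned only to these sample lines.

```python
import re

# A record starts with a timestamp such as "[2015-07-08 12:14:17,895]".
TIMESTAMP = re.compile(r"^\[\d{4}-\d{2}-\d{2} ")
# Transport line: capture the published transport address (host:port).
TRANSPORT = re.compile(r"\]\[transport\s*\].*publish_address \{inet\[/([^\]]+)\]\}")
# Unicast ping: capture the address of the node being connected to.
CONNECT = re.compile(
    r"connecting to \[[^\]]+\]\[[^\]]+\]\[[^\]]+\]\[inet\[/([^\]]+)\]\]"
)


def unwrap(text):
    """Re-join records that were hard-wrapped when the log was pasted."""
    records = []
    for line in text.splitlines():
        if TIMESTAMP.match(line) or not records:
            records.append(line)
        else:
            records[-1] += line  # continuation of the previous record
    return records


def summarize(text):
    """Summarize what one node's discovery log says about its peers."""
    summary = {"publish": None, "pinged": set(), "no_peers": False, "self_elected": False}
    for record in unwrap(text):
        match = TRANSPORT.search(record)
        if match:
            summary["publish"] = match.group(1)
        match = CONNECT.search(record)
        if match:
            summary["pinged"].add(match.group(1))
        if "full ping responses: {none}" in record:
            summary["no_peers"] = True
        if "elected_as_master" in record:
            summary["self_elected"] = True
    return summary


SAMPLE = """\
[2015-07-08 12:14:17,895][INFO ][transport                ] [White Tiger] bound_address {inet[/0.0.0.0:9300]}, publ
ish_address {inet[/10.240.170.154:9300]}
[2015-07-08 12:14:17,923][TRACE][discovery.zen.ping.unicast] [White Tiger] [1] connecting to [White Tiger][kJxYRSn_
RtyEaWxBMZkvXQ][es-node2.c.escloud-1000.internal][inet[/10.240.170.154:9300]]
[2015-07-08 12:14:21,676][TRACE][discovery.gce            ] [White Tiger] full ping responses: {none}
[2015-07-08 12:14:21,685][INFO ][cluster.service          ] [White Tiger] new_master [White Tiger][kJxYRSn_RtyEaWxBMZkvXQ][es-node2.c.escloud-1000.internal][inet[/10.240.170.154:9300]], reason: zen-disco-join (elected_as_master)
"""

result = summarize(SAMPLE)
# True when the node only ever pinged its own transport address.
only_pinged_self = result["pinged"] == {result["publish"]}
print(result, only_pinged_self)
```

When `only_pinged_self` is true and `no_peers` is set, the unicast host list built by discovery contained nothing but the local node, which is consistent with `using initial hosts []` in the DEBUG output.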

@dadoonet
Member Author

From @yaraju on July 8, 2015 18:34

I rewrote my script to use ES 1.5.0 and plugin version 2.5.0, and that works fine.

So my script is fine, but it fails with the new plugin.

I can share my Gcloud project with you if you'd like to take a closer look.

@dadoonet
Member Author

From @akleiman on July 16, 2015 17:3

Seems like the same problem I had in #53

@dadoonet
Member Author

From @schonfeld on July 28, 2015 5:27

This, too (see #53), is probably due to someone replacing some important code with a "TODO" comment...

https://github.com/elastic/elasticsearch-cloud-gce/blob/master/src/main/java/org/elasticsearch/discovery/gce/GceDiscovery.java#L49

@dadoonet
Member Author

From @danielschonfeld on July 28, 2015 5:28

cc @schonfeld @dadoonet 👍

@dadoonet
Member Author

dadoonet commented Aug 9, 2016

I ran some tests today on 5.0.0-alpha5 and discovery is working well. I think we could close this.
I'm also assuming that everything works well in the 2.x series, at least as far as I recall from the tests I did.

@dadoonet closed this as completed on Aug 9, 2016
@beachmang

@dadoonet I just upgraded some nodes from 1.5 -> 1.7.5 yesterday and I'm having the problem outlined in this thread. My (2) masters no longer see each other.

@dadoonet
Member Author

@beachmang Can you share your logs? Can you run that with DEBUG level so we have more clues?
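
For anyone following along, one way to raise the discovery log level without restarting a node is a transient cluster setting. This is a sketch only: it assumes the 1.x cluster update settings API accepts a `logger.discovery` key as I recall from the 1.x docs, and that the node listens on `localhost:9200`; verify both against your version. The helper name `build_logger_request` is hypothetical.

```python
import json
from urllib.request import Request


def build_logger_request(level="DEBUG", logger="discovery", host="localhost", port=9200):
    """Build (but do not send) a PUT request that changes a logger level.

    Assumption: the cluster accepts transient "logger.<name>" settings.
    """
    body = {"transient": {f"logger.{logger}": level}}
    return Request(
        f"http://{host}:{port}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# To actually send it (not executed here):
#   from urllib.request import urlopen
#   with urlopen(build_logger_request("DEBUG")) as resp:
#       print(resp.read().decode("utf-8"))
```

Because discovery runs at startup, a setting applied this way only helps for pings that happen afterwards; to capture the initial pings, the level would have to be set in the node's logging configuration before it starts.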

@beachmang

@dadoonet My apologies, I read yesterday that the 2.5 version of the plugin was functioning with es 1.6 so I tried that. Happy to report it's also working with 1.7.5.

@dadoonet
Member Author

Great. Thanks!

@clintongormley added the :Distributed Coordination/Snapshot/Restore label and removed the :Plugin Cloud GCE label on Feb 14, 2018