Sometimes a data node cannot find the master. #13530

Closed
darjeelings opened this issue Sep 12, 2015 · 3 comments

Comments

@darjeelings

My cluster has 9 data nodes, 2 master nodes, 1 search node, and 112 client nodes.

Sometimes a data node can't find the master node.

There were no network, CPU load, or memory problems.
There were no status change log entries.

I'm running this on AWS EC2, so I am using the elasticsearch-cloud-aws plugin (https://github.com/elastic/elasticsearch-cloud-aws#generic-configuration).

I set ping_timeout to 60s.

Is this normal?

Master Log

[2015-09-11 06:13:31,348][WARN ][discovery.zen.publish ] [es-master-1] timed out waiting for all nodes to process published state [101517](timeout [30s], pending nodes: [[es-data-03][CY6HkLlXSwmqH6BqeTRvcQ][ip-x-x-x-x][inet[/xx.xx.xx.xx:9300]]{rack=rack_tone, max_local_storage_nodes=1, master=false}])
[2015-09-11 06:14:17,137][WARN ][discovery.zen.publish ] [es-master-1] timed out waiting for all nodes to process published state [101518](timeout [30s], pending nodes: [[es-data-03][CY6HkLlXSwmqH6BqeTRvcQ][ip-x-x-x-x][inet[/xx.xx.xx.xx:9300]]{rack=rack_tone, max_local_storage_nodes=1, master=false}]) (~46s)
[2015-09-11 06:14:50,083][WARN ][discovery.zen.publish ] [es-master-1] timed out waiting for all nodes to process published state [101519](timeout [30s], pending nodes: [[es-data-03][CY6HkLlXSwmqH6BqeTRvcQ][ip-x-x-x-x][inet[/xx.xx.xx.xx:9300]]{rack=rack_tone, max_local_storage_nodes=1, master=false}]) (~33s)
[2015-09-11 06:15:20,093][WARN ][discovery.zen.publish ] [es-master-1] timed out waiting for all nodes to process published state [101520](timeout [30s], pending nodes: [[es-data-03][CY6HkLlXSwmqH6BqeTRvcQ][ip-x-x-x-x][inet[/xx.xx.xx.xx:9300]]{rack=rack_tone, max_local_storage_nodes=1, master=false}]) (~30s)
[2015-09-11 06:19:32,729][WARN ][discovery.zen.publish ] [es-master-1] timed out waiting for all nodes to process published state [101521](timeout [30s], pending nodes: [[es-data-03][CY6HkLlXSwmqH6BqeTRvcQ][ip-x-x-x-x][inet[/xx.xx.xx.xx:9300]]{rack=rack_tone, max_local_storage_nodes=1, master=false}]) (~4m 12s)
[2015-09-11 06:20:02,734][WARN ][discovery.zen.publish ] [es-master-1] timed out waiting for all nodes to process published state [101522](timeout [30s], pending nodes: [[es-data-03][CY6HkLlXSwmqH6BqeTRvcQ][ip-x-x-x-x][inet[/xx.xx.xx.xx:9300]]{rack=rack_tone, max_local_storage_nodes=1, master=false}]) (~30s)
[2015-09-11 06:21:17,182][WARN ][discovery.zen.publish ] [es-master-1] timed out waiting for all nodes to process published state [101523](timeout [30s], pending nodes: [[es-data-03][CY6HkLlXSwmqH6BqeTRvcQ][ip-x-x-x-x][inet[/xx.xx.xx.xx:9300]]{rack=rack_tone, max_local_storage_nodes=1, master=false}]) (~1m 15s)
[2015-09-11 06:21:47,192][WARN ][discovery.zen.publish ] [es-master-1] timed out waiting for all nodes to process published state [101524](timeout [30s], pending nodes: [[es-data-03][CY6HkLlXSwmqH6BqeTRvcQ][ip-x-x-x-x][inet[/xx.xx.xx.xx:9300]]{rack=rack_tone, max_local_storage_nodes=1, master=false}]) (~30s)
[2015-09-11 06:22:17,203][WARN ][discovery.zen.publish ] [es-master-1] timed out waiting for all nodes to process published state [101525](timeout [30s], pending nodes: [[es-data-03][CY6HkLlXSwmqH6BqeTRvcQ][ip-x-x-x-x][inet[/xx.xx.xx.xx:9300]]{rack=rack_tone, max_local_storage_nodes=1, master=false}]) (~30s)
[2015-09-11 06:22:47,213][WARN ][discovery.zen.publish ] [es-master-1] timed out waiting for all nodes to process published state [101526](timeout [30s], pending nodes: [[es-data-03][CY6HkLlXSwmqH6BqeTRvcQ][ip-x-x-x-x][inet[/xx.xx.xx.xx:9300]]{rack=rack_tone, max_local_storage_nodes=1, master=false}]) (~30s)
[2015-09-11 06:28:32,613][INFO ][cluster.service ] [es-master-1] removed {[es-data-03][CY6HkLlXSwmqH6BqeTRvcQ][ip-x-x-x-x][inet[/xx.xx.xx.xx:9300]]{rack=rack_tone, max_local_storage_nodes=1, master=false},}, reason: zen-disco-node_failed([es-data-03][CY6HkLlXSwmqH6BqeTRvcQ][ip-x-x-x-x][inet[/xx.xx.xx.xx:9300]]{rack=rack_tone, max_local_storage_nodes=1, master=false}), reason transport disconnected
[2015-09-11 06:29:38,181][INFO ][cluster.service ] [es-master-1] added {[es-data-03][CY6HkLlXSwmqH6BqeTRvcQ][ip-x-x-x-x][inet[/xx.xx.xx.xx:9300]]{rack=rack_tone, max_local_storage_nodes=1, master=false},}, reason: zen-disco-receive(join from node[[es-data-03][CY6HkLlXSwmqH6BqeTRvcQ][ip-x-x-x-x][inet[/xx.xx.xx.xx:9300]]{rack=rack_tone, max_local_storage_nodes=1, master=false}])
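
The trailing annotations in the master log, such as (~46s), appear to be the gaps between consecutive publish-timeout warnings. As a rough sketch, those gaps could be recomputed from the timestamps like this. The function name `publish_gaps` and the shortened sample lines are illustrative assumptions; only the timestamps and the `discovery.zen.publish` logger name come from the log above.

```python
from datetime import datetime

# Timestamp layout used at the start of each log line, e.g. 2015-09-11 06:13:31,348
STAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def publish_gaps(lines):
    """Return the seconds between consecutive discovery.zen.publish entries."""
    stamps = []
    for line in lines:
        # Skip anything that is not a publish-timeout entry.
        if "discovery.zen.publish" not in line or not line.startswith("["):
            continue
        raw = line[1:line.index("]")]  # text inside the first [...] pair
        stamps.append(datetime.strptime(raw, STAMP_FORMAT))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


# Sample lines shortened for illustration; timestamps are taken from the log above.
sample = [
    "[2015-09-11 06:13:31,348][WARN ][discovery.zen.publish ] [es-master-1] timed out ...",
    "[2015-09-11 06:14:17,137][WARN ][discovery.zen.publish ] [es-master-1] timed out ...",
    "[2015-09-11 06:14:50,083][WARN ][discovery.zen.publish ] [es-master-1] timed out ...",
    "[2015-09-11 06:28:32,613][INFO ][cluster.service ] [es-master-1] removed ...",
]

for gap in publish_gaps(sample):
    print(f"{gap:.1f}s")
```

Gaps close to the 30s publish timeout suggest the master was timing out on es-data-03 for every consecutive cluster state update, not just occasionally.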

Data Log

[2015-09-11 06:28:30,715][WARN ][discovery.ec2 ] [es-data-03] master left (reason = do not exists on master, act as master failure), current nodes: {[es-xxxxx-xxx-xxx-xxx-xxxxx-3-3][_oe3CD_LQZuouoTmPk-tCg][xxxxxxx][inet[/xx.xx.xx.xx:9306]]{client=true, data=false},[es-xxxxx-xxx-xxx-xxx-xxxxx-2-2][ENE4uw6zQUedCo-lqrmSkw][xxxxxxxxxx][inet[/xx.xx.xx.xx:9304]]{client=true, data=false},[es-xxxxx-xxx-xxx-dcl-xxxxx-2-2][wITo5HsPQdWFEnapurGlyQ][xxxxxxxxxx][inet[/xx.xx.xx.xx:9307]]{client=true, data=false},[es-xxxxx-xxx-xxx-xxxx-xxxxx-2-2][g5bwjPizSia

...

[2015-09-11 06:28:30,716][INFO ][cluster.service ] [es-data-03] removed {[es-master-1][qtVZhNb2T2uuyuEHgIPBmQ][ip-xx-xx-xx-xx][inet[/xx.xx.xx.xx:9300]]{data=false, rack=rack_tone, max_local_storage_nodes=1, master=true},}, reason: zen-disco-master_failed ([es-master-1][qtVZhNb2T2uuyuEHgIPBmQ][ip-xx-xx-xx-xx][inet[/xx.xx.xx.xx:9300]]{data=false, rack=rack_tone, max_local_storage_nodes=1, master=true})
[2015-09-11 06:29:31,881][INFO ][cluster.service ] [es-data-03] detected_master [es-master-1][qtVZhNb2T2uuyuEHgIPBmQ][ip-xx-xx-xx-xx][inet[/xx.xx.xx.xx:9300]]{data=false, rack=rack_tone, max_local_storage_nodes=1, master=true}, added {[es-master-1][qtVZhNb2T2uuyuEHgIPBmQ][ip-xx-xx-xx-xx][inet[/xx.xx.xx.xx:9300]]{data=false, rack=rack_tone, max_local_storage_nodes=1, master=true},}, reason: zen-disco-receive(from master [[es-master-1][qtVZhNb2T2uuyuEHgIPBmQ][ip-xx-xx-xx-xx][inet[/xx.xx.xx.xx:9300]]{data=false, rack=rack_tone, max_local_storage_nodes=1, master=true}])

@markwalkom
Contributor

Please join us in #elasticsearch on Freenode or at https://discuss.elastic.co/ for troubleshooting help or general questions.

We reserve GitHub for confirmed bugs and feature requests :)

@hmorgado

hmorgado commented Dec 9, 2015

Same problem here...
Try setting the "discovery.zen.minimum_master_nodes" setting.
From the docs:
"The minimum_master_nodes setting is extremely important to the stability of your cluster. This setting helps prevent split brains, the existence of two masters in a single cluster.
This setting should always be configured to a quorum (majority) of your master-eligible nodes. A quorum is (number of master-eligible nodes / 2) + 1"

https://www.elastic.co/guide/en/elasticsearch/guide/current/_important_configuration_changes.html#_minimum_master_nodes
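
To make the quoted quorum formula concrete, here is a minimal sketch of the arithmetic, assuming the division is integer division. The helper name `minimum_master_nodes` is hypothetical and only mirrors the setting name; it is not part of any Elasticsearch API.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: (n / 2) + 1, using integer division."""
    if master_eligible < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible // 2 + 1


# Quorum for a few cluster sizes; 2 is the master count reported in this issue.
for count in (1, 2, 3, 5):
    print(count, "master-eligible ->", minimum_master_nodes(count))
```

For the cluster in the original report (2 master nodes) the formula gives 2, which means both masters must be reachable for an election to succeed. Three master-eligible nodes also give a quorum of 2, but can tolerate losing one of them.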

@markwalkom
Contributor

@hmorgado please use the forums
