Provide option to allow writes when master is down #60605

ywelsch · 2020-08-03T15:50:21Z

Elasticsearch currently blocks writes by default when no master is available. The cluster.no_master_block setting lets a user change this behavior to also block reads while the master is unavailable. This PR adds a mode that continues to allow writes while the master is offline. Writes keep working as long as they need neither routing table changes nor dynamic mapping updates, since both require the master for consistency.

Eventually we should switch the default of cluster.no_master_block to this new mode.
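To make the three modes concrete, here is a minimal toy model in Java. It is only a sketch: the class and enum names are hypothetical and are not Elasticsearch's own types, and the mode names simply mirror the behaviors described above (block reads and writes, block writes, block only metadata writes such as routing table or mapping changes).

```java
import java.util.EnumSet;
import java.util.Set;

// Toy model only: these names are hypothetical, not Elasticsearch classes.
class NoMasterBlockSketch {
    // Kinds of operation a node may be asked to perform while no master is known.
    enum Op { READ, WRITE, METADATA_WRITE }

    // Each mode lists the operations it rejects while the master is unavailable.
    enum Mode {
        ALL(EnumSet.of(Op.READ, Op.WRITE, Op.METADATA_WRITE)),
        WRITE(EnumSet.of(Op.WRITE, Op.METADATA_WRITE)),
        METADATA_WRITE(EnumSet.of(Op.METADATA_WRITE));

        private final Set<Op> blocked;

        Mode(Set<Op> blocked) {
            this.blocked = blocked;
        }

        boolean blocks(Op op) {
            return blocked.contains(op);
        }
    }

    public static void main(String[] args) {
        // Print the full mode/operation matrix.
        for (Mode mode : Mode.values()) {
            for (Op op : Op.values()) {
                System.out.println(mode + " blocks " + op + ": " + mode.blocks(op));
            }
        }
    }
}
```

In this model, only the last mode lets a plain document write through, and it still rejects any write that would need a metadata change.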

elasticmachine · 2020-08-03T15:50:23Z

Pinging @elastic/es-distributed (:Distributed/Cluster Coordination)


henningandersen

When indexing with the block in place, we would previously time out the entire shard or bulk request after the provided timeout (defaulting to 1 minute).

With the new metadata_write block, the write will go through (which is fine), but in case of a shard failure, it will block the request indefinitely instead.

I think this has two potential bad effects:

1. We could build up lots of shard-failed requests waiting for this.
2. When a master comes back, we could have a burst of those sent to the master.

I guess the byte-based limiting also bounds 1, and the shard-failed deduplication solves 2, so this is likely not an issue, but I thought I would mention it anyway in case it makes others worried.
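As an illustration of why deduplication would absorb such a burst, here is a minimal sketch of the general idea. It is not the Elasticsearch implementation: the class name, the key type, and the counter standing in for "one message sent to the master" are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of request deduplication; not Elasticsearch code.
class RequestDeduplicatorSketch<K> {
    // One shared future per distinct in-flight request key.
    private final Map<K, CompletableFuture<Void>> inFlight = new ConcurrentHashMap<>();
    // Counts how many requests would actually be sent to the master.
    private final AtomicInteger sent = new AtomicInteger();

    /** Returns the shared future for the key; only the first caller triggers a send. */
    CompletableFuture<Void> submit(K key) {
        return inFlight.computeIfAbsent(key, k -> {
            sent.incrementAndGet(); // stands in for a single message to the master
            return new CompletableFuture<>();
        });
    }

    /** Called once the master has responded: releases every waiter for the key. */
    void complete(K key) {
        CompletableFuture<Void> future = inFlight.remove(key);
        if (future != null) {
            future.complete(null);
        }
    }

    int sentCount() {
        return sent.get();
    }
}
```

With this shape, many callers reporting the same failed shard while the master is away share one pending entry, so the master would see one request per distinct key when it returns.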

Otherwise looking good to me.

server/src/internalClusterTest/java/org/elasticsearch/cluster/NoMasterNodeIT.java


ywelsch · 2020-08-12T07:15:40Z

As you pointed out, the previous behavior was to unconditionally time out these write requests in the reroute phase after a minute. The new behavior lets requests proceed through the reroute phase but keeps them in a "stuck" state until a master is back. Since a lot of requests can already pile up on a node within a minute (more than the node has memory for), I think this should not introduce new, previously unseen behavior. The byte-based memory limit for indexing helps not only with this new block but also with the old blocks. With the write block active (i.e. the current default), many requests can pile up with no bound at all (each one is turned into a ClusterStateObserver, waiting up to a minute for cluster state updates).
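For readers unfamiliar with the idea of a byte-based limit, a minimal sketch of such a limiter follows. It only illustrates the technique of bounding in-flight request bytes; the class name, method names, and rejection policy are assumptions and do not describe how Elasticsearch implements its indexing memory limit.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a byte-based in-flight limit; not Elasticsearch code.
class IndexingBytesLimiterSketch {
    private final long limitBytes;
    private final AtomicLong inFlightBytes = new AtomicLong();

    IndexingBytesLimiterSketch(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Reserves bytes for a request; returns false if the limit would be exceeded. */
    boolean tryAcquire(long bytes) {
        long after = inFlightBytes.addAndGet(bytes);
        if (after > limitBytes) {
            inFlightBytes.addAndGet(-bytes); // roll back the reservation and reject
            return false;
        }
        return true;
    }

    /** Returns the bytes of a finished (or failed) request to the budget. */
    void release(long bytes) {
        inFlightBytes.addAndGet(-bytes);
    }

    long inFlightBytes() {
        return inFlightBytes.get();
    }
}
```

Under this kind of scheme, requests that are stuck waiting for a master keep their bytes reserved, so new requests are rejected once the budget is used up instead of accumulating without bound.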

henningandersen

LGTM.



Provide option to allow writes when master is down

060c379

ywelsch added the >enhancement, :Distributed Coordination/Cluster Coordination, v8.0.0, and v7.10.0 labels Aug 3, 2020

elasticmachine added the Team:Distributed label Aug 3, 2020

ywelsch added 2 commits August 3, 2020 18:11

quality

75a3bb0

search throws all kinds of exceptions

f88f832

ywelsch requested a review from DaveCTurner August 6, 2020 08:46

ywelsch added 4 commits August 10, 2020 13:12

Randomization

1caccd6

Merge remote-tracking branch 'elastic/master' into no_master_metadata_writes

ed66cf8

random != random

0703a26

Merge remote-tracking branch 'elastic/master' into no_master_metadata_writes

97b9b68

ywelsch requested a review from henningandersen August 11, 2020 08:09

henningandersen reviewed Aug 11, 2020


server/src/internalClusterTest/java/org/elasticsearch/cluster/NoMasterNodeIT.java

ywelsch added 2 commits August 12, 2020 09:14

extend test with dynamic index creation and dynamic mappings

d9d2e10

Merge remote-tracking branch 'elastic/master' into no_master_metadata_writes

51f5b53

ywelsch requested a review from henningandersen August 12, 2020 07:15

henningandersen approved these changes Aug 12, 2020


ywelsch merged commit 0b517dd into elastic:master Aug 12, 2020

ywelsch added a commit that referenced this pull request Aug 13, 2020

Fix testNoMasterActionsMetadataWriteMasterBlock (#60605)

504678a

We can't assert on the specific exception, unfortunately.

ywelsch added a commit that referenced this pull request Aug 13, 2020

Fix testNoMasterActionsMetadataWriteMasterBlock (#60605)

8e77539

We can't assert on the specific exception, unfortunately.

Mpdreamz mentioned this pull request Nov 16, 2020

7.10.1 Meta Ticket elastic/elasticsearch-net#5096

Closed

61 tasks

stevejgordon mentioned this pull request Dec 17, 2020

7.11.0 Meta Ticket elastic/elasticsearch-net#5198

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
