Add cluster-wide shard limit #32856
Conversation
Adds a safety limit on the number of shards in a cluster, based on the number of nodes in the cluster. The limit is checked on operations that add (or activate) shards, such as index creation, snapshot restoration, and opening closed indices, and can be changed via the cluster settings API. Closes elastic#20705
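For context, here is a minimal sketch of the kind of check described above. Every name in it (the class, the method, the message text) is hypothetical; it only illustrates comparing the proposed shard count against the per-node limit multiplied by the number of data nodes, and is not the PR's actual implementation.

```java
// Hypothetical sketch only -- not the code in this PR.
final class ShardLimitCheck {

    /**
     * Returns an error message if adding {@code newShards} open shards would push the
     * cluster past {@code maxShardsPerNode * dataNodeCount}, or {@code null} if the
     * operation stays within the limit.
     */
    static String checkShardLimit(int newShards, int currentOpenShards,
                                  int maxShardsPerNode, int dataNodeCount) {
        final int clusterLimit = maxShardsPerNode * dataNodeCount;
        final int proposedTotal = currentOpenShards + newShards;
        if (proposedTotal > clusterLimit) {
            return "this action would add [" + newShards + "] shards, but the cluster is limited to ["
                + clusterLimit + "] shards and currently has [" + currentOpenShards + "] open";
        }
        return null; // within the limit
    }
}
```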
Based on review feedback. Either can be used to set the per-node shard limit, so let's verify both.
During cluster startup, a cluster may consist only of master, non-data nodes. In this case, we want to allow the user to configure the cluster until the data nodes come online.
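Presumably this means the limit is simply not enforced while the cluster has no data nodes; extending the hypothetical `checkShardLimit` sketch above, the guard could look roughly like this (again an illustration, not the PR's code):

```java
// Hypothetical guard, illustrative only: with zero data nodes the computed limit
// would be zero and every operation would be rejected, so skip the check until
// data nodes have joined the cluster.
static boolean shardLimitApplies(int dataNodeCount) {
    return dataNodeCount > 0;
}
```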
```diff
@@ -127,6 +127,9 @@
     }
 
+    public static final Setting<Integer> SETTING_CLUSTER_MAX_SHARDS_PER_NODE =
+        Setting.intSetting("cluster.shards.max_per_node", 1000, Property.Dynamic, Property.NodeScope);
```
Can we set a minimum for this setting of 1 shard per node? (so that people don't set it to -171 and run into weird behavior)
I wonder if a higher minimum is warranted -- e.g., to ensure that when setting up a new cluster we can still create a .kibana index and so on?
The default here is 1000, so we will be fine out of the box. The question here is the minimum, and there is not a good value, as there are many indices that might be created (.kibana, .security, Watcher, .monitoring, etc.). It is too hard to find the right minimum that ensures the basics of our stack function, and to keep that value properly maintained. If someone really does want to set the value to one shard per node, I think we should permit that.
Added a minimum of 1 - I agree with Jason; trying to figure out any other minimum would be very complicated.
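Presumably this uses the `intSetting` overload that takes a minimum value, so the declaration would look something like the following (a sketch; the exact form in the PR may differ):

```java
public static final Setting<Integer> SETTING_CLUSTER_MAX_SHARDS_PER_NODE =
    Setting.intSetting("cluster.shards.max_per_node", 1000, 1, Property.Dynamic, Property.NodeScope);
```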
Based on review comments in elastic#32856
This looks like a good start. I think the implementation here misses a critical case: updating index settings to increase the number of replicas. For example, I think the following would be permitted with the current implementation:
- set the limit to 1 shard per node
- start two nodes
- create an index `i` and an index `j` with `index.number_of_replicas` set to zero, and the default number of shards - now, creating a third index will be blocked by the max shards per node limit 🎉
- however, a settings update on `i` and `j` to increase `index.number_of_replicas` to one would be permitted, yet this would put the cluster over the limit 😢
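To make the arithmetic in this scenario concrete, here is a small worked example. It assumes one primary shard per index, which is an assumption for illustration; the comment above only says "the default number of shards".

```java
// Worked example of the replica-update scenario, with assumed values.
public class ReplicaLimitExample {
    public static void main(String[] args) {
        final int maxShardsPerNode = 1;
        final int dataNodes = 2;
        final int clusterLimit = maxShardsPerNode * dataNodes;   // 2

        // indices i and j: 1 primary each (assumed), 0 replicas
        final int withZeroReplicas = 2 * 1 * (0 + 1);            // 2 -> exactly at the limit
        // after setting index.number_of_replicas to 1 on both indices
        final int withOneReplica = 2 * 1 * (1 + 1);              // 4 -> over the limit

        System.out.println(withZeroReplicas <= clusterLimit);    // true
        System.out.println(withOneReplica <= clusterLimit);      // false: the update must be rejected
    }
}
```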
Per discussion on elastic#32856, the cluster-wide shard limit is now enforced when changing the number of replicas used by an index.
Jason makes an excellent point - I simply forgot about that case. I've added code to handle changing the replica settings, as well as several test cases. Additionally, following the rule of three, I've factored the common logic out into a shared method.
It appears that `ActionRequestValidationException` tends to be used for more client-related purposes, and `ValidationException` is more appropriate here.
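As a rough illustration of the distinction: `ActionRequestValidationException` is normally what a request object returns from its own `validate()` method, whereas a plain `ValidationException` can be thrown from a cluster-level check like this one. The method and message below are made up for illustration.

```java
// Illustrative only; not the PR's actual method or wording.
static void rejectIfOverLimit(int proposedShards, int clusterLimit) {
    if (proposedShards > clusterLimit) {
        final ValidationException e = new ValidationException();
        e.addValidationError("this action would bring the cluster to [" + proposedShards
            + "] shards, which exceeds the limit of [" + clusterLimit + "]");
        throw e;
    }
}
```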
@elasticmachine retest this please
This might already be planned, but I think we might want to add some kind of deprecation warning to 6.x to explain to the user that this breaking change is coming in 7.0 if they are over the limit.
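A sketch of the kind of 6.x warning being discussed is below. The logger setup, class name, and message wording are all assumptions for illustration, and the deprecation-logging API differs between versions, so treat this as a shape rather than a drop-in snippet.

```java
// Illustrative only: warn (rather than fail) when a cluster already exceeds the
// default limit that 7.0 is expected to enforce.
private static final DeprecationLogger deprecationLogger =
    new DeprecationLogger(LogManager.getLogger(MetaDataCreateIndexService.class));

void warnIfOverDefaultLimit(int openShards, int defaultLimit) {
    if (openShards > defaultLimit) {
        deprecationLogger.deprecated("this cluster has [{}] open shards, which exceeds the default "
            + "limit of [{}] shards that will be enforced in 7.0", openShards, defaultLimit);
    }
}
```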
The subclasses have been removed, this cleans up a couple remaining instances of the subclasses in the tests.
Per discussion with @jasontedor, I'm closing this PR, pending a new PR which implements deprecation warnings for clusters with shard counts above the default limit. This will be followed shortly by opt-in enforcement (to be backported to 6.x) and enforcement by default in 7.0+.