Add cluster-wide shard limit #32856
Conversation
Adds a safety limit on the number of shards in a cluster, based on the number of nodes in the cluster. The limit is checked on operations that add (or activate) shards, such as index creation, snapshot restoration, and opening closed indices, and can be changed via the cluster settings API. Closes elastic#20705
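For context, here is a minimal sketch of the kind of check described above. Every name in it (the class, the method, the message text) is hypothetical; it only illustrates comparing the proposed shard count against the per-node limit multiplied by the number of data nodes, and is not the PR's actual implementation.

```java
// Hypothetical sketch only -- not the code in this PR.
final class ShardLimitCheck {

    /**
     * Returns an error message if adding {@code newShards} open shards would push the
     * cluster past {@code maxShardsPerNode * dataNodeCount}, or {@code null} if the
     * operation stays within the limit.
     */
    static String checkShardLimit(int newShards, int currentOpenShards,
                                  int maxShardsPerNode, int dataNodeCount) {
        final int clusterLimit = maxShardsPerNode * dataNodeCount;
        final int proposedTotal = currentOpenShards + newShards;
        if (proposedTotal > clusterLimit) {
            return "this action would add [" + newShards + "] shards, but the cluster is limited to ["
                + clusterLimit + "] shards and currently has [" + currentOpenShards + "] open";
        }
        return null; // within the limit
    }
}
```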
Based on review feedback. Either can be used to set the per-node shard limit, so let's verify both.
During cluster startup, a cluster may consist only of master, non-data nodes. In this case, we want to allow the user to configure the cluster until the data nodes come online.
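Presumably this means the limit is simply not enforced while the cluster has no data nodes; extending the hypothetical `checkShardLimit` sketch above, the guard could look roughly like this (again an illustration, not the PR's code):

```java
// Hypothetical guard, illustrative only: with zero data nodes the computed limit
// would be zero and every operation would be rejected, so skip the check until
// data nodes have joined the cluster.
static boolean shardLimitApplies(int dataNodeCount) {
    return dataNodeCount > 0;
}
```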
```diff
@@ -127,6 +127,9 @@
     }
 
+    public static final Setting<Integer> SETTING_CLUSTER_MAX_SHARDS_PER_NODE =
+        Setting.intSetting("cluster.shards.max_per_node", 1000, Property.Dynamic, Property.NodeScope);
```
Can we set a minimum for this setting of 1 shard per node? (so that people don't set it to -171 and run into weird behavior)
I wonder if a higher minimum is warranted -- e.g., to ensure that when setting up a new cluster we can still create a .kibana index and so on?
The default here is 1000, so we will be fine out of the box. The question here is the minimum, and there is not a good value, as there are many indices that might be created (.kibana, .security, Watcher, .monitoring, etc.). It is too hard to find the right minimum that ensures the basics of our stack function, and to keep that value properly maintained. If someone really does want to set the value to one shard per node, I think we should permit that.
Added a minimum of 1 - I agree with Jason; trying to figure out any other minimum would be very complicated.
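Presumably this uses the `intSetting` overload that takes a minimum value, so the declaration would look something like the following (a sketch; the exact form in the PR may differ):

```java
public static final Setting<Integer> SETTING_CLUSTER_MAX_SHARDS_PER_NODE =
    Setting.intSetting("cluster.shards.max_per_node", 1000, 1, Property.Dynamic, Property.NodeScope);
```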
Based on review comments in elastic#32856
This looks like a good start. I think the implementation here misses a critical case: updating index settings to increase the number of replicas. For example, I think the following would be permitted with the current implementation:
- set the limit to 1 shard per node
- start two nodes
- create an index `i` and an index `j` with `index.number_of_replicas` set to zero, and the default number of shards - now, creating a third index will be blocked by the max shards per node limit 🎉
- however, a settings update on `i` and `j` to increase `index.number_of_replicas` to one would be permitted, yet this would put the cluster over the limit 😢
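To make the arithmetic in this scenario concrete, here is a small worked example. It assumes one primary shard per index, which is an assumption for illustration; the comment above only says "the default number of shards".

```java
// Worked example of the replica-update scenario, with assumed values.
public class ReplicaLimitExample {
    public static void main(String[] args) {
        final int maxShardsPerNode = 1;
        final int dataNodes = 2;
        final int clusterLimit = maxShardsPerNode * dataNodes;   // 2

        // indices i and j: 1 primary each (assumed), 0 replicas
        final int withZeroReplicas = 2 * 1 * (0 + 1);            // 2 -> exactly at the limit
        // after setting index.number_of_replicas to 1 on both indices
        final int withOneReplica = 2 * 1 * (1 + 1);              // 4 -> over the limit

        System.out.println(withZeroReplicas <= clusterLimit);    // true
        System.out.println(withOneReplica <= clusterLimit);      // false: the update must be rejected
    }
}
```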
Per discussion on elastic#32856, the cluster-wide shard limit is now enforced when changing the number of replicas used by an index.
Jason makes an excellent point - I simply forgot about that case. I've added code to handle changing the replica settings, as well as several test cases. Additionally, following the rule of three, I've factored the common logic out into a shared method.
It appears that `ActionRequestValidationException` tends to be used for more client-related purposes, and `ValidationException` is more appropriate here.
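As a rough illustration of the distinction: `ActionRequestValidationException` is normally what a request object returns from its own `validate()` method, whereas a plain `ValidationException` can be thrown from a cluster-level check like this one. The method and message below are made up for illustration.

```java
// Illustrative only; not the PR's actual method or wording.
static void rejectIfOverLimit(int proposedShards, int clusterLimit) {
    if (proposedShards > clusterLimit) {
        final ValidationException e = new ValidationException();
        e.addValidationError("this action would bring the cluster to [" + proposedShards
            + "] shards, which exceeds the limit of [" + clusterLimit + "]");
        throw e;
    }
}
```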
@elasticmachine retest this please
This might already be planned, but I think we might want to add some kind of deprecation warning to 6.x to explain to the user that this breaking change is coming in 7.0 if they are over the limit.
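A sketch of the kind of 6.x warning being discussed is below. The logger setup, class name, and message wording are all assumptions for illustration, and the deprecation-logging API differs between versions, so treat this as a shape rather than a drop-in snippet.

```java
// Illustrative only: warn (rather than fail) when a cluster already exceeds the
// default limit that 7.0 is expected to enforce.
private static final DeprecationLogger deprecationLogger =
    new DeprecationLogger(LogManager.getLogger(MetaDataCreateIndexService.class));

void warnIfOverDefaultLimit(int openShards, int defaultLimit) {
    if (openShards > defaultLimit) {
        deprecationLogger.deprecated("this cluster has [{}] open shards, which exceeds the default "
            + "limit of [{}] shards that will be enforced in 7.0", openShards, defaultLimit);
    }
}
```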
The subclasses have been removed, this cleans up a couple remaining instances of the subclasses in the tests.
Per discussion with @jasontedor, I'm closing this PR, pending a new PR which implements deprecation warnings for clusters with shard counts above the default limit. This will be followed shortly by opt-in enforcement (to be backported to 6.x) and enforcement by default in 7.0+.