Hard limit on total number of shards in a cluster #20705
@alexbrasetvik I think this was done yesterday on #20682 ;)
@javanna That's different: that's per index, but the request here is per cluster.
Just synced up with @s1monw, who asked me to create this issue while we talked about the per-index limit. This one is indeed per cluster, as a total number of shards, whether it's a few indices with a lot of shards or many single-shard indices.
Sounds good, thanks for clarifying.
max_shards_per_node: this setting would be checked on user actions like create index, restore snapshot, and open index. If the total number of shards in the cluster would exceed max_shards_per_node times the number of nodes, the action would be rejected. We would default to a high number during 5.x (e.g. 1000), giving sysadmins the ability to set it to whatever makes sense for their cluster, and we can look at lowering this value for 6.0.
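A minimal sketch of the check described above, assuming the cluster-wide ceiling is max_shards_per_node multiplied by the node count. The function name and error message are hypothetical, not Elasticsearch source (the real check is Java code inside the create-index, restore, and open-index paths):

```python
# Hypothetical sketch of the proposed check (not Elasticsearch source code).
# The cluster-wide ceiling is max_shards_per_node multiplied by the node count.
def validate_shard_count(current_shards: int, shards_to_add: int,
                         node_count: int, max_shards_per_node: int = 1000) -> None:
    """Reject operations (create index, restore snapshot, open index)
    that would push the cluster past its total-shard ceiling."""
    limit = max_shards_per_node * node_count
    if current_shards + shards_to_add > limit:
        raise ValueError(
            f"adding [{shards_to_add}] shards would exceed the cluster limit "
            f"of [{limit}] ([{current_shards}] shards currently open)"
        )

# Example: opening a 120-shard index on a 3-node cluster that already has 2950 shards.
try:
    validate_shard_count(current_shards=2950, shards_to_add=120, node_count=3)
except ValueError as err:
    print(err)  # 2950 + 120 > 3000, so the operation is rejected
```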
I would say that ~500/600 shards per node is a good limit.
@s1monw Raising this one to you.
Should the limit of shards per node not be linked to the amount of heap space a node has, e.g. a limit of 20 shards per GB of heap the node has allocated?
@cdahlqvist I like that idea!
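As a rough illustration of the heap-based sizing suggested here (the 20-shards-per-GB ratio is the proposal in this thread, not an official Elasticsearch formula, and the function below is purely hypothetical):

```python
# Hypothetical sketch of the heap-based sizing idea from this thread.
# 20 shards per GB of heap is the ratio suggested above, not an official formula.
def heap_based_shard_limit(heap_gb_per_node: float, node_count: int,
                           shards_per_gb: int = 20) -> int:
    """Cluster-wide shard ceiling derived from per-node heap size."""
    return int(heap_gb_per_node * shards_per_gb * node_count)

# A 3-node cluster with 2 GB of heap per node vs. one with 64 GB per node:
print(heap_based_shard_limit(2, 3))   # 120
print(heap_based_shard_limit(64, 3))  # 3840
```

The smaller cluster ends up with a much lower ceiling, which is the same concern raised further down for 2GB vs. 64GB Elastic Cloud nodes.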
Could someone explain the motivation for the shard limit per node? Is it related to the node type, i.e. the amount of memory it has? Disk space? Anything else?
@ron-totango that's a question that's better suited to the support forums over at https://discuss.elastic.co - 40k shards sounds like too many, and the forums should be able to help you reduce it to something more reasonable.
Thanks @DaveCTurner. Already tried to ask at https://discuss.elastic.co/t/configuring-a-cluster-for-a-large-number-of-indexes/115731 but didn't get any meaningful reply :-(
When we are talking about the limit of shards per node (averaged across the cluster), are we counting only primary shards, or do replica shards count as well?
Pinging @elastic/es-distributed
This is currently labelled
I think, given the above comment, that this'd be better labelled
Pinging @elastic/es-core-infra
We discussed this during the core/infra sync and agreed that a limit is good, and that enforcing it at the validation layer (rather than at the allocation decider level) is a good idea. We agreed on Clint's proposal of making the limit a factor of the number of nodes. Marking this as adoptme and removing the discussion label now.
@dakrone What is the reason this will be based on the number of nodes rather than the available heap size? I would expect a 3-node 2GB Elastic Cloud cluster to need a much lower limit than a 3-node 64GB Elastic Cloud cluster.
@cdahlqvist That's a concern about what the default per node should be, not about whether the limit should be based on the number of nodes. We will likely start simple with a blanket per-node default and can consider over time making the default scale with the heap size.
Will this include the number of replicas?
Adds a safety limit on the number of shards in a cluster, based on the number of nodes in the cluster. The limit is checked on operations that add (or activate) shards, such as index creation, snapshot restoration, and opening closed indices, and can be changed via the cluster settings API. Closes elastic#20705
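Since the PR above exposes the limit through the cluster settings API, here is a minimal sketch of raising it, assuming an Elasticsearch 7.x+ cluster where the setting ended up named cluster.max_shards_per_node, reachable without authentication at localhost:9200 (adjust URL and credentials for a real cluster):

```python
# Sketch: raising the cluster-wide shard limit through the cluster settings API.
# Assumes a 7.x+ cluster (setting name: cluster.max_shards_per_node) at localhost:9200.
import requests

resp = requests.put(
    "http://localhost:9200/_cluster/settings",
    json={"persistent": {"cluster.max_shards_per_node": 1500}},
)
resp.raise_for_status()
print(resp.json())  # e.g. {"acknowledged": true, "persistent": {...}, "transient": {}}
```

As documented for the released setting, the count covers open primary and replica shards alike, which answers the replica question raised above.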
A high overall shard count in a cluster also adds load to master-node operations. Are there plans for a cluster-wide limit irrespective of the number of nodes, or a limit on the number of nodes in a cluster?
Should the master log a warning or prevent index creation and mapping additions if the heap on the master is too low to support the cluster limit? IMO the limit should also factor in the heap on the data nodes, as pointed out by @cdahlqvist.
We're seeing hundreds of cases of too many shards causing problems, versus very few caused by having too few.
It would be great to have a default hard limit, even though it can be increased through the cluster settings API. Hopefully it will raise awareness of the issue in an "I can bump this now, but I need to fix it" way.