
Hard limit on total number of shards in a cluster #20705

Closed

alexbrasetvik opened this issue Sep 30, 2016 · 23 comments
Labels: :Data Management/Indices APIs, >enhancement, good first issue

Comments

@alexbrasetvik
Contributor

We're seeing hundreds of cases where too many shards cause problems, compared to relatively few problems caused by having too few.

It would be great to have a default hard limit, even if it can be increased (through the cluster settings API). Hopefully that will raise awareness of this issue, in an "I can bump this now, but need to fix it" kind of way.

@javanna
Member

javanna commented Sep 30, 2016

@alexbrasetvik I think this was done yesterday on #20682 ;)

@jasontedor
Member

@javanna That's different: that's per index, but the request here is per cluster.

@alexbrasetvik
Contributor Author

Just synced up with @s1monw, who asked me to create this issue while we talked about the per-index limit. This one is indeed per cluster, as a total number of shards, whether it's a few indices with a lot of shards or many single-shard indices.

@javanna
Member

javanna commented Sep 30, 2016

Sounds good, thanks for clarifying.

@clintongormley
Contributor

max_shards_per_node

This setting would be checked on user actions like create index, restore snapshot, open index. If the total number of shards in the cluster is greater than max_shards_per_node * number_of_nodes, then the user action can be rejected. This implementation allows the max value to be exceeded if, e.g., a node fails, since that lowers the effective cluster-wide maximum.

We would default to a high number during 5.x (e.g. 1000), giving sysadmins the ability to set it to whatever makes sense for their cluster, and we can look at lowering this value for 6.0.
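For illustration, a minimal sketch of the arithmetic behind this proposal in Java; the class and method names are hypothetical and this is not the actual Elasticsearch implementation:

```java
// Hypothetical sketch of the proposed cluster-wide shard limit check.
// Names are illustrative; not taken from the Elasticsearch code base.
final class ClusterShardLimitCheck {

    /**
     * Called from user actions such as create index, restore snapshot, and open index.
     * Rejects the action if it would push the cluster past maxShardsPerNode * numberOfNodes.
     * Because the check only runs on these actions, the limit can still be exceeded
     * afterwards if a node leaves the cluster.
     */
    static void checkShardLimit(long shardsToAdd, long currentShards,
                                int numberOfNodes, int maxShardsPerNode) {
        long clusterLimit = (long) maxShardsPerNode * numberOfNodes;
        if (currentShards + shardsToAdd > clusterLimit) {
            throw new IllegalArgumentException("this action would add [" + shardsToAdd
                + "] shards, exceeding the cluster-wide limit of [" + clusterLimit + "]");
        }
    }
}
```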

@gmoskovicz
Contributor

We would default to a high number during 5.x (e.g. 1000), giving sysadmins the ability to set it to whatever makes sense for their cluster, and we can look at lowering this value for 6.0.

I would say that around 500-600 shards per node is a good limit.

@jasontedor
Member

@s1monw Raising this one to you.

@cdahlqvist

cdahlqvist commented Jun 23, 2017

Should the limit of shards per node not be linked to the amount of heap space a node has, e.g. a limit of 20 shards per GB of heap the node has allocated?
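As a rough sketch of what a heap-proportional limit could look like (the 20-shards-per-GB figure is simply the one suggested above, and the names are hypothetical):

```java
// Hypothetical sketch of a heap-proportional per-node shard limit,
// using the 20-shards-per-GB-of-heap figure suggested above.
final class HeapBasedShardLimit {

    static long shardLimitForNode(long heapBytes, int shardsPerGbOfHeap) {
        long heapGb = Math.max(1, heapBytes / (1024L * 1024L * 1024L));
        return heapGb * shardsPerGbOfHeap;
    }

    public static void main(String[] args) {
        // A node with a 30 GB heap would be allowed 600 shards under this scheme.
        System.out.println(shardLimitForNode(30L * 1024 * 1024 * 1024, 20));
    }
}
```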

@gmoskovicz
Contributor

@cdahlqvist I like that idea!

@ron-totango

Could someone explain the motivation for the shard limit per node? Is it related to the node type, i.e. the amount of memory it has, disk space, or anything else?
We have 40K shards (using per-day indices) and we're hitting issues with large cluster states that we don't know how to resolve...

@DaveCTurner
Contributor

@ron-totango that's a question that's better suited to the support forums over at https://discuss.elastic.co - 40k shards sounds like too many, and the forums should be able to help you reduce it to something more reasonable.

@ron-totango

Thanks @DaveCTurner. I already tried asking at https://discuss.elastic.co/t/configuring-a-cluster-for-a-large-number-of-indexes/115731 but didn't get any meaningful reply :-(

@majormoses
Contributor

When we are talking about the limit of shards per node (averaged across the cluster), are we counting only primary shards, or do replicas count as well?

clintongormley added the :Distributed Indexing/Distributed label and removed the :Cluster label on Feb 13, 2018
bleskes added the good first issue label on Mar 27, 2018
ywelsch added the :Distributed Coordination/Allocation label on Mar 27, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

ywelsch removed the :Distributed Indexing/Distributed label on Mar 27, 2018
@DaveCTurner
Contributor

This is currently labelled :Distributed/Allocation but I think it's not a great idea to solve this in the allocator by refusing to allocate more than a certain number of shards per node. It seems like a better idea to check this on actions that create the shards-to-be-allocated:

This setting would be checked on user actions like create index, restore snapshot, open index.

I think, given the above comment, that this'd be better labelled :Core/Index APIs, so I'm doing so.

DaveCTurner added the :Data Management/Indices APIs label on Apr 30, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

DaveCTurner removed the :Distributed Coordination/Allocation label on Apr 30, 2018
dakrone added the team-discuss label and removed the help wanted and adoptme labels on Jul 11, 2018
@dakrone
Member

dakrone commented Jul 25, 2018

We discussed this during the core/infra sync and agreed that a limit is good, and that doing it at the validation layer (rather than at the allocation decider level) is a good idea. We agreed on Clint's proposal of making the limit a factor of the number of nodes. Marking this as adoptme and removing the discussion label now.

dakrone added the help wanted and adoptme labels and removed the team-discuss label on Jul 25, 2018
@cdahlqvist

@dakrone What is the reason this will be based on the number of nodes rather than the available heap size? I would expect a 3-node 2GB Elastic Cloud cluster to need a much lower limit than a 3-node 64GB Elastic Cloud cluster.

@jasontedor
Member

@cdahlqvist That's a concern about what the default per node should be, not whether or not it should be based on the number of nodes. We will likely start simple with a blanket per node default and can consider over time making the default ergonomic to the heap size.

@cdekker

cdekker commented Jul 27, 2018

Will this include the number of replicas?

gwbrown self-assigned this on Aug 7, 2018
jasontedor removed the help wanted and adoptme labels on Aug 8, 2018
gwbrown added a commit to gwbrown/elasticsearch that referenced this issue Aug 14, 2018
Adds a safety limit on the number of shards in a cluster, based on
the number of nodes in the cluster. The limit is checked on operations
that add (or activate) shards, such as index creation, snapshot
restoration, and opening closed indices, and can be changed via the
cluster settings API.

Closes elastic#20705
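For reference, raising the limit through the cluster settings API (which the commit message mentions) could look roughly like this with the low-level Java REST client. The host, port, and the chosen value are assumptions, and cluster.max_shards_per_node is, to my understanding, the setting name introduced by this change:

```java
// Rough sketch: raising the per-node shard limit via the cluster settings API,
// using the Elasticsearch low-level Java REST client. Host, port, and the new
// value (2000) are assumptions; adjust them for your cluster.
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class RaiseShardLimit {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            Request request = new Request("PUT", "/_cluster/settings");
            request.setJsonEntity("{ \"persistent\": { \"cluster.max_shards_per_node\": 2000 } }");
            Response response = client.performRequest(request);
            System.out.println(response.getStatusLine());
        }
    }
}
```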
@gwbrown
Contributor

gwbrown commented Oct 24, 2018

@cdekker The implementation merged in #34021 counts replicas towards the limit, as replicas consume resources in much the same way as primary shards.
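A quick worked example of how replicas count toward the limit (illustrative only; the class and method names are hypothetical):

```java
// Illustrative only: every shard copy, primary or replica, counts toward the limit.
final class ShardCountingExample {

    static int shardsTowardLimit(int primaries, int replicasPerPrimary) {
        return primaries * (1 + replicasPerPrimary);
    }

    public static void main(String[] args) {
        // An index with 5 primaries and 1 replica contributes 10 shards;
        // with a limit of 1000 shards per node, a 3-node cluster allows 3000 in total.
        System.out.println(shardsTowardLimit(5, 1)); // 10
    }
}
```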

@vigyasharma
Contributor

An overall high shard count in the cluster also loads up master node operations. Are there plans for an overall cluster-wide limit irrespective of the number of nodes, or a limit on the number of nodes in the cluster?

@Bukhtawar
Contributor

Bukhtawar commented Apr 6, 2019

Should the master log a warning or prevent index creation and mapping additions if the heap on the master is too low to support the cluster limit? IMO the limit should also factor in the heap on the data nodes, as pointed out by @cdahlqvist.
