Faster access to started shards count of the index in each node #53559

kkewwei · 2020-03-13T19:08:52Z

ES_VERSION: 7.6.0
JVM version : JDK1.8.0_112
OS version : linux
Description of the problem including expected versus actual behavior:
As is well known, updating the ClusterState on the master can take a long time, which is bad for the cluster. During the update, when cluster.routing.allocation.total_shards_per_node > 0, ShardsLimitAllocationDecider iterates through all the shards on a node to find the STARTED ones belonging to the index, and this is expensive.

In our production cluster there are 39 nodes, 2,000 indices and 50,000 shards, and a cluster state update takes up to 3.4 minutes, which is intolerable.

To find out why updating the cluster state takes so long, I captured the thread stack of the updateTask thread:

"[node-1][clusterService#updateTask][T#1]" #21 daemon prio=5 os_prio=0 tid=0x00007fc5c88fa800 nid=0x3369 runnable [0x00007fc58431a000]
   java.lang.Thread.State: RUNNABLE
        at java.util.Collections$UnmodifiableCollection$1.hasNext(Collections.java:1041)
        at org.elasticsearch.cluster.routing.allocation.decider.ShardsLimitAllocationDecider.doDecide(ShardsLimitAllocationDecider.java:112)
        at org.elasticsearch.cluster.routing.allocation.decider.ShardsLimitAllocationDecider.canAllocate(ShardsLimitAllocationDecider.java:88)
        at org.elasticsearch.cluster.routing.allocation.decider.AllocationDeciders.canAllocate(AllocationDeciders.java:73)
        at org.elasticsearch.cluster.routing.allocation.allocator.BalancedShardsAllocator$Balancer.decideMove(BalancedShardsAllocator.java:707)
        at org.elasticsearch.cluster.routing.allocation.allocator.BalancedShardsAllocator$Balancer.moveShards(BalancedShardsAllocator.java:648)
        at org.elasticsearch.cluster.routing.allocation.allocator.BalancedShardsAllocator.allocate(BalancedShardsAllocator.java:123)
        at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:329)
        at org.elasticsearch.cluster.routing.allocation.AllocationService.applyStartedShards(AllocationService.java:100)
        at org.elasticsearch.cluster.action.shard.ShardStateAction$ShardStartedClusterStateTaskExecutor.execute(ShardStateAction.java:438)
        at org.elasticsearch.cluster.service.ClusterService.executeTasks(ClusterService.java:634)
        at org.elasticsearch.cluster.service.ClusterService.calculateTaskOutputs(ClusterService.java:612)

I tried several times and got the same thread stack each time, so it seems that ShardsLimitAllocationDecider.doDecide is where the time goes. The related code:

        if (indexShardLimit <= 0 && clusterShardLimit <= 0) {
            return allocation.decision(Decision.YES, NAME, "total shard limits are disabled: [index: %d, cluster: %d] <= 0",
                    indexShardLimit, clusterShardLimit);
        }
        int indexShardCount = 0;
        int nodeShardCount = 0;
        for (ShardRouting nodeShard : node) {
            // don't count relocating shards...
            if (nodeShard.relocating()) {
                continue;
            }
            nodeShardCount++;
            if (nodeShard.index().equals(shardRouting.index())) {
                indexShardCount++;
            }
        }

With about 50,000 / 39 ≈ 1,282 shards per node, checking all 50,000 shards means roughly 50,000 * 50,000 / 39 ≈ 64,000,000 loop iterations, which takes too long.
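The estimate above can be reproduced with a few lines of Java. This is only a back-of-the-envelope sketch: `IterationEstimate` and `estimate` are hypothetical names, and it assumes shards are spread evenly across nodes and that each shard is checked against a single node. In practice the balancer may check a shard against several nodes, so the real count can be higher.

```java
/** Hypothetical helper, not part of Elasticsearch: rough cost of the per-node loop. */
final class IterationEstimate {

    /**
     * Estimates loop iterations if every shard in the cluster triggers one
     * full pass over the shards of a single node (even distribution assumed).
     */
    static long estimate(long totalShards, long nodes) {
        long shardsPerNode = totalShards / nodes; // integer division, e.g. 50000 / 39 = 1282
        return totalShards * shardsPerNode;
    }

    public static void main(String[] args) {
        // Figures from the cluster described in this issue.
        System.out.println(estimate(50_000, 39));
    }
}
```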

There is room to avoid iterating over the node:
1. If indexShardLimit = -1 and clusterShardLimit > 0, we don't need to iterate to compute indexShardCount and nodeShardCount: nodeShardCount = node.size() - node.numberOfShardsWithState(ShardRoutingState.RELOCATING), and indexShardCount is not needed.
2. Could we keep a count of the started shards of each index on each node in RoutingNode, to avoid the iteration?
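A minimal sketch of what suggestion 2 could look like, assuming the node keeps a per-index counter that is updated whenever a shard is added or removed. `NodeShardCounts` and its methods are hypothetical names used for illustration only. This is not the real RoutingNode API and not necessarily how the eventual fix was implemented.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a node's shard bookkeeping; not the real RoutingNode. */
final class NodeShardCounts {
    // index name -> number of counted (non-relocating) shards of that index on this node
    private final Map<String, Integer> perIndex = new HashMap<>();
    private int total;

    /** Call when a shard of the given index starts being counted on this node. */
    void add(String index) {
        perIndex.merge(index, 1, Integer::sum);
        total++;
    }

    /** Call when a shard of the given index leaves this node or starts relocating. */
    void remove(String index) {
        Integer current = perIndex.get(index);
        if (current == null) {
            throw new IllegalStateException("no counted shards for index [" + index + "]");
        }
        if (current == 1) {
            perIndex.remove(index); // drop empty entries so the map stays small
        } else {
            perIndex.put(index, current - 1);
        }
        total--;
    }

    /** O(1) replacement for counting matching shards in a loop. */
    int countForIndex(String index) {
        return perIndex.getOrDefault(index, 0);
    }

    /** O(1) replacement for counting all non-relocating shards in a loop. */
    int total() {
        return total;
    }
}
```

With counters like these, the decider would read indexShardCount and nodeShardCount in constant time instead of walking every shard on the node. The cost moves to the add and remove paths, which must be kept consistent with every routing change.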


LoadingZhang · 2020-03-14T01:42:02Z

We hit the same problem here.
We have 8,000 indices and 50,000 shards in the cluster, and we set indexShardLimit on every index to balance shards manually. As a result, updating the cluster state takes tens of seconds.

elasticmachine · 2020-03-18T14:20:52Z

Pinging @elastic/es-distributed (:Distributed/Allocation)

nik9000 · 2020-03-18T14:21:30Z

Looks like @jasontedor has already opened a PR for this one so I'm assigning it to him just to make the reporting gods happy.

jasontedor mentioned this issue Mar 14, 2020

Improve performance of shards limits decider #53577

Merged

nik9000 added the :Distributed Coordination/Allocation label Mar 18, 2020

nik9000 assigned jasontedor Mar 18, 2020

jasontedor closed this as completed in #53577 Mar 19, 2020

codebrain mentioned this issue Apr 1, 2020

7.7.0 meta ticket (Part 3) elastic/elasticsearch-net#4534

Closed
