Faster access to started shards count of the index in each node #53559

Closed
kkewwei opened this issue Mar 13, 2020 · 3 comments · Fixed by #53577
Labels
:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes)


@kkewwei
Contributor

kkewwei commented Mar 13, 2020

ES_VERSION: 7.6.0
JVM version : JDK1.8.0_112
OS version : linux
Description of the problem including expected versus actual behavior:
As is well known, updating the ClusterState on the master can take a long time, which is bad for the cluster. During a cluster state update, ShardsLimitAllocationDecider iterates through all the shards on a node to find the STARTED ones belonging to the index when cluster.routing.allocation.total_shards_per_node > 0, and this is very expensive.

In our production cluster there are 39 nodes, 2,000 indices and 50,000 shards, yet a cluster state update takes up to 3.4 minutes, which is intolerable.

To find out why updating the cluster state takes so long, I captured the thread stack of the updateTask thread:

"[node-1][clusterService#updateTask][T#1]" #21 daemon prio=5 os_prio=0 tid=0x00007fc5c88fa800 nid=0x3369 runnable [0x00007fc58431a000]
   java.lang.Thread.State: RUNNABLE
        at java.util.Collections$UnmodifiableCollection$1.hasNext(Collections.java:1041)
        at org.elasticsearch.cluster.routing.allocation.decider.ShardsLimitAllocationDecider.doDecide(ShardsLimitAllocationDecider.java:112)
        at org.elasticsearch.cluster.routing.allocation.decider.ShardsLimitAllocationDecider.canAllocate(ShardsLimitAllocationDecider.java:88)
        at org.elasticsearch.cluster.routing.allocation.decider.AllocationDeciders.canAllocate(AllocationDeciders.java:73)
        at org.elasticsearch.cluster.routing.allocation.allocator.BalancedShardsAllocator$Balancer.decideMove(BalancedShardsAllocator.java:707)
        at org.elasticsearch.cluster.routing.allocation.allocator.BalancedShardsAllocator$Balancer.moveShards(BalancedShardsAllocator.java:648)
        at org.elasticsearch.cluster.routing.allocation.allocator.BalancedShardsAllocator.allocate(BalancedShardsAllocator.java:123)
        at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:329)
        at org.elasticsearch.cluster.routing.allocation.AllocationService.applyStartedShards(AllocationService.java:100)
        at org.elasticsearch.cluster.action.shard.ShardStateAction$ShardStartedClusterStateTaskExecutor.execute(ShardStateAction.java:438)
        at org.elasticsearch.cluster.service.ClusterService.executeTasks(ClusterService.java:634)
        at org.elasticsearch.cluster.service.ClusterService.calculateTaskOutputs(ClusterService.java:612)

I tried several times and got the same thread stack each time, so it seems that most of the time is spent in ShardsLimitAllocationDecider.doDecide. The related code:

        if (indexShardLimit <= 0 && clusterShardLimit <= 0) {
            return allocation.decision(Decision.YES, NAME, "total shard limits are disabled: [index: %d, cluster: %d] <= 0",
                    indexShardLimit, clusterShardLimit);
        }
        int indexShardCount = 0;
        int nodeShardCount = 0;
        for (ShardRouting nodeShard : node) {
            // don't count relocating shards...
            if (nodeShard.relocating()) {
                continue;
            }
            nodeShardCount++;
            if (nodeShard.index().equals(shardRouting.index())) {
                indexShardCount++;
            }
        }

With 50,000 shards and about 50,000/39 ≈ 1,282 shards per node, this loop runs roughly 50,000 × 50,000 / 39 ≈ 64,000,000 times, which is far too expensive.
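
The estimate above can be reproduced with a few lines of Java. This is only a back-of-the-envelope sketch: the class and method names are made up for illustration, and it assumes shards are spread evenly across nodes and that each shard is checked against exactly one node.

```java
// Hypothetical helper, not part of Elasticsearch.
final class IterationEstimate {

    // Assumes an even spread of shards and one decider call per shard.
    static long estimate(long totalShards, long nodes) {
        long shardsPerNode = totalShards / nodes; // integer division: 50,000 / 39 = 1,282
        return totalShards * shardsPerNode;       // one full pass over a node per shard
    }

    public static void main(String[] args) {
        System.out.println(IterationEstimate.estimate(50_000, 39)); // prints 64100000
    }
}
```

The result differs slightly from 50,000 × 50,000 / 39 because the per-node shard count is rounded down first.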

There is room to optimize this by avoiding the iteration over the node's shards:
1. If indexShardLimit = -1 and clusterShardLimit > 0, we don't need to iterate to count indexShardCount and nodeShardCount: nodeShardCount = node.size() - node.numberOfShardsWithState(ShardRoutingState.RELOCATING), and indexShardCount is not needed.
2. Could RoutingNode keep a count of the started shards of each index on each node, so that the iteration can be avoided?
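
A rough sketch of the second idea, to show the kind of bookkeeping I have in mind. NodeShardCounts and all of its methods are hypothetical names invented for this example, not the real RoutingNode API, and the sketch assumes a shard's relocating flag does not change between add and remove. The counts are updated whenever a shard is added to or removed from a node, so the decider could read them in constant time instead of looping.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for per-node bookkeeping; NOT the real RoutingNode.
final class NodeShardCounts {
    // Non-relocating shards on this node, keyed by index name.
    private final Map<String, Integer> perIndex = new HashMap<>();
    // Non-relocating shards on this node, across all indices.
    private int total = 0;

    // Called when a shard is added to the node.
    void add(String index, boolean relocating) {
        if (relocating) {
            return; // mirrors "don't count relocating shards" in doDecide
        }
        perIndex.merge(index, 1, Integer::sum);
        total++;
    }

    // Called when a shard is removed from the node.
    void remove(String index, boolean relocating) {
        if (relocating) {
            return;
        }
        Integer current = perIndex.get(index);
        if (current == null) {
            return; // nothing tracked for this index
        }
        if (current == 1) {
            perIndex.remove(index);
        } else {
            perIndex.put(index, current - 1);
        }
        total--;
    }

    // Constant-time replacement for indexShardCount.
    int countForIndex(String index) {
        return perIndex.getOrDefault(index, 0);
    }

    // Constant-time replacement for nodeShardCount.
    int total() {
        return total;
    }
}
```

With something like this, doDecide would compare countForIndex(...) and total() against the limits directly. The cost moves to the add and remove paths, which are O(1) per shard change.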

@LoadingZhang
Contributor

We ran into the same problem.
We have 8,000 indices and 50,000 shards in the cluster, and we set indexShardLimit on every index to balance shards manually. As a result, updating the cluster state takes tens of seconds.

@nik9000 nik9000 added the :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) label Mar 18, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Allocation)

@nik9000
Member

nik9000 commented Mar 18, 2020

Looks like @jasontedor has already opened a PR for this one so I'm assigning it to him just to make the reporting gods happy.
