Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speedup BalanceUnbalancedClusterTests #88794

Merged

Conversation

idegtiarenko
Copy link
Contributor

This commit speeds up the above tests (and probably many others) by changing how
we assert the invariant. Previously invariant was checked by rebuilding
internal collections from scratch and comparing them against ones already
present in the object after every single modification twice. This commit
verifies the invariant once after all bulk changes.

Closes: #12629

This commit speeds up the above tests (and probably many others) by changing how
 we assert the invariant. Previously invariant was checked by rebuilding
 internal collections from scratch and comparing them against ones already
 present in the object after every single modification twice. This commit
 verifies the invariant once after all bulk changes.
@idegtiarenko idegtiarenko added >test Issues or PRs that are addressing/adding tests :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.4.0 labels Jul 26, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

}

void update(ShardRouting oldShard, ShardRouting newShard) {
assert invariant();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The invariant asserted after operation as well. I do not think it is required to do it twice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@@ -145,6 +145,9 @@ private RoutingNodes(RoutingTable routingTable, DiscoveryNodes discoveryNodes, b
}
}
}
for (var node : nodesToShards.values()) {
assert node.invariant();
}
Copy link
Contributor Author

@idegtiarenko idegtiarenko Jul 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invariant checking is expensive in case of big collections so it is verified once in the end instead of after every operation.

This is constructing RoutingNodes from RoutingTable so no other concurrent additions is expected

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. One suggestion: I think we do the invariant check in one line with method references:

nodesToShards.values().forEach(RoutingNode::invariant)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link #88951 :)

shardRoutingsRelocating.add(shard);
}
shardRoutingsByIndex.computeIfAbsent(shard.index(), k -> new HashSet<>(10)).add(shard);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was rebuild for assertion twice after any shard addition to the routing node.
This was especially slow for a test with 4k shards.

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could do something more stupid instead and just skip the expensive bit of the invariant check most of the time (perhaps 99% of the time if the cluster is "large" in some sense)?

@arteam arteam self-assigned this Jul 27, 2022
@mark-vieira mark-vieira added v8.5.0 and removed v8.4.0 labels Jul 27, 2022
Copy link
Contributor

@arteam arteam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! I've left a few comments, thank you for working on this Ievgen!

if (shard.relocating()) {
shardRoutingsRelocating.add(shard);
}
shardRoutingsByIndex.computeIfAbsent(shard.index(), k -> new HashSet<>(10)).add(shard);
Copy link
Contributor

@arteam arteam Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to use Sets.newHashSetWithExpectedSize here to correctly pre-size the HashSet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we do not know expected size here. I used some random number to avoid initial resize(s)

assert initializingShards.size() == shardRoutingsInitializing.size();
assert initializingShards.containsAll(shardRoutingsInitializing);
boolean invariant() {
var shardRoutingsInitializing = new ArrayList<ShardRouting>(shards.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about declaring shardRoutingsInitializing and shardRoutingsRelocating as HashSet? Then we can do assertions as simple equals calls

assert initializingShards.equals(shardRoutingsInitializing);
assert relocatingShards.equals(shardRoutingsRelocating);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about it, but then decided to keep Lists to avoid calculating hash for each item.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hashCode is cached for ShardRouting, but I agree that it does seem redundant is this case, bceause these new collections are never queried.

addInternal(shard, true);
}

void addNoValidate(ShardRouting shard) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about addWithoutValidation? It sounds more natural for me.

}

void update(ShardRouting oldShard, ShardRouting newShard) {
assert invariant();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@@ -145,6 +145,9 @@ private RoutingNodes(RoutingTable routingTable, DiscoveryNodes discoveryNodes, b
}
}
}
for (var node : nodesToShards.values()) {
assert node.invariant();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. One suggestion: I think we do the invariant check in one line with method references:

nodesToShards.values().forEach(RoutingNode::invariant)

Copy link
Contributor

@arteam arteam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank Ievgen!

@idegtiarenko idegtiarenko merged commit c755e7b into elastic:main Jul 28, 2022
@idegtiarenko idegtiarenko deleted the speedup_BalanceUnbalancedClusterTests branch July 28, 2022 13:48
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Jul 29, 2022
* upstream/main:
  Add 8.5 migration docs (elastic#88923)
  Script: Reindex & UpdateByQuery Metadata (elastic#88665)
  Remove unused plugins dir var from server CLI (elastic#88917)
  Use tracing API in TaskManager (elastic#88885)
  Add source fallback for keyword fields using operation (elastic#88735)
  Prune changelogs after 8.3.3 release
  Bump versions after 8.3.3 release
  Add a test for checking for misspelled "dry_run" parameters for Desired Nodes API (elastic#88898)
  Speedup BalanceUnbalancedClusterTests (elastic#88794)
  Preventing exceptions on node shutdown in integration tests (elastic#88827)
  Do not trigger check part3 for test mute and docs PRs (elastic#88895)
  Add troubleshooting docs about data corruption (elastic#88760)
  Mute RollupActionSingleNodeTests#testRollupDatastream (elastic#88891)
  [DOCS] Domain splitting impacts API keys (elastic#88677)
  Fix SqlSearchIT testAllTypesWithRequestToOldNodes (elastic#88866) (elastic#88883)
  Update synthetic-source.asciidoc (elastic#88880)
  Log more details in TaskAssertions (elastic#88864)
  Make Tuple a record (elastic#88280)
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jul 29, 2022
This needs to be in a separate method, it's currently running in production
and uses significant CPU time.

Broken in elastic#88794
original-brownbear added a commit that referenced this pull request Jul 29, 2022
This needs to be in a separate method, it's currently running in production
and uses significant CPU time.

Broken in #88794
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test Issues or PRs that are addressing/adding tests v8.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simplify BalanceUnbalancedClusterTest
6 participants