Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid Needless Cache Status Fetches in SearchableSnapshotAllocator #66433

Merged

Conversation

original-brownbear
Copy link
Member

We shouldn't fetch cache status if no allocation is possible to begin with.
Also, this surfaced an issue with using the Client to reroute since that
won't retry stale shards (failed the invalid license IT for example) so I moved
to using the RerouteService like we do in the GatewayAllocator.
(Plus, dried up one method that was 100% the same as in the replica allocator)

We shouldn't fetch cache status if no allocation is possible to begin with.
Also, this surfaced an issue with using the `Client` to `reroute` since that
won't retry stale shards (failed the invalid license IT for example) so I moved
to using the `RerouteService` like we do in the `GatewayAllocator`.
(Plus, dried up one method that was 100% the same as in the replica allocator)
@original-brownbear original-brownbear added >non-issue :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v8.0.0 v7.11.0 labels Dec 16, 2020
@elasticmachine elasticmachine added Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. labels Dec 16, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

1 similar comment
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@@ -44,7 +44,7 @@
explanations = RoutingExplanations.readFrom(in);
}

public ClusterRerouteResponse(boolean acknowledged, ClusterState state, RoutingExplanations explanations) {
ClusterRerouteResponse(boolean acknowledged, ClusterState state, RoutingExplanations explanations) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a revert of making this public yesterday now that it's not needed for tests any longer.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though it would be good to add a test to verify the problem.

// pre-check if it can be allocated to any node that currently exists, so we won't list the cache sizes for it for nothing
// TODO: in the following logic, we do not account for existing cache size when handling disk space checks, should and can we
// reliably do this in a world of concurrent cache evictions or are we ok with the cache size just being a best effort hint
// here?
Tuple<Decision, Map<String, NodeAllocationResult>> result = canBeAllocatedToAtLeastOneNode(shardRouting, allocation);
Tuple<Decision, Map<String, NodeAllocationResult>> result = ReplicaShardAllocator.canBeAllocatedToAtLeastOneNode(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test that we do not trigger any reads or reroutes when deciders say no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, I pushed 0cb0017 :)

@original-brownbear
Copy link
Member Author

Thanks Henning!

@original-brownbear original-brownbear merged commit 7c65a2b into elastic:master Dec 16, 2020
@original-brownbear original-brownbear deleted the avoid-unnecessary-fetches branch December 16, 2020 13:48
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Dec 16, 2020
…lastic#66433)

We shouldn't fetch cache status if no allocation is possible to begin with.
Also, this surfaced an issue with using the `Client` to `reroute` since that
won't retry stale shards (failed the invalid license IT for example) so I moved
to using the `RerouteService` like we do in the `GatewayAllocator`.
(Plus, dried up one method that was 100% the same as in the replica allocator)
original-brownbear added a commit that referenced this pull request Dec 16, 2020
…66433) (#66444)

We shouldn't fetch cache status if no allocation is possible to begin with.
Also, this surfaced an issue with using the `Client` to `reroute` since that
won't retry stale shards (failed the invalid license IT for example) so I moved
to using the `RerouteService` like we do in the `GatewayAllocator`.
(Plus, dried up one method that was 100% the same as in the replica allocator)
@original-brownbear original-brownbear restored the avoid-unnecessary-fetches branch January 4, 2021 01:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.11.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants