Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait for prewarm when relocating searchable snapshot shards #65531

Merged

Conversation

original-brownbear
Copy link
Member

@original-brownbear original-brownbear commented Nov 26, 2020

Add hooks to enable waiting for a condition before completing the clean files step for relocating searchable snapshot shards and use them to wait for pre-warm before responding to the clean files request.

Add hooks to enable waiting for a condition before relocation handoff.
Make timeout on relocation unbounded and add ability to disable recovery
liveness checker temporarily while running prewarm.
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Nov 26, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

// Due to relocation conditions on the shard it could take a while for the hand-off to complete so we disable the recovery
// monitor since we don't expect any transport messages from master for the duration of the handoff and activate it again
// after the handoff.
final Releasable disabledMonitor = recoveryRef.target().disableRecoveryMonitor();
Copy link
Member Author

@original-brownbear original-brownbear Nov 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of a BwC issue I guess. If the hand-off request comes from 7.10 and doesn't wait indefinitely yet then it could timeout on the primary I guess but maybe we can just ignore it since it's so fringe?

recoveryMonitorEnabled = false;
return () -> {
setLastAccessTime();
recoveryMonitorEnabled = true;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little low-tech relative to the tricky ref-counting in the IndexShard. I figured this was ok here since the hand-off request only comes in once (at least judging by the assertions we have in IndexShard) while the other API has a more "feel" to it and there are no hard guarantees on the index shard state listener only being invoked once (though the "loaded" flag on the directory effectively guarantees we only add one condition for now) and it wasn't that much extra effort since the API was supposed to be non-blocking anyway.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did an initial read of the production code. I am in doubt about the approach taken, since we block all operations on the source during the relocation (in IndexShard.relocate). I think this will prevent other recoveries from initiating until the request is done. For searchable snapshots with replicas this is unfortunate. Also, any operation happening trying to acquire the primary permit will be queued up.

It may not be too important for searchable snapshots, but it seems counter intuitive to wait in this specific critical operation rather than outside it? Maybe we could hook this into finalizeRecovery instead?

}
lastSeenAccessTime = accessTime;
} else {
lastSeenAccessTime = System.nanoTime();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be simpler to just fake the progress inside RecoveryTarget by returning System.nanoTime()?

@original-brownbear
Copy link
Member Author

Thanks @henningandersen I adjusted this PR now to work via the clean files handler like we discussed. Let me know what you think :)

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

});

logger.info("--> sleep for 5s to ensure we are actually stuck at the FINALIZE stage and that the primary has not yet relocated");
TimeUnit.SECONDS.sleep(5L);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we instead find the shard using internalCluster().getInstance(IndicesService.class, node) and assertBusy that it has a pending after cleanup action?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ thanks, you actually prevented a likely test failure here as well :) I moved the check for translog stage to a busy assert and then added the check for one clean files condition after. Otherwise we'd only have had 5s to arrive at TRANSLOG now at least we have 10 which should be a little safer.

@original-brownbear
Copy link
Member Author

Thanks Henning!

@original-brownbear original-brownbear merged commit e189a20 into elastic:master Dec 9, 2020
@original-brownbear original-brownbear deleted the wait-for-prewarm branch December 9, 2020 10:45
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Dec 9, 2020
…65531)

Add hooks to enable waiting for a condition before completing the clean files step for relocating searchable snapshot shards and use them to wait for pre-warm before responding to the clean files request.
original-brownbear added a commit that referenced this pull request Dec 9, 2020
…66096)

Add hooks to enable waiting for a condition before completing the clean files step for relocating searchable snapshot shards and use them to wait for pre-warm before responding to the clean files request.
@original-brownbear original-brownbear restored the wait-for-prewarm branch January 4, 2021 01:09
@pugnascotia pugnascotia changed the title Wait for Prewarm when Relocating Searchable Snapshot Shards Wait for prewarm when relocating searchable snapshot shards Jan 5, 2021
fcofdez added a commit to fcofdez/elasticsearch that referenced this pull request Jan 19, 2021
This commit introduces a change where searchable snapshots
skip the TRANSLOG stage. Since elastic#65531 was introduced, the
cleanFiles peer recovery stage is blocked until the prewarming
completes (this is done to avoid search latency spikes due to a cold
cache). In that phase, the RecoveryState stage is
TRANSLOG which can be confusing as we don't replay
any ops during searchable snapshots recoveries. In order
to avoid that confusion we transition directly to
FINALIZE stage.
fcofdez added a commit that referenced this pull request Mar 22, 2021
This commit introduces a change where searchable snapshots
skip the RecoveryState TRANSLOG stage. Since #65531 was introduced, the
cleanFiles peer recovery phase is blocked until the prewarming
completes (this is done to avoid search latency spikes due to a cold
cache). In that phase, the RecoveryState stage is
TRANSLOG which can be confusing as we don't replay
any ops during searchable snapshots recoveries. In order
to avoid that confusion we transition directly to
FINALIZE stage.
fcofdez added a commit to fcofdez/elasticsearch that referenced this pull request Mar 22, 2021
This commit introduces a change where searchable snapshots
skip the RecoveryState TRANSLOG stage. Since elastic#65531 was introduced, the
cleanFiles peer recovery phase is blocked until the prewarming
completes (this is done to avoid search latency spikes due to a cold
cache). In that phase, the RecoveryState stage is
TRANSLOG which can be confusing as we don't replay
any ops during searchable snapshots recoveries. In order
to avoid that confusion we transition directly to
FINALIZE stage.

Backport of elastic#70311
fcofdez added a commit that referenced this pull request Mar 22, 2021
This commit introduces a change where searchable snapshots
skip the RecoveryState TRANSLOG stage. Since #65531 was introduced, the
cleanFiles peer recovery phase is blocked until the prewarming
completes (this is done to avoid search latency spikes due to a cold
cache). In that phase, the RecoveryState stage is
TRANSLOG which can be confusing as we don't replay
any ops during searchable snapshots recoveries. In order
to avoid that confusion we transition directly to
FINALIZE stage.

Backport of #70311
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.11.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants