
Skip TRANSLOG stage for searchable snapshots recoveries #67697

Closed

Conversation

@fcofdez (Contributor) commented Jan 19, 2021

This commit changes searchable snapshot recoveries to skip the
RecoveryState TRANSLOG stage. Since #65531 was introduced, the
cleanFiles peer recovery step blocks until prewarming completes
(this is done to avoid search latency spikes due to a cold cache).
During that phase the RecoveryState stage is TRANSLOG, which can be
confusing since we don't replay any operations during searchable
snapshot recoveries. To avoid that confusion, we transition directly
to the FINALIZE stage.
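
For readers unfamiliar with the recovery machinery, here is a minimal, self-contained sketch of the idea. The enum values mirror RecoveryState.Stage, but the class and method names are simplified illustrations, not the actual Elasticsearch peer-recovery API:

```java
// Sketch only: models the stage progression and the proposed shortcut for
// searchable snapshot recoveries, which never replay translog operations.
enum Stage { INIT, INDEX, VERIFY_INDEX, TRANSLOG, FINALIZE, DONE }

class RecoveryStateSketch {
    private Stage stage = Stage.INIT;
    private final boolean searchableSnapshot;

    RecoveryStateSketch(boolean searchableSnapshot) {
        this.searchableSnapshot = searchableSnapshot;
    }

    // Called once file restore has finished and cleanFiles is about to block
    // on prewarming. Regular recoveries enter TRANSLOG to replay operations;
    // searchable snapshot recoveries have nothing to replay, so reporting
    // TRANSLOG during the (potentially long) prewarming wait is misleading.
    void onFilesRestored() {
        stage = searchableSnapshot ? Stage.FINALIZE : Stage.TRANSLOG;
    }

    Stage getStage() {
        return stage;
    }
}
```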

@fcofdez added labels :Distributed Coordination/Snapshot/Restore, Team:Distributed, v7.12.0, v8.0.0 on Jan 19, 2021
@fcofdez marked this pull request as ready for review January 19, 2021 15:05
@elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed (Team:Distributed)

// filter for relocations that are not in stage FINALIZE (they could end up in this stage without progress for good if the
// target node does not have enough cache space available to hold the primary completely)
.filter(recoveryState -> recoveryState.getSourceNode() != null && recoveryState.getStage() != RecoveryState.Stage.FINALIZE)
.filter(recoveryState -> recoveryState.getSourceNode() != null)
@fcofdez (Contributor, Author) commented on the diff above:

Maybe we could check at the end that the pre-warming phase ended?

A Member replied:

Yea, we'd have to do something in that direction, otherwise https://github.com/elastic/elasticsearch/pull/67697/files#diff-be91049c27fa011e281944d790285ad97bbd72466aeeed55525c5b99061c9e13L113 will trip for the small cache sizes that are randomly chosen in the parent test suite, because we will get stuck in FINALIZE forever, right?
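
A sketch of what such a check could look like on the test side; this is only an assumption about the direction being discussed, and isPrewarmingComplete is a hypothetical hook, not a real API:

```java
// Hypothetical sketch: instead of filtering FINALIZE recoveries out, the test
// could wait until prewarming has finished before asserting on the stage.
class PrewarmCheckSketch {
    static void awaitPrewarmed(java.util.function.BooleanSupplier isPrewarmingComplete)
            throws InterruptedException {
        long deadline = System.nanoTime() + java.util.concurrent.TimeUnit.SECONDS.toNanos(30);
        while (isPrewarmingComplete.getAsBoolean() == false) {
            if (System.nanoTime() > deadline) {
                throw new AssertionError("prewarming did not complete in time");
            }
            Thread.sleep(100); // poll until the cache is fully warmed
        }
    }
}
```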

@original-brownbear (Member) left a comment:

I think this is a neat change in general. Outside of the assertion issues you fixed, I couldn't find any side effects of hacking the stage like this, and the behavior makes logical sense to me. I'd like someone else to have a look here as well, though.

Maybe @henningandersen has a sec to double check that we can go this route in general?

@@ -101,6 +101,11 @@ public static Stage fromId(byte id) {
        }
    }

+    public enum IndexType {
+        REGULAR,
+        SEARCHABLE_SNAPSHOT;
+    }
A Member commented on this hunk:
This is a little nasty, I'm not 100% sure what our policy is here but I think we should avoid leaking searchable snapshots this explicitly in here. Maybe we could call this "NO_TRANSLOG" or so?
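
A sketch of the suggested rename, naming the value after the behavior (no translog replay) rather than the feature; this reflects the reviewer's hypothetical suggestion, not merged code:

```java
public enum IndexType {
    REGULAR,     // normal indices: peer recovery replays translog operations
    NO_TRANSLOG; // e.g. searchable snapshots: nothing to replay, skip TRANSLOG
}
```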

@original-brownbear (Member) commented:

BTW: I also wonder if we really need to be this invasive here. Maybe we could simply add a filter for the stage to render in APIs (e.g. getDisplayStage) to RecoveryState, which by default would just delegate to getStage but would be overridden to return FINALIZE instead of TRANSLOG for relocation + translog stage + "not prewarmed". That seems a lot less invasive and would fix the APIs without functional impact on the rest of the functionality (+ it would maximally limit leaking searchable snapshots logic into core).

WDYT?
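
A self-contained sketch of this display-only approach, reusing the Stage enum from the sketch in the description above; getDisplayStage and the prewarmed flag are names taken from the comment, not the real RecoveryState API:

```java
// Sketch: leave the internal stage machinery untouched and only adjust the
// stage reported through the recovery APIs.
class DisplayStageSketch {
    private Stage stage = Stage.TRANSLOG; // actual internal stage
    private volatile boolean prewarmed;   // set once cache prewarming completes

    Stage getStage() {
        return stage; // internal consumers keep seeing the real stage
    }

    Stage getDisplayStage() {
        // While the recovery waits on prewarming in TRANSLOG, report FINALIZE
        // instead, since no translog operations will be replayed anyway.
        if (stage == Stage.TRANSLOG && prewarmed == false) {
            return Stage.FINALIZE;
        }
        return stage;
    }
}
```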

@henningandersen (Contributor) commented:

I wonder if we could instead move setting the stage to TRANSLOG into the after-clean-files runnable?

@original-brownbear (Member) commented:

> I wonder if we could instead move setting the stage to TRANSLOG into the after-clean-files runnable?

++ that seems pretty side effect free and quick.
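
A minimal sketch of that ordering, with illustrative names only (the real cleanFiles step and its listener plumbing are more involved):

```java
// Sketch: enter TRANSLOG only after the clean-files step, which blocks on
// prewarming, has completed, instead of transitioning before the wait.
class CleanFilesSketch {
    void cleanFiles(Runnable waitForPrewarming, Runnable enterTranslogStage) {
        waitForPrewarming.run();  // recovery stays in its previous stage here
        enterTranslogStage.run(); // only now: e.g. recoveryState.setStage(TRANSLOG)
    }
}
```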

@fcofdez (Contributor, Author) commented Jan 28, 2021:

> I wonder if we could instead move setting the stage to TRANSLOG into the after-clean-files runnable?

But in that case the recovery stage would remain in VERIFY_INDEX, right?

> Maybe we could simply add a filter for the stage to render in APIs

I like this approach, I'll go ahead with that.

@henningandersen (Contributor) commented:

I think we could stay in INDEX with reasonable effort. I wonder if that would be more accurate, since the shard is not serving requests. But the display filter approach could also work out.

@mark-vieira (Contributor) commented:

@elasticmachine update branch

@fcofdez (Contributor, Author) commented Feb 9, 2021:

Sorry, this fell through the cracks. I've opened a new PR with the suggested approach (#68680)
