Lazily load soft-deletes for searchable snapshot shards #69203

Merged (11 commits) on Feb 22, 2021

Contributor

@ywelsch ywelsch commented Feb 18, 2021

Opening a Lucene index that supports soft-deletes currently creates the liveDocs bitset eagerly. This requires scanning the doc values to materialize the liveDocs bitset from the soft-delete doc values. In order for searchable snapshot shards to be available for searches as quickly as possible (i.e. on recovery, or in case of FrozenEngine whenever a search comes in), they should read as little as possible from the Lucene files.

This PR introduces a LazySoftDeletesDirectoryReaderWrapper, a variant of Lucene's SoftDeletesDirectoryReaderWrapper that loads the liveDocs bitset lazily on first access. It is specifically tailored to ReadOnlyEngine / FrozenEngine, as it only operates on non-NRT readers.
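
To illustrate the lazy-loading idea in isolation, here is a minimal sketch in plain Java. It is not the code from this PR and does not use Lucene: `LazyLiveDocs` is a hypothetical class, `java.util.BitSet` stands in for Lucene's bitset, and the `Supplier<int[]>` stands in for the (expensive) scan of the soft-deletes doc values. The point is only that nothing is read until the first liveDocs lookup, and the scan then happens once.

```java
import java.util.BitSet;
import java.util.function.Supplier;

/** Hypothetical illustration: a live-docs view that is materialized on first access. */
final class LazyLiveDocs {
    private final int maxDoc;
    // Assumption: stands in for scanning the soft-deletes doc values of a segment.
    private final Supplier<int[]> softDeletedDocs;
    // Published via a volatile write; never mutated after publication.
    private volatile BitSet liveDocs;

    LazyLiveDocs(int maxDoc, Supplier<int[]> softDeletedDocs) {
        this.maxDoc = maxDoc;
        this.softDeletedDocs = softDeletedDocs;
    }

    /** True once the bitset has been materialized. */
    boolean isLoaded() {
        return liveDocs != null;
    }

    /** Returns whether the document is live, loading the bitset if necessary. */
    boolean get(int docId) {
        return bits().get(docId);
    }

    private BitSet bits() {
        BitSet result = liveDocs;
        if (result == null) {
            synchronized (this) {
                result = liveDocs;
                if (result == null) {
                    // Start with all documents live, then clear the soft-deleted ones.
                    BitSet loaded = new BitSet(maxDoc);
                    loaded.set(0, maxDoc);
                    for (int doc : softDeletedDocs.get()) {
                        loaded.clear(doc);
                    }
                    liveDocs = result = loaded;
                }
            }
        }
        return result;
    }
}
```

With this shape, constructing the object (the analogue of opening the reader) touches no data; the cost of the scan is paid by the first caller of `get`.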

Member

@dnhatn dnhatn left a comment

This class makes sense for searchable snapshots. I've left some small comments.

@ywelsch ywelsch marked this pull request as ready for review February 19, 2021 12:38
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) label Feb 19, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Member

@dnhatn dnhatn left a comment

LGTM with smaller comments. Thanks Yannick!

}
}

private static class DelegatingCacheHelper implements CacheHelper {
Contributor

I believe that this bit is the only reason why we need to have this class in the oal.index (org.apache.lucene.index) package. Can we find a way to avoid doing cross-JAR package-private access? (Is there another option than making the CacheKey ctor public?)

Contributor Author

The only other reason besides CacheKey is the call to the PendingSoftDeletes.applySoftDeletes method. That method can easily be reimplemented, however. I don't see a good way to work around the CacheKey constructor not being accessible (save for introducing even more outrageous hacks), and FWIW we already have some classes like ShuffleForcedMergePolicy and OneMergeHelper in this package.
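
For readers wondering what such a reimplementation might involve, here is a rough sketch in plain Java. It is an assumption about the method's job, not Lucene's actual code: it takes the method to clear the live bit of every document that carries the soft-deletes field and to return how many documents were newly deleted. `SoftDeletesSketch` is a hypothetical name, `java.util.BitSet` replaces Lucene's bitset, and a `PrimitiveIterator.OfInt` replaces the doc-ID iterator.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;

/** Hypothetical stand-in for a reimplemented soft-deletes application step. */
final class SoftDeletesSketch {
    private SoftDeletesSketch() {}

    /**
     * Clears the live bit for every doc ID produced by the iterator.
     *
     * @param docsWithSoftDeletesField doc IDs that have a value in the soft-deletes field
     * @param liveDocs bitset in which a set bit means "document is live"; modified in place
     * @return the number of documents that went from live to deleted
     */
    static int applySoftDeletes(PrimitiveIterator.OfInt docsWithSoftDeletesField, BitSet liveDocs) {
        int newDeletes = 0;
        while (docsWithSoftDeletesField.hasNext()) {
            int docId = docsWithSoftDeletesField.nextInt();
            // Count a document only once, even if it was already marked deleted.
            if (liveDocs.get(docId)) {
                liveDocs.clear(docId);
                newDeletes++;
            }
        }
        return newDeletes;
    }
}
```

A caller would start from a bitset with all bits set up to the segment's document count, pass in the iterator over the soft-deletes field, and use the returned count to adjust the number of live documents.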

@ywelsch
Contributor Author

ywelsch commented Feb 22, 2021

@elasticmachine update branch (test failure unrelated)

@ywelsch ywelsch requested a review from jpountz February 22, 2021 09:14
@ywelsch
Contributor Author

ywelsch commented Feb 22, 2021

@elasticmachine run elasticsearch-ci/1 (yet another unrelated test failure ...)

@ywelsch ywelsch merged commit f2a1e02 into elastic:master Feb 22, 2021
ywelsch added a commit that referenced this pull request Feb 23, 2021
Opening a Lucene index that supports soft-deletes currently creates the liveDocs bitset eagerly. This requires scanning
the doc values to materialize the liveDocs bitset from the soft-delete doc values. In order for searchable snapshot shards
to be available for searches as quickly as possible (i.e. on recovery, or in case of FrozenEngine whenever a search comes
in), they should read as little as possible from the Lucene files.

This commit introduces a LazySoftDeletesDirectoryReaderWrapper, a variant of Lucene's
SoftDeletesDirectoryReaderWrapper that loads the liveDocs bitset lazily on first access. It is specifically tailored to
ReadOnlyEngine / FrozenEngine, as it only operates on non-NRT readers.
Labels
:Distributed Coordination/Snapshot/Restore (Anything directly related to the `_snapshot/*` APIs)
>enhancement
Team:Distributed (Obsolete) (Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.)
v7.12.1
v7.13.0
v8.0.0-alpha1
6 participants