Improve translog corruption detection #47873

DaveCTurner
Contributor

Today we do not throw a TranslogCorruptedException in certain cases of
translog corruption, such as for a corrupted checkpoint file or when an
expected file (either checkpoint or translog) is completely missing or
truncated. This means that elasticsearch-shard will not truncate the translog
in those cases.

This commit strengthens the translog corruption tests to corrupt and/or delete
both translog and checkpoint files, and ensures that a
TranslogCorruptedException is thrown in all cases. It also sometimes
simulates a recovery after a crash while rolling the translog generation,
including cases where the rolled checkpoint contains incorrect data. This
backports #42980, #42744 and #44217 to 6.8.

It also backports #41480 to adjust the tool to check shards regardless of
whether there is a corruption marker.
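
The failure modes described above (a corrupted checkpoint, a truncated file, a missing file) can be illustrated with a small, self-contained sketch. This is not the Elasticsearch implementation: the on-disk layout (an 8-byte generation followed by a CRC32), the `TranslogCorruptedError` class, and the helpers `write_checkpoint`, `read_checkpoint`, `damage` and `detects` are all hypothetical names invented for illustration. The only idea taken from the description is that every kind of damage should surface as one dedicated corruption exception rather than as an assortment of low-level I/O errors.

```python
import os
import struct
import tempfile
import zlib


class TranslogCorruptedError(Exception):
    """Hypothetical stand-in for a single, dedicated corruption exception."""


# Toy layout, assumed for this sketch only: generation (int64) + CRC32 of it.
_PAYLOAD = struct.Struct(">q")
_FOOTER = struct.Struct(">I")


def write_checkpoint(path, generation):
    payload = _PAYLOAD.pack(generation)
    with open(path, "wb") as f:
        f.write(payload + _FOOTER.pack(zlib.crc32(payload)))


def read_checkpoint(path):
    """Return the stored generation, or raise TranslogCorruptedError."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError as e:
        # A missing file is reported as corruption, not as a plain I/O error.
        raise TranslogCorruptedError(f"{path} is missing") from e

    expected = _PAYLOAD.size + _FOOTER.size
    if len(data) != expected:
        raise TranslogCorruptedError(
            f"{path} has {len(data)} bytes, expected {expected}")

    payload, footer = data[:_PAYLOAD.size], data[_PAYLOAD.size:]
    (stored_crc,) = _FOOTER.unpack(footer)
    if zlib.crc32(payload) != stored_crc:
        raise TranslogCorruptedError(f"{path} failed its checksum")
    return _PAYLOAD.unpack(payload)[0]


def damage(path, mode):
    """Delete, truncate, or flip one byte of the file, as a test might."""
    if mode == "delete":
        os.remove(path)
    elif mode == "truncate":
        with open(path, "r+b") as f:
            f.truncate(os.path.getsize(path) // 2)
    elif mode == "flip":
        with open(path, "r+b") as f:
            first = f.read(1)
            f.seek(0)
            f.write(bytes([first[0] ^ 0xFF]))
    else:
        raise ValueError(f"unknown damage mode: {mode}")


def detects(mode):
    """Write a valid file, damage it, and report whether the read fails
    with the dedicated corruption exception."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "translog.ckp")
        write_checkpoint(path, 7)
        damage(path, mode)
        try:
            read_checkpoint(path)
        except TranslogCorruptedError:
            return True
        return False
```

Calling `detects` for each of `"delete"`, `"truncate"` and `"flip"` exercises the three kinds of damage in turn. A repair tool that only reacts to the dedicated exception can then treat all three uniformly.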

Co-authored-by: Henning Andersen [email protected]

@DaveCTurner added the >bug, :Distributed Indexing/Store, :Distributed Indexing/Engine and v6.8.4 labels on Oct 10, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Engine)

Contributor

@henningandersen left a comment

LGTM.

Thanks @DaveCTurner

@DaveCTurner merged commit e71cb81 into elastic:6.8 on Oct 11, 2019
@DaveCTurner deleted the 2019-10-10-translog-corruption-detection-6.8 branch on October 11, 2019 14:58
@tomcallahan removed the :Distributed Indexing/Store label on Oct 22, 2019
Labels
>bug, :Distributed Indexing/Engine, v6.8.4