Use local checkpoint to calculate min translog gen for recovery #52841

Closed

@dnhatn dnhatn commented Feb 26, 2020

Today we use the translog_generation of the safe commit as the minimum
required translog generation for recovery. This approach has a
limitation, where we won't be able to clean up translog unless we flush.
Reopening an already recovered engine will create a new empty translog,
and we leave it there until we force flush.

This commit removes the translog_generation commit tag and uses the
local checkpoint of the safe commit to calculate the minimum required
translog generation for recovery instead.
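The calculation described above can be sketched roughly as follows. This is an illustrative model only, not the Elasticsearch implementation: the class `MinTranslogGenSketch`, the `GenerationInfo` holder, and the method `minGenerationForRecovery` are hypothetical names, and the sketch assumes each retained translog generation records the highest sequence number it contains (with `-1` for an empty generation).

```java
import java.util.List;

// Hypothetical sketch; names and data model are assumptions, not Elasticsearch code.
final class MinTranslogGenSketch {

    /** Assumed per-generation summary: the generation id and the highest seq no stored in it. */
    static final class GenerationInfo {
        final long generation;
        final long maxSeqNo; // -1 is assumed to mean "no operations in this generation"

        GenerationInfo(long generation, long maxSeqNo) {
            this.generation = generation;
            this.maxSeqNo = maxSeqNo;
        }
    }

    /**
     * Returns the smallest generation that may hold an operation above the
     * local checkpoint of the safe commit. If no retained generation holds
     * such an operation, only the current generation is required.
     */
    static long minGenerationForRecovery(List<GenerationInfo> retained,
                                         long currentGeneration,
                                         long localCheckpointOfSafeCommit) {
        // Everything at or below the local checkpoint is already in the commit.
        long firstSeqNoToReplay = localCheckpointOfSafeCommit + 1;
        long minGeneration = currentGeneration;
        for (GenerationInfo info : retained) {
            if (info.maxSeqNo >= firstSeqNoToReplay) {
                minGeneration = Math.min(minGeneration, info.generation);
            }
        }
        return minGeneration;
    }
}
```

In this model, a freshly created empty generation never pins older generations, so generations that hold only operations at or below the local checkpoint become eligible for clean-up without a flush.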

Closes #49970
Backport of #51905

dnhatn and others added 8 commits February 26, 2020 12:04
…tic#51905)

Today we use the translog_generation of the safe commit as the minimum
required translog generation for recovery. This approach has a
limitation, where we won't be able to clean up translog unless we flush.
Reopening an already recovered engine will create a new empty translog,
and we leave it there until we force flush.

This commit removes the translog_generation commit tag and uses the
local checkpoint of the safe commit to calculate the minimum required
translog generation for recovery instead.

Closes elastic#49970

Separates the translog from the index deletion conditions (allowing the translog to be cleaned
up more eagerly), and avoids taking the write lock on the translog if no clean-up is actually
necessary.
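The lock-avoidance idea in this commit message can be illustrated with a check-then-lock pattern. The sketch below is a simplified assumption, not the actual translog code: `TranslogTrimSketch`, `trimUnreferenced`, and the `writeLockAcquisitions` counter are hypothetical, and translog files are modeled as a queue of generation numbers.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; the real translog tracks readers and files, not bare numbers.
final class TranslogTrimSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Retained generations, oldest first.
    private final Deque<Long> generations = new ArrayDeque<>();
    // Counts write-lock acquisitions so the example can show when the lock is skipped.
    int writeLockAcquisitions = 0;

    TranslogTrimSketch(long... retained) {
        for (long generation : retained) {
            generations.addLast(generation);
        }
    }

    /** Removes generations below minRequiredGeneration and returns how many were removed. */
    int trimUnreferenced(long minRequiredGeneration) {
        // Cheap check under the shared read lock: is there anything to trim?
        lock.readLock().lock();
        try {
            Long oldest = generations.peekFirst();
            if (oldest == null || oldest >= minRequiredGeneration) {
                return 0; // nothing to clean up, so the write lock is never taken
            }
        } finally {
            lock.readLock().unlock();
        }
        // Something may need trimming: take the exclusive lock and re-check,
        // because the state can change between releasing and acquiring locks.
        lock.writeLock().lock();
        try {
            writeLockAcquisitions++;
            int removed = 0;
            while (!generations.isEmpty() && generations.peekFirst() < minRequiredGeneration) {
                generations.removeFirst();
                removed++;
            }
            return removed;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The condition is re-evaluated under the write lock because `ReentrantReadWriteLock` does not support upgrading a read lock to a write lock, so the read lock must be released first.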
Since elastic#51905, we skip translog recovery if the local checkpoint of the
safe commit equals the global checkpoint. This change adjusts the
test not to create a new snapshot in that case.

Closes elastic#52221
Relates elastic#51905

Since elastic#51905, we use the local checkpoint of the safe commit to
calculate the number of uncommitted operations in the translog stats. If a
periodic flush triggered by afterWriteOperation completes before we sync
the translog, then the last commit is not safe. We also need to sync the
translog through the Engine rather than directly on the translog so that we
can advance the safe commit.

Relates elastic#51905
Closes elastic#52223

Asserts that no new operations have been added to the translog since we re-opened the engine.

Relates elastic#51905
Closes elastic#52410

Adjusts the assertion, as we might eagerly clean up the translog during resync since elastic#52556.

Relates elastic#52556
Closes elastic#52598

Adjusts the assertion, as we trim the translog more eagerly since elastic#52556.

Relates elastic#52556
Closes elastic#52148

We haven't been able to reproduce this test failure or determine its cause.
This commit adds more assertions so we can narrow the scope.

Relates elastic#52223

dnhatn commented Feb 26, 2020

@elasticmachine test this please

@dnhatn dnhatn closed this Feb 26, 2020
@dnhatn dnhatn deleted the 7x-seqno-tlog-policy branch February 26, 2020 21:01