flush cache before segment merge (#4955) #4969
Closed
This is an automated cherry-pick of #4955
What problem does this PR solve?
Issue Number: close #4956
Problem Summary:
During a segment split, the tail column files in the delta layer of the original segment are copied to the new result
segments, so a new segment may contain data that does not belong to its segment range.
This is fine in most cases, because the redundant data is filtered out by the segment range when read requests are served from the segment. The redundant data is therefore invisible in almost all cases.
However, if a segment merge happens later while that redundant data has still not been flushed to disk, it is copied directly into the new merged segment.
As a result, the redundant data in each segment becomes visible again after the merge, which may cause data incorrectness.
What is changed and how it works?
Flush the cache before every merge operation, so that any unsaved data is filtered by the segment range before the merge takes place.
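
The behavior described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not the actual storage engine code: `Segment`, `split`, `merge`, and `demo` are hypothetical names, keys are plain integers, and a flush is assumed to persist only the cached keys that fall inside the segment range, as the description states. In this model the symptom shows up as duplicated rows; the real effect depends on the engine's internals.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy segment covering the key range [start, end). Hypothetical model."""
    start: int
    end: int
    persisted: list = field(default_factory=list)  # data already on disk
    cache: list = field(default_factory=list)      # unflushed delta data

    def in_range(self, key):
        return self.start <= key < self.end

    def read(self):
        # Reads always filter by the segment range, which hides redundant data.
        return sorted(k for k in self.persisted + self.cache if self.in_range(k))

    def flush(self):
        # Assumption: flushing keeps only keys inside the segment range.
        self.persisted += [k for k in self.cache if self.in_range(k)]
        self.cache = []


def split(seg, split_key):
    # Both halves receive an unfiltered copy of the unflushed cache.
    left = Segment(seg.start, split_key,
                   [k for k in seg.persisted if k < split_key], list(seg.cache))
    right = Segment(split_key, seg.end,
                    [k for k in seg.persisted if k >= split_key], list(seg.cache))
    return left, right


def merge(left, right, flush_first):
    if flush_first:
        # The fix: flush each segment so out-of-range cache data is dropped.
        left.flush()
        right.flush()
    # Remaining cache contents are copied into the merged segment as-is.
    return Segment(left.start, right.end,
                   left.persisted + right.persisted,
                   left.cache + right.cache)


def demo(flush_first):
    original = Segment(0, 100, persisted=[10, 60], cache=[20, 70])
    left, right = split(original, 50)
    # After the split, the redundant keys are invisible to readers.
    assert left.read() == [10, 20]
    assert right.read() == [60, 70]
    return merge(left, right, flush_first).read()


if __name__ == "__main__":
    print(demo(flush_first=False))  # [10, 20, 20, 60, 70, 70]
    print(demo(flush_first=True))   # [10, 20, 60, 70]
```

Without the flush, the merged range covers every cached key, so the copies that each half carried become visible and keys 20 and 70 appear twice. With the flush, each half persists only its own keys and the merged segment reads back the original data.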
Check List
Tests
Side effects
Documentation
Release note