[receiver/filelog] Read lost files first in a poll cycle #11889
Merged
Description:
We currently read lost files at the end of each poll cycle. As a result, we will always read lines from rotated files after reading them from newly created files, resulting in them being emitted in a different order than they were written. This usually doesn't matter, but it does when using the recombine operator. We ran into this when handling multipart logs generated by container runtimes in K8s.
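The ordering problem can be illustrated with a small sketch. This is a simplified model, not the receiver's actual code: the `reader` type, the `pollCycle` function, and the file names are hypothetical, and the model assumes each reader is simply drained in turn during one poll cycle.

```go
package main

import "fmt"

// reader is a hypothetical stand-in for a file reader; it is not the
// filelog receiver's actual type.
type reader struct {
	name  string
	lines []string // unread lines remaining in the file
}

// pollCycle drains every reader and returns the lines in emission order.
// lostFirst selects whether lost (rotated-away) files are read before
// or after the files matched in this cycle.
func pollCycle(lost, current []reader, lostFirst bool) []string {
	// Default: files matched in this cycle first, lost files last.
	order := append(append([]reader{}, current...), lost...)
	if lostFirst {
		order = append(append([]reader{}, lost...), current...)
	}
	var out []string
	for _, r := range order {
		out = append(out, r.lines...)
	}
	return out
}

func main() {
	// Assumed scenario: app.log was rotated after "part 1" was written,
	// and "part 2" went to the newly created file.
	lost := []reader{{name: "app.log.1", lines: []string{"part 1"}}}
	current := []reader{{name: "app.log", lines: []string{"part 2"}}}

	fmt.Println(pollCycle(lost, current, false)) // [part 2 part 1]
	fmt.Println(pollCycle(lost, current, true))  // [part 1 part 2]
}
```

With lost files read last, "part 2" is emitted before "part 1", which is the kind of reordering that breaks recombining multipart logs.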
This change moves handling lost files to the start of the poll cycle, directly after files for the cycle are detected and opened. I believe that it guarantees both durability and ordering if the rotation behaves reasonably, that is:
@djaglowski please check me on this. My reasoning goes as follows:
makeReaders, when we Open them to calculate Fingerprints
Link to tracking Issue: #12084
Testing:
Added a test verifying that we actually get log lines in the right order. This test fails without the other changes.
Documentation:
Added a section about log line ordering across rotations to design.md.