Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/filelog] Read lost files first in a poll cycle #11889

Merged

Conversation

swiatekm
Copy link
Contributor

@swiatekm swiatekm commented Jul 4, 2022

Description:
We currently read lost files at the end of each poll cycle. As a result, we will always read lines from rotated files after reading them from newly created files, resulting in them being emitted in a different order than they were written. This usually doesn't matter, but it does when using the recombine operator. We ran into this when handling multipart logs generated by container runtimes in K8s.

This change moves handling lost files to the start of the poll cycle, directly after files for the cycle are detected and opened. I believe that it guarantees both durability and ordering if the rotation behaves reasonably, that is:

  • files aren't written to after having been rotated
  • rotated files don't match the pattern

@djaglowski please check me on this. My reasoning goes as follows:

  • files read in a given cycle are determined in makeReaders, when we Open them to calculate Fingerprints
  • a file is lost if it was present in the previous cycle, but not the current one
  • even if a file is rotated during a cycle, it'll be considered lost in the next one, and ordering will be maintained

Link to tracking Issue: #12084

Testing:
Added a test verifying that we actually get log lines in the right order. This test fails without the other changes.

Documentation:
Added a section about log line ordering across rotations to design.md.

@swiatekm
Copy link
Contributor Author

swiatekm commented Jul 4, 2022

Hm, I can see the Unix and Windows implementations of the roller make different assumptions about when they're called. I wonder if we should change the API for the roller, or just switch the Windows roller to keep old readers and only close those. I'll do the former for now.

@swiatekm swiatekm force-pushed the fix/fileinput/lost-files-first branch 4 times, most recently from 2b6a8dd to bb45c03 Compare July 4, 2022 13:34
@djaglowski
Copy link
Member

I think the way you've split up the interface makes sense. This looks good to me.

Please add a changelog entry. (note the new process)

@swiatekm swiatekm force-pushed the fix/fileinput/lost-files-first branch from bb45c03 to 40ad002 Compare July 5, 2022 14:52
@swiatekm swiatekm marked this pull request as ready for review July 5, 2022 14:52
@swiatekm swiatekm requested a review from a team July 5, 2022 14:52
@swiatekm swiatekm requested a review from djaglowski as a code owner July 5, 2022 14:52
@swiatekm swiatekm changed the title [receiver/filelog] Read lost files first each interval in file input [receiver/filelog] Read lost files first in a poll cycle Jul 5, 2022
@swiatekm
Copy link
Contributor Author

swiatekm commented Jul 5, 2022

I think the way you've split up the interface makes sense. This looks good to me.

Please add a changelog entry. (note the new process)

Done.

Copy link
Member

@djaglowski djaglowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for for the contribution @swiatekm-sumo and for your patience rehashing the details with me.

@djaglowski djaglowski merged commit 449f415 into open-telemetry:main Jul 5, 2022
@swiatekm swiatekm deleted the fix/fileinput/lost-files-first branch July 5, 2022 16:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants