Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling of truncated files for Filestream input #38416

Merged
merged 13 commits into from
Apr 12, 2024

Conversation

belimawr
Copy link
Contributor

@belimawr belimawr commented Mar 19, 2024

Proposed commit message

This commit fixes the handling of file truncation in the Filestream input. In some file truncation cases Filebeat would detect the truncation but it would not set the offset to zero, leading to the registry offset always increasing and causing the file to be re-ingested every time Filebeat was restarted.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

## Author's Checklist

How to test this PR locally

All scripts and configuration files are at the end of this section.

1. File is truncated while Filebeat is running

  1. Create a log file and keep adding data to it. Make sure to close the file after each line is added. If the file is still open and being written to while it's truncated the OS will likely not reset the file size. Use loop-add-data.sh
  2. Start Filebeat
  3. Truncate the file: truncate -s 50 /tmp/logs/log.log`. This will truncate the file but leave a whole line to be ingested later on.
  4. Ensure Filebeat detects the file truncation and starts reading from the beginning
  5. Ensure Filebeat correctly update the offset in the registry to 50 once it finishes reading the file

2. File is truncated while Filebeat is stopped

  1. Create a log file. Use gen-file.sh
  2. Start Filebeat
  3. Ensure the whole file is read
  4. Stop Filebeat
  5. Ensure the registry contain the correct offset: 1000.
  6. Truncate the file: truncate -s 50 /tmp/logs/log.log`. This will truncate the file but leave a whole line to be ingested later on.
  7. Start Filebeat
  8. Ensure Filebeat detects the file truncation and starts reading from the beginning
  9. Ensure Filebeat correctly update the offset in the registry to 50 once it finishes reading the file

Configuration file and scripts

loop-add-data.sh

#!/bin/bash
cd /tmp/logs

rm foo.log
# Each line is 50 bytes
i=0
while true
do
    # Each log line is 50 bytes
    printf "log line with some padding %10d ===========\n" $i >> /tmp/logs/foo.log
    let "i=i+1"
    sleep 1
done

gen-file.sh

#!/bin/bash
cd /tmp/logs

rm foo.log
# Each line is 50 bytes
for i in {1..20}
do
    # Each log line is 50 bytes
    printf "log line with some padding %10d ===========\n" $i >> /tmp/logs/foo.log
    #sleep 0.05
done
ls -laih /tmp/logs

filebeat.yml

filebeat.inputs:
  - type: filestream
    id: filestream-input-id
    enabled: true
    prospector.scanner.check_interval: 30s
    paths:
      - /tmp/logs/foo.log*

output:
  file:
    enabled: true
    codec.json:
      pretty: false
    path: ${path.home}
    filename: "output"
    rotate_on_startup: true

queue.mem:
  flush:
    timeout: 1s
    min_events: 32

filebeat.registry.flush: 1s

logging:
  level: debug
  selectors:
    - file_watcher
    - input.filestream
    - input.harvester
  metrics:
    enabled: false

Related issues

## Use cases
## Screenshots
## Logs

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Mar 19, 2024
Copy link
Contributor

mergify bot commented Mar 19, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @belimawr? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@elasticmachine
Copy link
Collaborator

elasticmachine commented Mar 19, 2024

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Duration: 144 min 8 sec

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@belimawr belimawr added skip-ci Skip the build in the CI but linting Team:Elastic-Agent Label for the Agent team labels Mar 19, 2024
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Mar 19, 2024
@belimawr belimawr force-pushed the filestream-truncation-bug branch from f1a1d69 to 30814ec Compare March 19, 2024 19:58
@belimawr belimawr changed the title [WIP] Filestream truncation bug Fix handling of truncated files for Filestream input Mar 20, 2024
Copy link
Contributor

mergify bot commented Mar 20, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b filestream-truncation-bug upstream/filestream-truncation-bug
git merge upstream/main
git push upstream filestream-truncation-bug

@belimawr belimawr force-pushed the filestream-truncation-bug branch from 43e82b4 to 0ef8349 Compare March 20, 2024 08:10
@belimawr belimawr removed the skip-ci Skip the build in the CI but linting label Mar 20, 2024
@belimawr belimawr marked this pull request as ready for review March 20, 2024 08:44
@belimawr belimawr requested a review from a team as a code owner March 20, 2024 08:44
@belimawr belimawr requested review from ycombinator and rdner March 20, 2024 08:44
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@pierrehilbert pierrehilbert requested review from faec and removed request for ycombinator March 20, 2024 08:52
@belimawr belimawr added the bug label Mar 22, 2024
@belimawr belimawr force-pushed the filestream-truncation-bug branch from 7c2a1f0 to 0978bba Compare March 22, 2024 13:45
@elasticmachine
Copy link
Collaborator

elasticmachine commented Mar 22, 2024

💔 Build Failed

Failed CI Steps

History

cc @belimawr

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @belimawr

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @belimawr

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @belimawr

@elasticmachine
Copy link
Collaborator

elasticmachine commented Mar 22, 2024

💔 Build Failed

Failed CI Steps

History

cc @belimawr

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @belimawr

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @belimawr

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @belimawr

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @belimawr

@belimawr belimawr requested a review from rdner March 25, 2024 07:53
@@ -0,0 +1,225 @@
// Licensed to Elasticsearch B.V. under one or more contributor
Copy link
Member

@rdner rdner Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@belimawr Did you make sure all these tests are failing without the fix applied?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I wrote them before start working on the fix. But it has been a while since I ran them, so I'll re-run them without the fix before merging just to be on the safe side.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--- FAIL: TestFilestreamLiveFileTruncation (8.91s)
    filestream_truncation_test.go:110: expecting offset 500 got 10500 instead

--- FAIL: TestFilestreamOfflineFileTruncation (4.28s)
    filestream_truncation_test.go:150: expecting offset 250 got 750 instead

Copy link
Member

@rdner rdner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given yes is the answer to my question above, everything looks good.

@belimawr belimawr force-pushed the filestream-truncation-bug branch from 0978bba to b77063d Compare March 26, 2024 07:51
@belimawr
Copy link
Contributor Author

I rebased onto main to fix the merge conflicts, force push.

@belimawr belimawr force-pushed the filestream-truncation-bug branch from b77063d to b6f96d8 Compare March 27, 2024 08:18
@belimawr belimawr requested a review from a team as a code owner March 27, 2024 08:31
@botelastic botelastic bot added the Team:Automation Label for the Observability productivity team label Mar 27, 2024
@belimawr belimawr force-pushed the filestream-truncation-bug branch from c5774dc to 41d1e18 Compare March 27, 2024 11:05
@belimawr
Copy link
Contributor Author

/test

@belimawr belimawr force-pushed the filestream-truncation-bug branch from 41d1e18 to 1744c7e Compare April 8, 2024 08:46
Copy link
Contributor

@dliappis dliappis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a small question, otherwise LGTM from the ingest-eng-prod PoV

.ci/scripts/install-tools.bat Outdated Show resolved Hide resolved
@belimawr belimawr force-pushed the filestream-truncation-bug branch from 9794cd3 to c893107 Compare April 9, 2024 06:51
Copy link
Contributor

mergify bot commented Apr 11, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b filestream-truncation-bug upstream/filestream-truncation-bug
git merge upstream/main
git push upstream filestream-truncation-bug

belimawr added 13 commits April 11, 2024 10:10
Test that fails if Filestream does not correctly update the registry
when the file is truncated while Filebeat is not running.
Refactor tests and set `logging.files.rotateonstartup=false` when
starting the Beat so we only generate a single log file even when the
Beat is started multiple times.
This commit fixes file truncation handling for the case when the file
is truncated when Filebeat is not running.
The generated log file did not have its size predictable because the
RFC3339 may not include the timezone offset. This is fixed by looking
at the time string size and adding padding if necessary.
@belimawr belimawr force-pushed the filestream-truncation-bug branch from f18a0ad to bbc3532 Compare April 11, 2024 08:11
@belimawr belimawr merged commit 640574d into elastic:main Apr 12, 2024
207 of 211 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Team:Automation Label for the Observability productivity team Team:Elastic-Agent Label for the Agent team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Filestream sometimes does not update registry offset when file is truncated
4 participants