
fix(file-server): while using vector with log rotation source, disk space is not freed #17603

Closed
wants to merge 1 commit

Conversation


@maorbr89 maorbr89 commented Jun 5, 2023

I found that Vector does not release file handles even after the files are deleted.
Specifically, the use case is that Vector is watching a file/folder that is being written to by a logger with log rotation.
This is easily reproducible on my local machine: I can see that many files are being deleted, but the disk space is not freed because Vector still has a file handle open for reading and does not release it.
Over time, this fills up the disk and services start to crash.

[Screenshot: vector_disk_usage]
The command: $ lsof +L1 | grep /home/maor/Projects/logger_app/vector_logs/ will list all files in the directory that have been deleted but are still being held open by some process (vector in our case).

The assumed scenario with Python's RotatingFileHandler logger is the following (a minimal sketch of the rotation steps follows the list):

  1. Setup:
    a. The logger writes the file log.log to a folder and is only allowed to rotate over 2 files.
    b. Vector is configured to read from this file only (log.log).
  2. Step 1 – there is only one file (log.log) and Vector is reading from it. No problems.
  3. Step 2 – the logger decides to rotate log.log, so it renames it to log.log1 and creates a new file: log.log.
    a. Vector identifies the new log.log file and starts reading from it, but does not remove or change log.log1.
  4. Step 3 – the logger decides to rotate again, so log.log1 is deleted, log.log is renamed to log.log1, and a new log.log is created.
    a. Vector identifies and adds the new log.log file, but does not remove or change log.log1 (there are now two of these).
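
For concreteness, the rotation steps can be simulated with a small Rust sketch (illustrative only; the rotate helper and the directory layout are assumptions for this example, not part of the PR or of RotatingFileHandler itself):

use std::fs::{self, File};

// Illustrative only: simulate one rotation step of a two-file rotation policy.
// The file names mirror the scenario described above.
fn rotate(dir: &str) -> std::io::Result<()> {
    let current = format!("{dir}/log.log");
    let rotated = format!("{dir}/log.log1");

    // The oldest rotated file is deleted first (ignore the error if it is not there yet).
    let _ = fs::remove_file(&rotated);

    // The active file is renamed and a fresh, empty log.log is created in its place;
    // any process still holding the old file open keeps its handle (and its disk space).
    fs::rename(&current, &rotated)?;
    File::create(&current)?;
    Ok(())
}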


In summary, the bug occurs because, under log rotation, two different files end up sharing the same path from the watcher's point of view. The fix should allow Vector to release the watcher associated with the deleted file and begin tracking the new file that shares the same path.

To accomplish this, I added a check at the start of the second loop (the scan through the map of file watchers in file-server.rs): open the path again and compare its file_id with the file_id of the current watcher. If they do not match, the watcher is marked as dead, and it will later be removed from the fp_map, freeing up disk space.
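
A minimal sketch of the idea (the Watcher struct and the inode-based, Unix-only fingerprint_of helper below are hypothetical placeholders for illustration, not the actual file-server internals):

use std::collections::HashMap;
use std::fs;
use std::os::unix::fs::MetadataExt;
use std::path::PathBuf;

// Hypothetical stand-in for the real watcher type.
struct Watcher {
    path: PathBuf,
    dead: bool,
}

// Hypothetical fingerprint: here simply the inode number (Unix-only); the real
// source derives its file_id differently.
fn fingerprint_of(path: &PathBuf) -> std::io::Result<u64> {
    Ok(fs::metadata(path)?.ino())
}

// The proposed check: re-fingerprint each watcher's path and mark the watcher
// dead when the path now resolves to a different file (i.e. it was rotated).
// Dead watchers are dropped from fp_map later, which releases their handles.
fn mark_replaced_watchers_dead(fp_map: &mut HashMap<u64, Watcher>) {
    for (&file_id, watcher) in fp_map.iter_mut() {
        if let Ok(current_id) = fingerprint_of(&watcher.path) {
            if current_id != file_id {
                watcher.dead = true;
            }
        }
    }
}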

Closes: #11742

@maorbr89 maorbr89 requested a review from spencergilbert as a code owner June 5, 2023 17:17
@maorbr89 maorbr89 requested a review from a team June 5, 2023 17:17
@bits-bot

bits-bot commented Jun 5, 2023

CLA assistant check
All committers have signed the CLA.


@jszwedko jszwedko requested a review from lukesteensen June 5, 2023 19:31
@maorbr89 maorbr89 changed the title while using vector with log rotation source, disk space is not freed,… fix(file-server): while using vector with log rotation source, disk space is not freed,… Jun 6, 2023
@maorbr89 maorbr89 changed the title fix(file-server): while using vector with log rotation source, disk space is not freed,… fix(file-server): while using vector with log rotation source, disk space is not freed. Jun 6, 2023
@maorbr89 maorbr89 changed the title fix(file-server): while using vector with log rotation source, disk space is not freed. fix(file-server): while using vector with log rotation source, disk space is not freed Jun 6, 2023
@lukesteensen
Member

Hi @maorbr, and thanks for digging into this! It's quite a complex part of Vector so we appreciate the effort.

It's not entirely clear to me why you're seeing that behavior in the first place, which I'd like to sort out before we agree on a fix. I'll walk through your example scenario and explain what should be happening:

  1. Setup:
    a. The logger writes the file log.log to a folder and is only allowed to rotate over 2 files.
    b. Vector is configured to read from this file only (log.log).
  2. Step 1 – there is only one file (log.log) and Vector is reading from it. No problems.

Seems like we're good to this point. If log.log has a fingerprint of abc, the fp_map will have one entry:

abc => Watcher { path: log.log, findable: true }
  1. Step 2 – the logger decides to rotate log.log, so it renames it to log.log1 and creates a new file: log.log.
    a. Vector identifies the new log.log file and starts reading from it, but does not remove or change log.log1.

When we scan for file updates, the first thing we do is mark every entry in the fp_map as not findable:

for (_file_id, watcher) in &mut fp_map {
    watcher.set_file_findable(false); // assume not findable until found
}

Immediately afterwards, we iterate through the paths of files we're configured to watch (which will only be log.log in this case, since we are not configured to watch log.log1). For each entry, we check the fingerprint and see if it matches anything in fp_map. For the new log.log it will not, because it is all new data (let's say new fingerprint is def). This means we will add it as a new entry, resulting in the following fp_map:

abc => Watcher { path: log.log, findable: false }
def => Watcher { path: log.log, findable: true }

You are correct that we do not remove the old entry as part of this process, but we do change it by setting findable to false.
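
For illustration, the discovery pass described above behaves roughly like the following sketch (the type and function names are simplified placeholders, not the actual file-server code):

use std::collections::HashMap;
use std::path::PathBuf;

// Simplified placeholder for the real watcher type.
struct Watcher {
    path: PathBuf,
    findable: bool,
}

// Sketch of the discovery pass: every configured path is fingerprinted and either
// matched to an existing entry (marking it findable again) or inserted as a new
// watcher. Stale entries are never removed here, only left unfindable.
fn discover(
    paths: &[PathBuf],
    fp_map: &mut HashMap<u64, Watcher>,
    fingerprint: impl Fn(&PathBuf) -> u64,
) {
    for path in paths {
        let id = fingerprint(path);
        fp_map
            .entry(id)
            .and_modify(|w| w.findable = true)
            .or_insert_with(|| Watcher {
                path: path.clone(),
                findable: true,
            });
    }
}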

Next we iterate through all entries in fp_map and call read_line on the watcher:

for (&file_id, watcher) in &mut fp_map {
    if !watcher.should_read() {
        continue;
    }
    let start = time::Instant::now();
    let mut bytes_read: usize = 0;
    while let Ok(Some(line)) = watcher.read_line() {

Within read_line, we detect whether we have reached the end of the file, and if so we check if the file is findable and set dead if it is not:

Ok(None) => {
    if !self.file_findable() {
        self.set_dead();

Once the watcher is set to dead, we drop it from fp_map at the bottom of the main loop, which in turn releases the file handle.
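
In spirit, that cleanup amounts to something like the following simplified sketch (placeholder types, not the literal code):

use std::collections::HashMap;

// Simplified placeholder; in the real code the watcher owns the open file handle.
struct Watcher {
    dead: bool,
}

// Dropping a dead watcher from the map also drops the file handle it owns, which
// is what finally lets the OS free the space of an already-deleted file.
fn remove_dead(fp_map: &mut HashMap<u64, Watcher>) {
    fp_map.retain(|_file_id, watcher| !watcher.dead);
}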


To summarize, the intended behavior is that we release the file handle once two conditions are met:

  1. We have read to the end of the file
  2. We did not find any file with the same fingerprint (different from path!) when we last scanned for files

That should be perfectly compatible with your scenario, so I would really like to understand why you're seeing different behavior.

Your fix seems to take a different approach, where it detects if a file has been renamed, and marks it dead if so. This has a few potential issues:

  1. There are cases where we want to be able to continue reading from a file after it has been renamed
  2. There is no check for whether or not we've finished reading the existing contents of the file
  3. Doing this check in this location (within the read loop) is likely to have a significant performance impact

These are all things we can discuss and address, but I think the priority right now should be finding where the existing implementation is failing to behave as we expect.

@jszwedko jszwedko added the meta: awaiting author Pull requests that are awaiting their author. label Jun 26, 2023
@dsmith3197
Contributor

We're going to close this for now. Feel free to re-open to pick back up the discussion.

@dsmith3197 dsmith3197 closed this Sep 19, 2023
Development

Successfully merging this pull request may close these issues.

Vector keeps old Inodes, clogging up disk
5 participants