Filebeat reading deleted files, caused FS 100% used #2011
Comments
Filebeat keeps the file open until |
hi @ruflin, I don't have thanks for the quick response! |
Filebeat keeps the file open even if it is rotated or renamed, because the file handle is not "attached" to a file name. So in your case the file may actually have had a different name at the time you checked it. While doing some more refactoring in 5.0 we found a potential issue where we leak file descriptors: #2020 (comment) This could explain your issue. I will ping you when we have this one merged in for potential testing. |
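To illustrate the point about a handle not being attached to a file name, here is a minimal Python sketch. It assumes a POSIX system, and the file names (`server.log`, `server.log.1`) and the function name are made up for the demo. It is not Filebeat code; it only shows the operating system behaviour that any log tailer has to deal with.

```python
import os
import tempfile


def handle_survives_rename_and_delete():
    """Show that an open handle follows the inode, not the path (POSIX)."""
    workdir = tempfile.mkdtemp()
    original = os.path.join(workdir, "server.log")    # hypothetical name
    rotated = os.path.join(workdir, "server.log.1")   # hypothetical name

    with open(original, "w") as f:
        f.write("line 1\n")

    # A reader (standing in for a log shipper) opens the file by path.
    reader = open(original, "r")
    inode_before = os.fstat(reader.fileno()).st_ino

    # Rotation renames the file; the open handle still points at the same inode.
    os.rename(original, rotated)
    inode_after_rename = os.fstat(reader.fileno()).st_ino

    # Deleting the rotated file removes the directory entry only. The data
    # stays on disk until the last open handle is closed.
    os.remove(rotated)
    links_after_delete = os.fstat(reader.fileno()).st_nlink
    content = reader.read()

    reader.close()  # only now can the filesystem reclaim the space
    os.rmdir(workdir)
    return inode_before == inode_after_rename, links_after_delete, content


if __name__ == "__main__":
    print(handle_survives_rename_and_delete())
```

The first returned value shows that the inode is unchanged by the rename, and the content is still readable after the delete. On a typical local Linux filesystem the link count after the delete is 0, which is the state `lsof` reports as "(deleted)".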
Here is the PR for potentially fixing this issue: #2029 |
great, thanks @ruflin! that was fast 👍 |
@adrianlop Any chance that you can test the nightly build to see if the problem disappears? https://beats-nightlies.s3.amazonaws.com/index.html?prefix=filebeat/ |
@ruflin sure! I just installed alpha5 in the machines where I had problems. I'll write you back in a couple of days! |
@adrianlop hi, any news on this one? Does it look good? Thanks! |
wow guys, I forgot to post back, sorry. thank you
This also applies to v5.0.0-alpha5, I think. |
@richard087 Do you see in alpha5 filebeat not releasing files even after close_older? Can you share some more details? |
I haven't explicitly set 'close_older'; but the files are kept indefinitely - certainly more than the default value of an hour. |
@richard087 Could you share some more details? OS? Full config? Log rotation mechanism you use? |
Hi, sorry for the slow reply. I've moved off the site where this is an issue. As best I recall it, the OS is Amazon Linux 2016.3, and the log rotation mechanism is Java8's java.util.logging. |
@richard087 Thanks for getting back. To better catch these issues we started to run some long term load tests. There are some potential race conditions that can cause these issues in 1.x. So far we couldn't detect these issues in the most recent 5.0 builds. For the setup: It doesn't sound like anything special here, so this should not be an issue. One thing that can cause open files is #2395 (comment) Could that have been the issue? It would be great if you could ping me here directly again in case you see this issue happen again, especially with the most recent filebeat builds. |
Hello! I have been experiencing this same issue for a while now. I started out with logstash-forwarder, then moved to filebeat 1.2.3 hoping the issue would be resolved. Tried config option 'force_close_files', which didn't solve the problem. I then upgraded to filebeat 5.0.0-alpha5 with config options 'close_renamed' and 'close_removed' set to true. Issue still remains.
My config:
The file gets rotated by a custom script that moves (mv) the file to the 'OLD' folder, then touches the original filename again. OS version:
|
@azkie Strange to see this issue in alpha5. Did filebeat catch up sending all log lines to elasticsearch or logstash or could it be that there is some back pressure which causes the file to stay open? Thanks for testing the 5.0 releases. |
@ruflin When looking in Kibana it seems that the log shipping is up to date. The latest entries I see are from the newest log file. Is there a way to verify filebeat is not still processing some entries from older files? Although I can't imagine it being behind with sending log entries from files that are days old.
Files just don't seem to be freed ever. Since these logs are quite large (2.5GB) the disk fills up relatively quickly, so I restart filebeat every few days. I feel like I have tried most config options and different versions and kind of exhausted my options. |
Could you share some log files with at least info level from the 5.0.0 run? Perhaps this gives us some more info on what is happening. Can you also try the most recent nightly build? Its logging gives some more information, as stats are printed out every 30s: https://beats-nightlies.s3.amazonaws.com/index.html?prefix=filebeat/ |
Below is a snippet from the filebeat log (version 5.0.0-alpha5) around the time the log rotates.
I will install the latest nightly build (5.0.0-alpha6) and update this thread with more details shortly. |
Thanks for the logs. So far I didn't spot anything special. I assume by alpha6 you mean the Snapshot build? If we can't find it in the logs, we probably have to enable the debug log level. Unfortunately this will produce many more log lines. BTW: Some good news from the logs above is that it does not seem like filebeat fails to ack some events, which would cause back pressure. |
I turned on debug log level but now I noticed the logs rotated fairly quickly and aren't kept for long. Is there a config option to keep them for longer?
|
@azkie You can change the number of files which should be rotated and the size of the files. See here: https://www.elastic.co/guide/en/beats/filebeat/master/configuration-logging.html#_files_rotateeverybytes |
Below is a snippet from the logfile on debug level at the time the file that is read from is rotated. I've filtered out the lines that contain published events. Please let me know if you need more details, then I can attach the entire log.
|
@azkie Based on the following log message it seems that your log files get truncated and not necessarily rotated:
You write above:
Can you perhaps share some more details on what this exactly does and the "touches the original filename again" ? |
@ruflin This is how the file actually gets 'rotated':
|
How are files removed? The truncation message could also indicate that the inode is reused (which requires deletion of an old file). |
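As a rough sketch of why inode reuse can look like truncation: a tailer that remembers an (inode, offset) pair per file will see a "shrunken" file if a new, smaller file later receives the same inode number. The helper below is hypothetical and is not how Filebeat is implemented; `classify_file_state`, its parameters, and the returned labels are assumptions made for illustration only.

```python
import os
import tempfile


def classify_file_state(path, stored_inode, stored_offset):
    """Compare a file on disk with remembered state (hypothetical helper).

    stored_inode / stored_offset stand in for what a tailer might keep in
    its registry for this file.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return "missing"
    if st.st_ino != stored_inode:
        # A different inode at the same path: the file was rotated or replaced.
        return "new_file"
    if st.st_size < stored_offset:
        # Same inode but smaller than the remembered offset. This is either a
        # real truncation or a new file that reused the old inode number; the
        # two cases cannot be told apart from these fields alone.
        return "truncated"
    return "unchanged_or_grown"


if __name__ == "__main__":
    fd, demo_path = tempfile.mkstemp()
    os.write(fd, b"0123456789")
    os.close(fd)
    inode = os.stat(demo_path).st_ino
    open(demo_path, "w").close()  # truncate in place, inode is kept
    print(classify_file_state(demo_path, inode, 10))
    os.remove(demo_path)
```

The ambiguity in the `"truncated"` branch is the reason the question about how old files are removed matters here.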
@ruflin Let me show the rest of the code. The files actually get moved, then compressed. Then after 14 days the files are removed.
|
Do you see the issue already quite soon after starting filebeat or "only" after 14 days and more? |
The issue starts after the first log rotation. Every time (every 24 hours) the log gets rotated, I see the moved log file still open using lsof. This would be 4 days after starting filebeat:
|
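For anyone who wants to check this without `lsof`, the following Python sketch lists files that a process still holds open after they were deleted. It assumes Linux, where `/proc/<pid>/fd` contains one symlink per open descriptor and the link target gets a " (deleted)" suffix once the path is unlinked. The function name is made up, and reading another user's process normally requires root.

```python
import os


def deleted_open_files(pid):
    """Return (fd, target) pairs for deleted files a process still holds open.

    Linux-only sketch: returns an empty list if /proc/<pid>/fd is not
    available or not readable.
    """
    fd_dir = f"/proc/{pid}/fd"
    results = []
    try:
        entries = os.listdir(fd_dir)
    except (FileNotFoundError, PermissionError, NotADirectoryError):
        return results
    for fd in entries:
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
        if target.endswith(" (deleted)"):
            results.append((int(fd), target))
    return results


if __name__ == "__main__":
    # Inspect the current process; pass the filebeat PID instead to check it.
    for fd, target in deleted_open_files(os.getpid()):
        print(fd, target)
```

Every entry reported for the filebeat process corresponds to disk space that cannot be reclaimed until that descriptor is closed or the process exits.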
I'm currently trying to reproduce this by running filebeat over longer periods and doing some more intense file rotations. So far no success. In the first days, do you ever see the truncation message in your logs? Can you share again the config file that you now use with filebeat-5.0.0-alpha6? Before restarting, do you clean up the registry file or do you keep the same one? Thanks for providing all the information so we can get to the bottom of this. |
The truncation message shows up every time the file rotates. Full config file below. Filebeat is started as './filebeat -v -c fb.yml'
I do not touch the registry file at all. Filebeat is running in my terminal inside a screen. I restart by pressing Ctrl-C (SIGINT) and running the filebeat binary again. At that point the open files I saw using lsof are gone. Thank you for investigating the issue. |
The problem here is that if you don't remove the registry file, you still have a registry file with all the data inside from x days ago. This makes it hard to reproduce. The problem with removing the registry file is that filebeat will reread all the data. |
Filebeat didn't close deleted files, so the operating system couldn't free up FS space:
After restarting filebeat, OS could finally delete files:
I don't really know why Filebeat had this file opened (server.log.4), because I only configured this path (/var/log/kafka/server.log) in filebeat.yml config file:
Can you take a look at this, please?
Thanks!