Filebeat reading deleted files, caused FS 100% used #2011

Closed
adrianlop opened this issue Jul 12, 2016 · 34 comments

@adrianlop

  • Version: Filebeat 1.2.3
  • Operating System: Ubuntu Server 14.04 LTS
  • Steps to Reproduce:

Filebeat didn't close deleted files, so the operating system couldn't free up FS space:

root@kafka03:/var/log# lsof -n | grep -i deleted
[...]
filebeat   1495 31393       root  876r      REG              253,1   52429440     138077 /var/log/kafka/server.log.4 (deleted)
filebeat   1495 31393       root  877r      REG              253,1   52429440     138076 /var/log/kafka/server.log.4 (deleted)
filebeat   1495 31393       root  878r      REG              253,1   52429440     138079 /var/log/kafka/server.log.4 (deleted)
filebeat   1495 31393       root  879r      REG              253,1   52429440     138074 /var/log/kafka/server.log.4 (deleted)
filebeat   1495 31393       root  880r      REG              253,1   52429440     138081 /var/log/kafka/server.log.4 (deleted)
filebeat   1495 31393       root  881r      REG              253,1   52429440     138080 /var/log/kafka/server.log.4 (deleted)
filebeat   1495 31393       root  882r      REG              253,1   52429440     138084 /var/log/kafka/server.log.4 (deleted)
filebeat   1495 31393       root  883r      REG              253,1   52429440     138083 /var/log/kafka/server.log.4 (deleted)
filebeat   1495 31393       root  884r      REG              253,1   52429440     138082 /var/log/kafka/server.log.4 (deleted)
filebeat   1495 31393       root  885r      REG              253,1   52429440     138085 /var/log/kafka/server.log.4 (deleted)
filebeat   1495 31393       root  886r      REG              253,1   52429440     138086 /var/log/kafka/server.log.4 (deleted)
filebeat   1495 31393       root  887r      REG              253,1   52429440     138087 /var/log/kafka/server.log.4 (deleted)
filebeat   1495 31393       root  888r      REG              253,1   52429440     138090 /var/log/kafka/server.log.4 (deleted)
filebeat   1495 31393       root  889r      REG              253,1   52429440     138092 /var/log/kafka/server.log.4 (deleted)
filebeat   1495 31393       root  890r      REG              253,1   52429440     138088 /var/log/kafka/server.log.4 (deleted)
filebeat   1495 31393       root  891r      REG              253,1   52429440     138091 /var/log/kafka/server.log.4 (deleted)
filebeat   1495 31393       root  892r      REG              253,1   52429440     138089 /var/log/kafka/server.log.4 (deleted)
filebeat   1495 31393       root  893r      REG              253,1   52429440     138093 /var/log/kafka/server.log.4 (deleted)
filebeat   1495 31393       root  894r      REG              253,1   52429440     138097 /var/log/kafka/server.log.4 (deleted)
filebeat   1495 31393       root  895r      REG              253,1   52429440     138094 /var/log/kafka/server.log.4 (deleted)
filebeat   1495 31393       root  896r      REG              253,1   52429440     138096 /var/log/kafka/server.log.4 (deleted)
filebeat   1495 31393       root  897r      REG              253,1   52429440     138100 /var/log/kafka/server.log.4 (deleted)
filebeat   1495 31393       root  898r      REG              253,1   52429440     138098 /var/log/kafka/server.log.4 (deleted)
filebeat   1495 31393       root  900r      REG              253,1   52429440     138099 /var/log/kafka/server.log.4 (deleted)
[...]


root@kafka03:/var/kafka# df -h
Filesystem                          Size  Used Avail Use% Mounted on
/dev/disk/by-label/cloudimg-rootfs   60G   57G  137M 100% /

After restarting filebeat, OS could finally delete files:

/dev/disk/by-label/cloudimg-rootfs   60G   13G   44G  23% /

I don't really know why Filebeat had this file open (server.log.4), because I only configured this path (/var/log/kafka/server.log) in the filebeat.yml config file:

    -
      paths:
        - /var/log/kafka/server.log
      input_type: log
      document_type: kafka
      ignore_older: 24h
      exclude_lines: ["^[[:space:]]*$"]
      multiline:
        pattern: '(^[a-zA-Z.]+(Error|Exception): .+)|(^java\.lang.+)|(^[ \t]+at .+)|(^[ \t]+... [0-9]+)|(^[ \t]*Caused by:.+)'
        match: after

Can you take a look at this, please?

Thanks!

@ruflin
Member

ruflin commented Jul 12, 2016

Filebeat keeps the file open until close_older is reached. Can you share some details on the time period over which this happens?
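
For reference, close_older can be set explicitly per prospector next to ignore_older. A minimal sketch based on the config you posted (the 1h value is just an example, not a recommendation; adjust it to your rotation interval):

    -
      paths:
        - /var/log/kafka/server.log
      input_type: log
      ignore_older: 24h
      close_older: 1h   # close the file handle if the file was not read for 1h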

@adrianlop
Author

hi @ruflin, I don't have close_older configured, but I have ignore_older set to 24h.
also, why is it opening server.log.4 if I configured just server.log?

thanks for the quick response!

ruflin added the Filebeat label on Jul 13, 2016
@ruflin
Member

ruflin commented Jul 13, 2016

Filebeat also keeps a file open if it is rotated or renamed, as the file handle is not "attached" to a file name. So it can happen that in your case, by the time you checked, the file actually already had a different name.

While doing some more refactoring for 5.0, we found a potential issue where we leak file descriptors: #2020 (comment). This could explain your issue. I will ping you when we have this one merged in for potential testing.

ruflin added the bug label on Jul 13, 2016
@ruflin
Member

ruflin commented Jul 13, 2016

Here is the PR for potentially fixing this issue: #2029

@adrianlop
Author

great, thanks @ruflin! that was fast 👍

@ruflin
Member

ruflin commented Jul 17, 2016

@adrianlop Any chance that you can test the nightly build to see if the problem disappears? https://beats-nightlies.s3.amazonaws.com/index.html?prefix=filebeat/

@adrianlop
Author

@ruflin sure! I just installed alpha5 in the machines where I had problems. I'll write you back in a couple of days!

@tsg
Contributor

tsg commented Aug 2, 2016

@adrianlop hi, any news on this one? Does it look good? Thanks!

@adrianlop
Author

wow guys, I forgot to post back, sorry.
Looks good, yes!

thank you

@richard087

This also applies to v5.0.0-alpha5, I think.

@ruflin
Member

ruflin commented Aug 18, 2016

@richard087 Are you seeing filebeat in alpha5 not releasing files even after close_older? Can you share some more details?

@richard087

I haven't explicitly set 'close_older', but the files are kept open indefinitely, certainly longer than the default value of an hour.

@ruflin
Member

ruflin commented Aug 22, 2016

@richard087 Could you share some more details? OS? Full config? Log rotation mechanism you use?

@richard087

Hi, sorry for the slow reply. I've moved off the site where this is an issue. As best I recall, the OS is Amazon Linux 2016.3, and the log rotation mechanism is Java 8's java.util.logging.

@ruflin
Member

ruflin commented Aug 29, 2016

@richard087 Thanks for getting back. To catch these issues better, we have started running some long-term load tests. There are some potential race conditions that can cause these issues in 1.x; so far we couldn't detect them in the most recent 5.0 builds. As for the setup: it doesn't sound like anything special here, so this should not be an issue. One thing that can cause open files is #2395 (comment). Could that have been the issue?

It would be great if you could ping me here directly in case you see this issue happen again, especially with the most recent filebeat builds.

@azkie

azkie commented Aug 30, 2016

Hello! I have been experiencing this same issue for a while now. I started out with logstash-forwarder, then moved to filebeat 1.2.3 hoping the issue would be resolved. I tried the config option 'force_close_files', which didn't solve the problem. I then upgraded to filebeat 5.0.0-alpha5 with the config options 'close_renamed' and 'close_removed' set to true. The issue still remains.

# lsof | grep deleted
filebeat  18112     root    6r      REG                9,2 14388205827   10544073 /news/gs/OLD/ar.file.20160826-2350 (deleted)
filebeat  18112     root    8r      REG                9,2 14949991522   10305642 /news/gs/OLD/ar.file.20160827-2350 (deleted)
filebeat  18112     root    9r      REG                9,2 15359022734   10544177 /news/gs/OLD/ar.file.20160828-2350 (deleted)
filebeat  18112     root   10r      REG                9,2 14814271576   10543674 /news/gs/OLD/ar.file.20160829-2350 (deleted)

My config:

filebeat:
  prospectors:
    -
      paths:
        - /news/gs/ar.file
      document_type: ar-gs
      close_renamed: true
      close_removed: true

The file gets rotated by a custom script that moves (mv) the file to the 'OLD' folder, then touches the original filename again. OS version:

# cat /etc/gentoo-release
Gentoo Base System release 1.12.11.1
# uname -r
2.6.30-gentoo-r5

@ruflin
Member

ruflin commented Aug 30, 2016

@azkie Strange to see this issue in alpha5. Did filebeat catch up sending all log lines to elasticsearch or logstash, or could it be that there is some back pressure which causes the files to stay open? Thanks for testing the 5.0 releases.

@azkie

azkie commented Aug 31, 2016

@ruflin When looking in Kibana it seems that the log shipping is up to date. The latest entries I see are from the newest log file. Is there a way to verify filebeat is not still processing some entries from older files? Although I can't imagine it being behind with sending log entries from files that are days old.

filebeat  18112     root    6r      REG                9,2 14388205827   10544073 /news/gs/OLD/ar.file.20160826-2350 (deleted)
filebeat  18112     root    8r      REG                9,2 14949991522   10305642 /news/gs/OLD/ar.file.20160827-2350 (deleted)
filebeat  18112     root    9r      REG                9,2 15359022734   10544177 /news/gs/OLD/ar.file.20160828-2350 (deleted)
filebeat  18112     root   10r      REG                9,2 14814271576   10543674 /news/gs/OLD/ar.file.20160829-2350 (deleted)
filebeat  18112     root   11r      REG                9,2 14640253776   10551860 /news/gs/OLD/ar.file.20160830-2350 (deleted)

Files just don't seem to be freed ever. Since these logs are quite large (2.5GB) the disk fills up relatively quickly, so I restart filebeat every few days. I feel like I have tried most config options and different versions and kind of exhausted my options.

@ruflin
Member

ruflin commented Aug 31, 2016

Could you share some log files, at least at info level, from the 5.0.0 run? Perhaps this gives us some more information on what is happening. Can you also try the most recent nightly build? Its logging gives even more information, as some stats are printed out every 30s: https://beats-nightlies.s3.amazonaws.com/index.html?prefix=filebeat/

@azkie

azkie commented Sep 1, 2016

Below is a snippet from the filebeat log (version 5.0.0-alpha5) around the time the log rotates.

2016-08-31T23:45:06Z INFO Non-zero metrics in the last 30s: libbeat.logstash.publish.read_bytes=1537 libbeat.publisher.published_events=10240 libbeat.logstash.published_and_acked_events=10240 libbeat.logstash.publish.write_bytes=746117 registrar.state_updates=70 publish.events=10240 libbeat.logstash.call_count.PublishEvents=5
2016-08-31T23:45:36Z INFO Non-zero metrics in the last 30s: publish.events=10240 libbeat.logstash.publish.read_bytes=1537 libbeat.logstash.call_count.PublishEvents=5 libbeat.logstash.publish.write_bytes=748336 libbeat.publisher.published_events=10240 libbeat.logstash.published_and_acked_events=10240 registrar.state_updates=70
2016-08-31T23:46:06Z INFO Non-zero metrics in the last 30s: libbeat.publisher.published_events=6144 libbeat.logstash.call_count.PublishEvents=3 registrar.state_updates=42 libbeat.logstash.publish.write_bytes=434423 libbeat.logstash.publish.read_bytes=1113 publish.events=6144 libbeat.logstash.published_and_acked_events=6144
2016-08-31T23:46:36Z INFO Non-zero metrics in the last 30s: libbeat.publisher.published_events=10240 libbeat.logstash.publish.read_bytes=1537 libbeat.logstash.call_count.PublishEvents=5 registrar.state_updates=70 publish.events=10240 libbeat.logstash.publish.write_bytes=708833 libbeat.logstash.published_and_acked_events=10240
2016-08-31T23:47:06Z INFO Non-zero metrics in the last 30s: libbeat.logstash.publish.read_bytes=1537 libbeat.logstash.publish.write_bytes=721574 libbeat.logstash.call_count.PublishEvents=5 libbeat.publisher.published_events=10240 libbeat.logstash.published_and_acked_events=10240 publish.events=10240 registrar.state_updates=70
2016-08-31T23:47:36Z INFO Non-zero metrics in the last 30s: libbeat.logstash.published_and_acked_events=10240 libbeat.logstash.publish.write_bytes=709430 registrar.state_updates=70 libbeat.publisher.published_events=10240 libbeat.logstash.publish.read_bytes=1696 libbeat.logstash.call_count.PublishEvents=5 publish.events=10240
2016-08-31T23:48:06Z INFO Non-zero metrics in the last 30s: libbeat.logstash.call_count.PublishEvents=5 registrar.state_updates=70 libbeat.logstash.published_and_acked_events=10240 libbeat.publisher.published_events=10240 libbeat.logstash.publish.write_bytes=723606 libbeat.logstash.publish.read_bytes=1537 publish.events=10240
2016-08-31T23:48:36Z INFO Non-zero metrics in the last 30s: libbeat.publisher.published_events=10240 libbeat.logstash.call_count.PublishEvents=5 libbeat.logstash.published_and_acked_events=10240 publish.events=10240 libbeat.logstash.publish.read_bytes=1431 libbeat.logstash.publish.write_bytes=738315 registrar.state_updates=70
2016-08-31T23:49:06Z INFO Non-zero metrics in the last 30s: libbeat.logstash.call_count.PublishEvents=5 registrar.state_updates=70 libbeat.logstash.publish.read_bytes=1590 libbeat.publisher.published_events=10240 libbeat.logstash.published_and_acked_events=10240 libbeat.logstash.publish.write_bytes=723163 publish.events=10240
2016-08-31T23:49:36Z INFO Non-zero metrics in the last 30s: libbeat.publisher.published_events=10240 libbeat.logstash.publish.read_bytes=1643 publish.events=10240 registrar.state_updates=70 libbeat.logstash.published_and_acked_events=10240 libbeat.logstash.publish.write_bytes=716849 libbeat.logstash.call_count.PublishEvents=5
2016-08-31T23:50:04Z INFO Harvester started for file: /news/gs/ar.file
2016-08-31T23:50:06Z INFO Non-zero metrics in the last 30s: filebeat.harvester.started=1 filebeat.harvester.running=1 libbeat.publisher.published_events=10240 libbeat.logstash.publish.read_bytes=1537 publish.events=10240 registrar.state_updates=70 libbeat.logstash.call_count.PublishEvents=5 libbeat.logstash.publish.write_bytes=719169 libbeat.logstash.published_and_acked_events=10240 filebeat.harvester.open_files=1
2016-08-31T23:50:36Z INFO Non-zero metrics in the last 30s: libbeat.logstash.published_and_acked_events=6143 libbeat.logstash.publish.read_bytes=1166 libbeat.logstash.call_count.PublishEvents=3 libbeat.logstash.publish.write_bytes=427831 libbeat.publisher.published_events=6143 publish.events=6144 registar.states.total=1 registrar.state_updates=43
2016-08-31T23:51:06Z INFO Non-zero metrics in the last 30s: libbeat.logstash.published_and_acked_events=10240 libbeat.logstash.publish.read_bytes=1590 libbeat.publisher.published_events=10240 publish.events=10240 libbeat.logstash.publish.write_bytes=721505 libbeat.logstash.call_count.PublishEvents=5 registrar.state_updates=75
2016-08-31T23:51:36Z INFO Non-zero metrics in the last 30s: registrar.state_updates=90 libbeat.logstash.published_and_acked_events=12288 publish.events=12288 libbeat.logstash.publish.write_bytes=878344 libbeat.logstash.call_count.PublishEvents=6 libbeat.logstash.publish.read_bytes=1696 libbeat.publisher.published_events=12288
2016-08-31T23:52:06Z INFO Non-zero metrics in the last 30s: libbeat.logstash.publish.write_bytes=576729 libbeat.logstash.publish.read_bytes=1484 publish.events=8192 libbeat.publisher.published_events=8192 libbeat.logstash.published_and_acked_events=8192 registrar.state_updates=60 libbeat.logstash.call_count.PublishEvents=4
2016-08-31T23:52:36Z INFO Non-zero metrics in the last 30s: libbeat.logstash.published_and_acked_events=12288 publish.events=12288 libbeat.logstash.publish.write_bytes=886386 libbeat.publisher.published_events=12288 registrar.state_updates=90 libbeat.logstash.call_count.PublishEvents=6 libbeat.logstash.publish.read_bytes=1749
2016-08-31T23:53:06Z INFO Non-zero metrics in the last 30s: libbeat.logstash.publish.read_bytes=1696 publish.events=10240 libbeat.logstash.call_count.PublishEvents=5 libbeat.publisher.published_events=10240 registrar.state_updates=75 libbeat.logstash.published_and_acked_events=10240 libbeat.logstash.publish.write_bytes=733009
2016-08-31T23:53:36Z INFO Non-zero metrics in the last 30s: publish.events=10240 libbeat.logstash.published_and_acked_events=10240 libbeat.publisher.published_events=10240 libbeat.logstash.publish.read_bytes=1431 libbeat.logstash.publish.write_bytes=724987 registrar.state_updates=75 libbeat.logstash.call_count.PublishEvents=5
2016-08-31T23:54:06Z INFO Non-zero metrics in the last 30s: libbeat.logstash.publish.read_bytes=1431 libbeat.publisher.published_events=8192 libbeat.logstash.published_and_acked_events=8192 libbeat.logstash.call_count.PublishEvents=4 libbeat.logstash.publish.write_bytes=583801 publish.events=8192 registrar.state_updates=60
2016-08-31T23:54:36Z INFO Non-zero metrics in the last 30s: publish.events=10240 libbeat.logstash.publish.read_bytes=1590 registrar.state_updates=75 libbeat.logstash.call_count.PublishEvents=5 libbeat.publisher.published_events=10240 libbeat.logstash.publish.write_bytes=753189 libbeat.logstash.published_and_acked_events=10240

I will install the latest nightly build (5.0.0-alpha6) and update this thread with more details shortly.

@ruflin
Member

ruflin commented Sep 2, 2016

Thanks for the logs. So far I haven't spotted anything special. I assume by alpha6 you mean the snapshot build? If we can't find it in the logs, we probably have to enable the debug log level. Unfortunately this will produce a lot more logs.
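
If it comes to that, the debug output can be restricted with selectors to keep the volume somewhat manageable; a sketch (the selector names here are assumptions on my side, adjust as needed):

logging:
  level: debug
  selectors: ["prospector", "harvester", "registrar"]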

BTW: Some good news from the logs above is that it does not look like filebeat fails to ack events, which would cause back pressure.

@azkie

azkie commented Sep 6, 2016

I turned on the debug log level, but now I notice the logs rotate fairly quickly and aren't kept for long. Is there a config option to keep them for longer?

-rw-r--r-- 1 root root 6.9M Sep  6 12:34 filebeat
-rw-r--r-- 1 root root  11M Sep  6 12:34 filebeat.1
-rw-r--r-- 1 root root  10M Sep  6 12:33 filebeat.2
-rw-r--r-- 1 root root  10M Sep  6 12:32 filebeat.3
-rw-r--r-- 1 root root  11M Sep  6 12:31 filebeat.4
-rw-r--r-- 1 root root  10M Sep  6 12:30 filebeat.5
-rw-r--r-- 1 root root  11M Sep  6 12:29 filebeat.6

@ruflin
Member

ruflin commented Sep 8, 2016

@azkie You can change the number of rotated files that are kept and the size of each file. See here: https://www.elastic.co/guide/en/beats/filebeat/master/configuration-logging.html#_files_rotateeverybytes
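
For example, something along these lines (a sketch; the path, size, and file count are placeholders to adjust for your setup):

logging:
  to_files: true
  files:
    path: /var/log/filebeat
    name: filebeat
    rotateeverybytes: 104857600   # ~100 MB per file
    keepfiles: 20                 # number of rotated files to keep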

@azkie

azkie commented Sep 9, 2016

Below is a snippet from the log file at debug level, around the time the file being read from is rotated. I've filtered out the lines that contain published events. Please let me know if you need more details; then I can attach the entire log.

2016-09-08T23:50:06Z DBG  Run prospector
2016-09-08T23:50:06Z DBG  Start next scan
2016-09-08T23:50:06Z DBG  Check file for harvesting: /news/gs/ar.file
2016-09-08T23:50:06Z DBG  Update existing file for harvesting: /news/gs/ar.file, offset: 6735594869
2016-09-08T23:50:06Z DBG  Old file was truncated. Starting from the beginning: /news/gs/ar.file
2016-09-08T23:50:06Z DBG  Setting offset for file based on seek: /news/gs/ar.file
2016-09-08T23:50:06Z DBG  Setting offset for file: /news/gs/ar.file. Offset: 0
2016-09-08T23:50:06Z INFO Harvester started for file: /news/gs/ar.file
2016-09-08T23:50:06Z DBG  Update state: /news/gs/ar.file, offset: 0
2016-09-08T23:50:07Z DBG  try read: 6
2016-09-08T23:50:08Z DBG  try read: 6
2016-09-08T23:50:09Z DBG  try read: 6
2016-09-08T23:50:09Z DBG  2048 events out of 2048 events sent to logstash. Continue sending
2016-09-08T23:50:09Z DBG  send completed
2016-09-08T23:50:09Z DBG  Events sent: 2048
2016-09-08T23:50:09Z DBG  Processing 2048 events
2016-09-08T23:50:09Z DBG  Registrar states cleaned up. Before: 8 , After: 8
2016-09-08T23:50:09Z DBG  Write registry file: /root/filebeat-5.0.0-alpha6-SNAPSHOT-linux-x86_64/data/registry
2016-09-08T23:50:09Z DBG  Registry file updated. 8 states written.
2016-09-08T23:50:09Z DBG  Flushing spooler because spooler full. Events flushed: 2048
2016-09-08T23:50:09Z DBG  output worker: publish 2048 events
2016-09-08T23:50:09Z DBG  Try to publish 2048 events to logstash with window size 2048
2016-09-08T23:50:09Z DBG  try read: 6
2016-09-08T23:50:11Z DBG  try read: 6
2016-09-08T23:50:13Z DBG  try read: 6
2016-09-08T23:50:14Z DBG  try read: 6
2016-09-08T23:50:15Z DBG  try read: 6
2016-09-08T23:50:16Z DBG  try read: 6
2016-09-08T23:50:16Z DBG  2048 events out of 2048 events sent to logstash. Continue sending
2016-09-08T23:50:16Z DBG  send completed

@ruflin
Member

ruflin commented Sep 12, 2016

@azkie Based on the following log message, it seems that your log files get truncated and not necessarily rotated:

2016-09-08T23:50:06Z DBG  Old file was truncated. Starting from the beginning: /news/gs/ar.file

You write above:

The file gets rotated by a custom script that moves (mv) the file to the 'OLD' folder, then touches the original filename again. OS version:

Can you perhaps share some more details on what this does exactly, and on the "touches the original filename again" part?

@azkie

azkie commented Sep 12, 2016

@ruflin This is how the file actually gets 'rotated':

for FILE in ${FILE_LIST}
do
  if [ -f ${LOGDIR}/${FILE} ]
  then
    mv ${LOGDIR}/${FILE} ${OLDDIR}/${FILE}.${EXTENSION}
    touch ${LOGDIR}/${FILE}
    chown news ${LOGDIR}/${FILE}
    chmod 0644 ${LOGDIR}/${FILE}
  fi
done

@ruflin
Member

ruflin commented Sep 12, 2016

How are files removed? The truncation message could also indicate that the inode is reused (which requires deletion of an old file).

@azkie

azkie commented Sep 12, 2016

@ruflin Let me show the rest of the code. The files actually get moved, then compressed. Then after 14 days the files are removed.

# move files
for FILE in ${FILE_LIST}
do
  if [ -f ${LOGDIR}/${FILE} ]
  then
    mv ${LOGDIR}/${FILE} ${OLDDIR}/${FILE}.${EXTENSION}
    touch ${LOGDIR}/${FILE}
    chown news ${LOGDIR}/${FILE}
    chmod 0644 ${LOGDIR}/${FILE}
  fi
done

# compress moved files
for FILE in ${FILE_LIST}
do
  if [ -f ${OLDDIR}/${FILE}.${EXTENSION} ]
  then
    chown news:news ${OLDDIR}/${FILE}.${EXTENSION}
    chmod 0644 ${OLDDIR}/${FILE}.${EXTENSION}
    nice -19 ${COMPRESS} ${OLDDIR}/${FILE}.${EXTENSION}
    sleep 1
  fi
done

# delete archives older than 14 days
if [ -d ${OLDDIR} ]
then
  NUM_FILES=`find ${OLDDIR} -type f -mtime +14 -print | wc -l`
  if [ ${NUM_FILES} -gt 0 ]
  then
    find ${OLDDIR} -type f -mtime +14 -print -delete
  fi
fi

@ruflin
Member

ruflin commented Sep 13, 2016

Do you see the issue quite soon after starting filebeat, or "only" after 14 days and more?

@azkie

azkie commented Sep 13, 2016

The issue starts after the first log rotation. Every time the log gets rotated (every 24 hours), I see the moved log file still open using lsof. The output below is from 4 days after starting filebeat:

# lsof | grep deleted
filebeat  18112     root    6r      REG                9,2 14388205827   10544073 /news/gs/OLD/ar.file.20160826-2350 (deleted)
filebeat  18112     root    8r      REG                9,2 14949991522   10305642 /news/gs/OLD/ar.file.20160827-2350 (deleted)
filebeat  18112     root    9r      REG                9,2 15359022734   10544177 /news/gs/OLD/ar.file.20160828-2350 (deleted)
filebeat  18112     root   10r      REG                9,2 14814271576   10543674 /news/gs/OLD/ar.file.20160829-2350 (deleted)

@ruflin
Member

ruflin commented Sep 13, 2016

I'm currently trying to reproduce this by running filebeat over longer periods with some more intense file rotations. So far no success. In the first days, do you ever see the truncation message in your logs? Can you share the config file that you now use with filebeat-5.0.0-alpha6? Before restarting, do you clean up the registry file or do you keep the same one? Thanks for providing all the information so we can get to the bottom of this.

@azkie

azkie commented Sep 13, 2016

The truncation message shows up every time the file rotates.

Full config file below. Filebeat is started as './filebeat -v -c fb.yml'

filebeat:
  prospectors:
    -
      paths:
        - /news/gs/ar.file
      document_type: ar-gs
      close_renamed: true
      close_removed: true
#      force_close_files: true
#      fields:
#        type: ar-gs
#      fields_under_root: true
#      tail_files: true
output:
  logstash:
    hosts:
      - xxx:5001
    timeout: 15
    tls:
      certificate_authorities:
        - /etc/ssl/certs/logstash-forwarder.crt

I do not touch the registry file at all. Filebeat is running in my terminal inside a screen session. I restart by pressing Ctrl-C (SIGINT) and running the filebeat binary again. At that point the open files I saw using lsof are gone. Thank you for investigating the issue.

@ruflin
Member

ruflin commented Sep 13, 2016

The problem here is that if you don't remove the registry file, you still have a registry file with all the data from x days ago inside. This makes it hard to reproduce. The problem with removing the registry file is that filebeat will reread all the data.

@ruflin
Member

ruflin commented Sep 14, 2016

@azkie As you seem to have a rather low number of files, this should not apply in your case, except if the output is down or very slow. But worth mentioning: #2541
