Rotated log files issue? #7
Hi again. I've not tested with symlinks, but I'll see if I can reproduce this. When you say "dead file", do you mean "dead time"?
Yes, "dead time". Lack of caffeine :) 2014-06-29 11:18 GMT+02:00 driskell [email protected]:
Fabio |
In the log-courier output, do you see "Registrar received X events"? This confirms that events are getting sent to the server side. If events aren't getting sent/received, it will keep the file open, as it hasn't sent the contents yet.
The message you refer to is present in the logs and I can confirm that events are sent. I've yet to look at the code (knowing nothing about Go, it will take some time :) ), but could it be possible that the file gets opened with its real name, as can be seen with lsof, while the age check is performed against the symlink? That would explain the behaviour...
The age check isn't performed against any property of the file. It actually just watches the file and, if it doesn't change in X hours, it stops harvesting. If it subsequently changes, it will start again. So unless those old log files are still receiving new entries, it should stop on them, and you'll see a "Stopping harvest of %s; last change was %v ago" message in the logs. Let me have a think on this to see if I can find a reason it would do this. If you can find out when it does close the files (you say it's after 20 hours) that would be useful... if it's 24 hours then the config may not have applied correctly (24h is the default) - in which case, can you post your config? Thanks for your patience.
Sure... I made some changes yesterday, so I'm waiting for 24h to pass. I'll send updates this evening. Thanks for your cooperation!
Ok, I have some more data. "files": [ so something does not sound ok anyway :)
When it says the registrar received events, does it say 1024? Could you get the size of the files in bytes and look inside the .log-courier file? (It gets saved in the current directory and contains the latest offsets of files - it will confirm whether we've read to the end or not.) Thanks.
Yes, it says 1024 (mostly; sometimes it says 1 or 2, but immediately afterwards it says 1024 again). Looking at the inode, I got a bit puzzled. It seems that the inode bounces back and forth between two different inodes: one is the inode of the file pointed to by the log_current symlink, while the other is from an older file. In either case the offset does not correspond to the file size... but the logs are flowing to logstash in a timely fashion, at least judging by the latest entries in Elasticsearch. Definitely not what I expected :)
Ohhh, that's a bug then. It's saving two different files under the same name. I fixed that for normal rotations, but possibly the symlink introduces an interesting difference that isn't handled properly. It should stop saving the offset of the old file once it goes "out of scope" and only save that for the new file. Then, to prevent loss, you would need to keep all files "in scope", of course. It explains the problem though - it hasn't read to the end of those files yet, so it can't close them. Possibly logstash cannot keep up with the flow. That would explain the spurts of 1024 followed by much less as the logstash buffer fills. Would you agree? I'll look into the overlapping save though - that's not supposed to happen!
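The "two files saved under the same name" symptom can be illustrated with a tiny Go sketch. If registrar state is keyed by path, two rotated files reached through the same symlink overwrite each other's entry, which is exactly the flickering offset observed above; keying by device and inode keeps them distinct. The types and values here are hypothetical, not log-courier's actual data structures:

```go
package main

import "fmt"

// fileKey identifies a file by device and inode rather than by path,
// so a symlink that moves between files cannot collapse two entries
// into one.
type fileKey struct {
	device uint64
	inode  uint64
}

func main() {
	// Path-keyed state: the new file's offset clobbers the old one's.
	byPath := map[string]int64{}
	byPath["log.current"] = 1000 // old file's offset...
	byPath["log.current"] = 200  // ...overwritten by the new file

	// Inode-keyed state: old and new files keep separate entries.
	byInode := map[fileKey]int64{}
	byInode[fileKey{device: 2049, inode: 1111}] = 1000 // old file
	byInode[fileKey{device: 2049, inode: 2222}] = 200  // new file

	fmt.Println(len(byPath), len(byInode)) // prints "1 2"
}
```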
I just checked and all files are still open, even the ones older than 24h so this seems to confirm your diagnosis. |
I've worked out the overlapping save where the offset flickers. I will push a fix for that later tonight, hopefully, so you can test. However, files will still remain open - so it won't fix the actual problem for you.

What you ideally need is a broker / cache for logs between the shippers and LogStash. This way, instead of the shipper holding logs because it needs them, the broker takes all the logs really quickly and keeps them instead. Logstash then feeds from the broker. If LogStash still can't keep up with the number of logs coming in, though, this will just mean you fill the broker cache and, again, log files stay open on the shipper. But with a broker you can easily add more Logstash instances to boost processing: just set one up on another machine to feed from the same broker. I use redis and have a logstash receiver instance that simply inputs, then outputs to redis. Then I hook up three or four other logstash indexer instances that input from redis, do filtering, then output to Elasticsearch.

To make sure you get max throughput on your logstash, make sure you use the "-w" flag. This sets the number of filter workers. With a single filter worker you get a single filter thread, which uses only one CPU. I usually set it to twice the CPU count, so on my quad-core instances I use "-w 8". See http://logstash.net/docs/1.4.2/flags

Hope this helps
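The receiver/broker/indexer layout described above could be sketched with Logstash configs along these lines. Hostnames, the redis key, and filenames are illustrative placeholders, not taken from any actual setup in this thread:

```conf
# receiver.conf (assumed name): accept events and push them
# straight to redis with no filtering.
output {
  redis {
    host      => "broker.example.com"
    data_type => "list"
    key       => "logstash"
  }
}

# indexer.conf (assumed name): pull from redis, filter, then index.
# Run several of these on separate machines to scale processing.
input {
  redis {
    host      => "broker.example.com"
    data_type => "list"
    key       => "logstash"
  }
}
output {
  elasticsearch { }
}
```

The indexer would then be started with the filter-worker flag mentioned above, e.g. `logstash -f indexer.conf -w 8` on a quad-core machine.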
Thanks for the hint, I will look at a redis "cache" installation. (Already worked on the -w parameter :) ) Let me know about the fix; I will be more than happy to test it.
I pushed the fix just now to develop branch. If you can give it a try that will be great. |
Ok, online now. Tomorrow I'll check what's happened. Thanks for the patch, will report later. |
Now the oldest file left open by log-courier is 1h old, so it seems that your fix works as expected, many thanks! |
@cova-fe Do you need to run a cron job to create the symlink pointing to the current file? I'm doing this for rotated log files, but my problem is the file is not always rotated at exactly the same time every cycle. Do you have such issue? |
@angel-smile Unfortunately, I no longer have access to this specific setup, so I can't check if the issue you report appears also here... |
@angel-smile My guess would be the logging program itself did the symlinking, or it was done using the logrotated scripts. Basically whatever did the rotation. We can work out something that works for you in #225 |
…cessful reproduction steps Fixes #7
Hi,
I'm using log-courier to get logs from files that are rotated hourly, using a symlink to the current one, so the log dir is similar to this example:
log.current -> log.07
log.07
log.06
log.05
[...]
in log.courier.conf I have the path pointing to log.current and "dead file": "1h"
what happens is that rotated files are still kept open by log-courier for hours (at least 20 or more). This puzzles me, as it is not what I expected; I assumed files would be released one hour after rotation. Is this (likely) my wrong assumption about log-courier's behaviour, or is something not working in log-courier's handling of rotated logs?
Thanks for any answer.
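For reference, a minimal log-courier configuration for a setup like this might look as follows. The server address, certificate path, and log path are placeholders, and note that the option is spelled "dead time" in log-courier's config, not "dead file":

```json
{
  "network": {
    "servers": [ "logstash.example.com:5043" ],
    "ssl ca": "/etc/log-courier/ca.crt"
  },
  "files": [
    {
      "paths": [ "/var/log/app/log.current" ],
      "dead time": "1h"
    }
  ]
}
```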