Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rotated log files issue? #7

Closed
cova-fe opened this issue Jun 29, 2014 · 19 comments
Closed

Rotated log files issue? #7

cova-fe opened this issue Jun 29, 2014 · 19 comments
Assignees
Labels
Milestone

Comments

@cova-fe
Copy link

cova-fe commented Jun 29, 2014

Hi,
I'm using log-courier to get logs from files that are rotated hourly, using a symlink to the current one, so the log dir is similar to this example:
log.current -> log.07
log.07
log.06
log.05
[...]

in log.courier.conf I have the path pointing to log.current and "dead file": "1h"
what happens is that rotated files are still kept open by log-courier for hours (at least 20 or more). This puzzles me as it is not what I expected, I assumed files to be relaesed after 1 hour from rotation. Is this a (likely) my wrong astedsumption on log-courier behaviour or something not working in rotated log management by log-courier?
Thanks for any answer.

@driskell
Copy link
Owner

Hi again

I've not tested with symlink but will see if I can reproduce this.

When you say "dead file" do you mean "dead time"?

@cova-fe
Copy link
Author

cova-fe commented Jun 29, 2014

Yes, "dead time". Lack of caffeine :)

2014-06-29 11:18 GMT+02:00 driskell [email protected]:

Hi again

I've not tested with symlink but will see if I can reproduce this.

When you say "dead file" do you mean "dead time"?


Reply to this email directly or view it on GitHub
#7 (comment).

Fabio

@driskell
Copy link
Owner

In the log-courier output, do you see "Registrar received X events"? This confirms that events are getting sent to the server side. If events aren't getting sent/received it will keep the file open as it's not sent the contents yet.

@cova-fe
Copy link
Author

cova-fe commented Jun 29, 2014

The message you refer to is present in logs and I can confirm that events are sent. I've yet to look at the code (not knowing nothing about go it will take some time :) ) but could be possible that the file gets opened with the real name, as can be seen by lsof, and the age check is performed against the symlink? this can explain the behaviour...

@driskell
Copy link
Owner

The age check isn't performed against any property of the file. It actually just watches the file and if it doesn't change in X hours, it stops harvesting. If it then subsequently changes it will start again.

So unless those old log files are still receiving new entries, it should stop on those. And you'll see a "Stopping harvest of %s; last change was %v ago" message in the logs.

Let me have a think on this to see if I can think of a reason it would do it. If you can find out when it does close the files (you say its after 20 hours) that would be useful... if it's 24hours then it may be the config didn't apply correctly (24h is the default) - in which case can you post your config?

Thanks for your patience.

@cova-fe
Copy link
Author

cova-fe commented Jun 29, 2014

Sure... I made some works yesterday, so I'm waiting for 24h to pass. I'll send updates this evening.

Thanks for your cooperation!

@cova-fe
Copy link
Author

cova-fe commented Jun 29, 2014

Ok, I have some more data.
The oldest file still open on log-courier ls has been updated more than 24 hours ago:
Jun 28 18:00 (mtime), but ctime is a bit different: Jun 29 00:10 (basically, post-midnight archiving). Time now is 20:34 Jun 29, so if the file is closed by log-courier according to mtime, something is not ok. If it rotates the file according to ctime, I have to wait a little bit to see the result.
However, in my conf file I have the following data (files section):

"files": [
{
"paths": [ "/var/log/log_current" ],
"fields": { "type": "app", "set": "system" },
"dead time": "1h"
}
]

so something does not sounds ok anyway :)

@driskell
Copy link
Owner

When it says registrar received events does it say 1024?
Dead time works on the time of the last read line (so idle time). I wonder if it's been busy and not read to the end of the file yet?

Could you get the size of the files in bytes and look inside the .log-courier file? (It gets saved in the current directory and contains latest offsets if files - it will confirm if we've read to the end or not.)

Thanks.

@cova-fe
Copy link
Author

cova-fe commented Jun 29, 2014

Yes, it says 1024 (mostly, sometimes it says 1 or 2, but immediately afterwards says again 1024).
Interesting enough, in .log-courier the filename used is log_current, the name of the link, but lsof shows that log-courier keeps open the real files (of course).

Looking at the inode, I got a bit puzzled. It seems that the inode bounces back and forth between two different inodes: one is the inode of the file pointed by the log_current symlink, where the other inode is from an older file. In either cases the offset is not corresponding to the file size... but the logs are flowing to logstash in a timely fashion, at least looking to elasticsearch latest entries.
/me scratching head.

Definitely not what I expected :)

@driskell
Copy link
Owner

Ohhh that's a bug then. It's saving two different files under the same name. I fixed that with normal rotations but possibly symlink introduces an interesting difference that isn't handled properly.

It should stop saving the offset of the old file once it goes "out of scope" and only save that for the new file. Then to prevent loss you would need to keep all files "in scope" of course.

It explains the problem though - it's not read to end of those files yet so can't close them. Possibly logstash cannot keep up with the flow. This explains the spurts of 1024 then much less as logstash buffer fills. Would you agree?

I'll look into the overlapping save though that's not supposed to happen!

@driskell driskell added the bug label Jun 29, 2014
@driskell driskell self-assigned this Jun 29, 2014
@cova-fe
Copy link
Author

cova-fe commented Jun 30, 2014

I just checked and all files are still open, even the ones older than 24h so this seems to confirm your diagnosis.
I'll try to work on logstash to improve the situation, do you think this can improve the situation wrt not rotated files? It's a bit of showstopper at this moment...
I'm trying to look at code to be of some help, but as stated before not being knowledgeable on golang I'm a bit stuck on basics :)

@driskell
Copy link
Owner

I've worked out the overlapping save where offsets flicks. I will push a fix for that later tonight hopefully so you can test.

However, files will still remain open - so it won't fix the actual problem for you. What you ideally need is a broker / cache for logs between the shippers and LogStash. This way instead of the shipper holding logs because it needs them - the broker takes all the logs really quick and it keeps them instead. Logstash then feeds from the broker.

If LogStash still can't keep up with the number of logs coming in though - this will just mean you fill the broker cache and again, logs stay open on the shipper. But with a broker you can easily add more Logstash instances to boost processing. Just set it up on another machine to feed from the same broker.

I use redis and have a logstash receiver instance that simply inputs, then outputs to redis. Then I hook up three or four other logstash indexer instances that input from redis and do filtering then output to elastic search.

To make sure you get max throughput on your logstash - make sure you use the "-w" flag. This sets the number of filter workers. With a single filter worker you get a single filter thread which uses only one CPU. I usually set it to twice CPU count. So I have quad core instances and use "-w 8". See http://logstash.net/docs/1.4.2/flags

Hope this helps

@cova-fe
Copy link
Author

cova-fe commented Jun 30, 2014

Thanks for the hint, I will look at redis "cache" installation. (Already worked on -w parameter :) )

Le me know about the fix, I will be more than happy to test it.

@driskell
Copy link
Owner

I pushed the fix just now to develop branch. If you can give it a try that will be great.

@cova-fe
Copy link
Author

cova-fe commented Jun 30, 2014

Ok, online now. Tomorrow I'll check what's happened. Thanks for the patch, will report later.

@cova-fe
Copy link
Author

cova-fe commented Jul 1, 2014

Now the oldest file left open by log-courier is 1h old, so it seems that your fix works as expected, many thanks!

@driskell driskell added this to the 0.11 milestone Jul 12, 2014
@angel-smile
Copy link

@cova-fe Do you need to run a cron job to create the symlink pointing to the current file? I'm doing this for rotated log files, but my problem is the file is not always rotated at exactly the same time every cycle. Do you have such issue?

@cova-fe
Copy link
Author

cova-fe commented Aug 22, 2015

@angel-smile Unfortunately, I no longer have access to this specific setup, so I can't check if the issue you report appears also here...

@driskell
Copy link
Owner

@angel-smile My guess would be the logging program itself did the symlinking, or it was done using the logrotated scripts. Basically whatever did the rotation. We can work out something that works for you in #225

driskell added a commit that referenced this issue Feb 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants