-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Filebeat does not harvest rotating files in case of network drive located files #1361
Comments
@BellBiker Thanks for the report. Will have a look. I reformatted the post to make it better readable. Please also be aware of https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-network-volumes.html |
@BellBiker Can you have a look at #1375 This could potentially fix your problem. |
@ruflin #1375 explains that the bug is caused if multiple files are rotated at the same time. So I am not sure whether this describes the conditions of the above. In my example there is only one file harvested from filebeat (server.log) and the first rotation causes the problem that the file information for the file itsself and the rotated file server.log.1, which is not harvested, retrieves the same file information, which I read with an independent application. From the solution I guess that the file information is stored outside the registrar. Maybe this storing algorithm interferes with reading file information using the windows api function and therefore causes the problem. Based on this guess work the solution might also solve the reported scenario. |
The behaviour above is quite strange. It looks like some information is actually cached and does not represent the actual values. Perhaps @andrewkroh has an idea here. I remember @urso also had mentioned some caching issues on remote drives? Under windows the files are identified based on volume, IdxHi, IdxLo (https://github.com/elastic/beats/blob/master/filebeat/input/file_windows.go#L34). As far as I know two different files are never allowed to have the same values. Unfortunately these do not appear in your printout above. |
Hello, But as explained here https://msdn.microsoft.com/en-us/library/aa363788%28v=vs.85%29.aspx the "The identifier that is stored in the nFileIndexHigh and nFileIndexLow members is called the file ID." The ! ID within my print outs is a compilation of idxHi (DWORD) and idxLo (DWORD) using (as I found now within the delivered include files) the WinAPI function MakeQWord. Hopefully this information is sufficient to understand the print out. |
@BellBiker Thanks for the additional details. |
I had once again a look at the posts above. The part I really can't understand is how the id and volume id of two files can be the same. Filebeat reads the same information and gets obviously confused. My current assumption is that the Id is in reality not the same, but the value delivered by the WinAPI is cached. I remember @urso found once a command to disable or flush this cache? Besides that I'm not sure how this problem could be solved as Filebeat does not get the accurate information from the system. One potential future solution for the problem above is using fingerprints to compare files instead of the identifiers. Some work and discussions have started here in Logstash: jordansissel/ruby-filewatch#79 |
Can you provide the AutoIt source you used to generate that information? It might help shed some light on the internals and make things more clear to everyone. I do know one issue in Golang is that for a File "information status" it does not use GetFileInformation until you need the IDs (such as when comparing if one status is the same as another). So let's say rotation happens. Then you compare - is it same? Let's say the rotated file status refers to the original path still, and the new status refers to same path of course. So both same name. We ask Golang to compare so it needs to load IDs. But it loads the IDs from the same path... So we get back yes same file when it is not. I can't remember but in courier I call "os.Samefile" at start of harvesting to check that it harvests the same file the prospector discovered, and I remember doing this explicitly to ensure the stat structure has IDs loaded from the start ready for rotation check. Though I don't remember exactly why :-) Maybe it will help explain some of this? The AutoIt source would definitely be interesting though - maybe it hits the same issue? |
@driskell Thanks for the details. I actually hit the exact same problem some time ago and also solve it using os.SameFile: https://github.com/elastic/beats/blob/master/filebeat/input/file_windows.go#L21 (could have been inspired by |
Sweet :) It would be good to see the AutoIt bits then. I've not seen anything like this before. If you have any findings on the GetFileInfo caching it'll be good to link it here I think as that's something I never encountered before |
@urso I think In remember you saw / found something like this for shared volumes? |
Hello together, So please save the following code snippet as files with a name as you want and the extension "au3" and open it with autoIT.
The following code has to be stored as MyWinAPIFiles.au3 ( If you change the name you have to change the include in the above example). The code changes for printing idxHi and idxLo are inside the function _WinAPI_GetFileInformationByHandle($hFile)
|
@BellBiker Any chance to test this again with 5.0.0-alpha3 of filebeat? https://www.elastic.co/downloads/beats/filebeat Quite a bit of refactoring happened in the file crawling and the handling of modified time, which makes me think this could be fixed? |
Hello, But running the JBOSS on a different machine than the filebeat causes the same result whether I am running the filebeat 1.1.1 or the filebeat 5.0.0.. After the first rotation filebeat stops harvesting logs and I got the same trouble with the file information as described above. The rotated file and the new log file both provide the same file information regarding idxLo and idxHi and if the rotated file reaches the max number of rotated files the rotation stops and the log file, which has to be rotated, increases unlimited until I stop filebeat. Sorry for this bad news. Only to be clear. I only exchanged filebeat. The remaining ELK stack stays on the old versions. |
Thanks for testing. By 5.0.0 I assume you mean alpha3 or the nightly snapshot? As it works locally, there seem to be really some more challenges with the shared volumes. If the file system does only deliver cached information to filebeat, I'm not sure how could work around this. Filebeat could try to invalidate the cache but I don't think that should be the job of filebeat. The good news for me is that it works as expected on the same machine as that is where we think filebeat should be. Is there in your case something blocking you from installing it on the same machine? |
My tests are based on 5.0.0_alpha3. The important point from my view is, that filebeat seems to be the root cause that the file information is wrong. As long as filebeat does not run the file information is changed immediately during the rotation. As a result of the problem I have installed filebeat as a service on those machines. It causes some additional work, but this is feasible. And meanwhile I have learned that I am able to start and stop the service after installation remotely without an rdp-session. So for the moment its ok for me. But if it comes to the point that we want to run filebeat against more test installations the overhead increases. In this case it would be fine to have this issue fixed to manage the whole stack from one location without any filebeat distribution. |
I think that is a critical point I somehow have missed so far. Did you ever try to use |
Based on the comment provide to this property within the yaml-template file I always changed this property to the value true. |
@BellBiker Ok, that seems like no luck here. TBH I'm still not 100% sure what happens here. If filebeat reaches the end of a file and force_close_files was enabled, filebeat should close the file handler. That means also afterwards updating the file should work again as expected as filebeat does not keep it open. |
@BellBiker Several new config options to close a harvester were introduced in the nightly builds: #1600 (comment) It would be interesting to hear if one of these options help in your case. Also in #2020 we fix a bug that could lead in some special cases to some open file handlers which could potentially explain your problem. Any chance to test the nightly builds on your side as soon as #2020 is merged? |
@ruflin Sorry, due to other tasks and my vacation I cannot try this before end of August. I will create an reminder to myself. |
@BellBiker Thanks. Enjoy your vacations. We will keep investigating. I will post here in case we find some more issues. |
@BellBiker Any updates on this? |
@ruflin Except the fact that I can confirm that the problem further persists I am fully packed with other work. Therefore I have to apologize that I can't dig into this problem within the next weeks. |
@BellBiker Thans for the quick update. No worries, will keep digging here. Thanks for all the help so far. By still exists, which version are you referring to? |
I'm closing this issue as since the 5.0 GA release the only "similar" cases we have seen to this is with network drives and in the end it always came down to wiered caching or even null values coming from the network drive which cannot be really fixed in filebeat. One of the ides to fix this can be found in this issue here: #3501 @BellBiker I really appreciate all the effort you put into this. I would love to be all bug reports so detailed. I'm strongly recommending that you install filebeat on each node if possible which should resolve this issue. |
The following ELK stack is used
We are using a JBOSS installation with a log rotation (based on log4j) and I recognized that filebeat-1.1.1-windows stops harvesting log files after rotating. If I run the JBOSS locally and the log files are stored either on the local machine or an usb drive the files are harvested without stopping at rotation.
Reading the file information with the windows api function GetFileInformationByHandle I recognized that the file information filebeat is harvesting does not longer change after file rotation. Additionally the file information of the rotated file are the same (all file attributes) for the file and its first rotation. If this file reaches the last rotation the rotation stops and the file, filebeat seems to harvest, is not longer rotated and grows unlimited.
File information after first rotation
As you can see the whole file information of server.log and server.log.1 is the same.
File information after second rotation (server.log and server.log.2 are the same)
File rotation after the fifth rotation
The reported ID of the server.log the same as at creation time and there is a server.log.5 in the windows explorer which is not recognized trying to get the file information.
The log file of filebeat provides the following information which fits to my observation (x times)
In the moment I stop filebeat the file information of the files recovers and the rotation took place (server.log.1 has now the size of more than 4000 kb and server.log.5 is recognized).
Based on this observation I updated only filebeat to filebeat 1.2.1 the result is the same as described above.
The text was updated successfully, but these errors were encountered: