Edit "how file beat works", close options, and symlinks config #2562

dedemorton · 2016-09-15T21:49:16Z

No description provided.

dedemorton · 2016-09-15T21:51:42Z

This adds edits for work tracked by #2482

monicasarbu · 2016-09-15T21:54:48Z

@ruflin can you please have a look?

ruflin

Thanks for the review. I left some minor comments. There are 2-3 things that changed since I wrote the docs.

ruflin · 2016-09-16T07:08:33Z

filebeat/docs/faq.asciidoc


 [float]
 [[reduce-registry-size]]
 === Registry file is too large?

-Filebeat keeps all states of the files and persists the states on disk in the `registry_file`. The states are used to continue file reading at a previous position in case filebeat is restarted. In case every day a large amount of new files is constantly produced, the registry file grows over time. To reduce the size of the registry file, there are two configuration variables: `clean_removed` and `close_inactive`.
+Filebeat keeps the state of each file and persists the state to disk in the `registry_file`. The file state is used to continue file reading at a previous position when Filebeat is restarted. If a large number of new files are produced every day, the registry file might grow to be too large. To reduce the size of the registry file, there are two configuration options available: <<clean-removed,`clean_removed`>> and <<close-inactive,`close_inactive`>>.


The mistake was already in the previous entry. It should be clean_inactive and not close_inactive :-(
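
For illustration, a prospector entry that uses both options to keep the registry small might look like the sketch below. The path and durations are made-up values, not recommendations. It assumes the 5.0-style `input_type` key, and that `clean_inactive` has to be longer than `ignore_older` plus `scan_frequency` so that no state is removed while a file is still being harvested.

```yaml
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/app/*.log   # hypothetical path, for illustration only
  # Files not modified within this window are ignored by the prospector.
  ignore_older: 48h
  # Remove registry state for files that stay inactive longer than this.
  # Chosen here to exceed ignore_older + scan_frequency (illustrative value).
  clean_inactive: 72h
  # Remove registry state for files that no longer exist on disk.
  clean_removed: true
```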

ruflin · 2016-09-16T07:08:57Z

filebeat/docs/faq.asciidoc

+* <<close-renamed,`close_renamed`>>
+* <<close-removed,`close_removed`>>
+* <<close-eof,`close_eof`>>
+* <<close-timeout,`close_timeout`>>


Now that we added the config option harvester_limit it would be nice to also list it here.

@dedemorton We should address this in another PR.

ruflin · 2016-09-16T07:14:10Z

filebeat/docs/how-filebeat-works.asciidoc


-A harvester is responsible to read the content of a single file. This is done by reading each file line by line and send the content to the output. For each file one harvester is started. The harvester is responsible to open and close files. That means, as long as a harvester is running, the file descriptor stays open. Even if a file is removed or renamed, filebeat will keep reading the file. This has the side affect that the space on your disk will be reserved until the harvester is stopped.
+A harvester is responsible for reading the content of a single file. The harvester reads each file, line by line, and sends the content to the output. One harvester is started for each file. The harvester is responsible for opening and closing the file, which means that the file descriptor remains open while the harvester is running. If a file is removed or renamed while it's being harvested, Filebeat continues to read the file. This has the side effect that the space on your disk is reserved until the harvester closes. By default, Filebeat keeps the file open for harvesting for 5 minutes.


Last sentence: "By default, Filebeat keeps the file open until close_inactive is reached to send new events in near real time."

We should not use 5 minutes here, so that the text stays generic in case we change the default.

@ruflin I'll mention the close_inactive option instead of a value. However, we probably shouldn't say "in near real time" here because that won't be true if the user changes the close_inactive setting to a much lower number, right?

ruflin · 2016-09-16T07:16:17Z

filebeat/docs/how-filebeat-works.asciidoc

@@ -29,15 +39,14 @@ filebeat.prospectors:
    - /var/path2/*.log
 -------------------------------------------------------------------------------------

-Filebeat currently supports two `prospector` types: `log` and `stdin`. Each prospector type can be defined multiple times. The log prospector checks for each file if a harvester has to be started, if one is already running or the file can be ignored because of configuration options which were set. New files are only picked up, if the offset / size of the file changed since the harvester was closed.
+Filebeat currently supports two `prospector` types: `log` and `stdin`. Each prospector type can be defined multiple times. The `log` prospector checks each file to see whether a harvester needs to be started, whether one is already running, or whether the file can be ignored (see <<ignore-older,`ignore_older`>>). New files are only picked up if the offset or size of the file has changed since the harvester was closed.


We should only use "size" here instead of "offset or size". I meant offset / size as the same thing, but reading it again I see that this is confusing. I think size is the more common term.

ruflin · 2016-09-16T07:21:57Z

filebeat/docs/reference/configuration/filebeat-options.asciidoc

-Requirement: ignore_older > close_inactive
-
-Before a file can be ignored by the prospector, it must be closed. To ensure a file is not harvested anymore when it is ignored, ignore_older must be set to a longer duration then `close_inactive`. It can happen, that a file is still harvested but already falls under `ignore_older` as the harvester didn't finish yet. The harvester will finish reading and close it after `close_inactive` is reached.
+If a file that's currently being harvested falls under `ignore_older`, the harvester will finish reading the file and close it after `close_inactive` is reached. 


"the harvester will first finish reading ... is reached. Only after that the file will be ignored."

ruflin · 2016-09-16T07:22:29Z

filebeat/docs/reference/configuration/filebeat-options.asciidoc


-
-Requirement: ignore_older > close_inactive


You removed this line. Any idea how we could keep this in somehow (perhaps in a nicer way?)

It seemed redundant to me because you repeated it in the text. I thought maybe it was a note that you forgot to remove. Also, the angle bracket is a bit ambiguous. I realize it means "greater than", but users could potentially misinterpret this (for example, someone might think it means "set this to"). Gotta account for all the ways that people can misinterpret things. :-) When we want to call something out as important, we use "important" notes. I'll flag this one.
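
A short config can also show the requirement without the angle bracket. The following is only a sketch: the path and the `ignore_older` value are illustrative, `5m` is the `close_inactive` default mentioned elsewhere in this PR, and the 5.0-style `input_type` key is assumed.

```yaml
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/app/*.log   # hypothetical path, for illustration only
  # The harvester closes the file after it has been inactive this long.
  close_inactive: 5m
  # Must be a longer duration than close_inactive, so that a file is
  # closed before the prospector starts ignoring it.
  ignore_older: 24h
```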

ruflin · 2016-09-16T07:26:35Z

filebeat/docs/reference/configuration/filebeat-options.asciidoc



 [[clean-options]]
 ===== clean_*

-The `clean_*` variables are used to clean up the state entries. This helps to reduce the size of the registry file and can prevent a potential <<inode-reuse-issue>>. These options are disabled by default as wrong settings can lead to data duplication as complete log files are sent again.
+The `clean_*` options are used to clean up the state entries in the registry file. These settings help to reduce the size of the registry file and can prevent a potential <<inode-reuse-issue,inode reuse issue>>. These options are disabled by default because incorrect settings can lead to data duplication when complete log files are sent again.


"These options are disabled" is not correct anymore, as "clean_removed" is enabled by default.

Good catch. I'll remove the line entirely because we warn the user whenever specific options might be problematic.

Edit how file beat works and close options

4444b7f

dedemorton added docs review labels Sep 15, 2016

ruflin requested changes Sep 16, 2016

Fix issues from review

260231c

ruflin merged commit 46fe81e into elastic:master Sep 19, 2016
