-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add exclude_lines and include_lines options #430
Conversation
I'd opt for 'AND' combining the regular expressions, not 'OR' combining. Or is already possible from within the regular expression. Just use '|' operator (see my comment on original issue). Maybe we should consider using 'CompilePOSIX'. These regexes are(should be) compatible to egrep. So I can test my filters with 'cat <infile> | egrep ... | egrep ... | egrep -v ... | egrep -v ...'. This is similar to AND-combining the regular expressions. |
@urso I would disagree here. I think the general use case is OR here if a list of regexp is provided. See the example from @monicasarbu |
@urso It's a good point to CompilePOSIX instead of Compile. We can also advice our users to test the regexp before using egrep. I didn't think about this before. Thanks for letting me know. I think it makes more sense OR than AND. Imagine you configure: include_lines: ["expr1", "expr2"]. I would say each expr1 and expr2 should define a group of lines to be included. It's true that you can use "|" inside the regexp to do OR, but I am not a fan of it as complicates the regexp. I think the regexp should be simpler as possible because you can easily have a mistake in it. |
Implementation LGTM. It would be nice to add some system tests but we can still do that in a second step. |
Well, thinking about chains of egrep. I think for exclude_lines OR is most similar to chains of 'egrep -v'. So I'm fine with exclude_lines using OR. Still not convinced of using OR for include_lines (as chains of egrep behave like AND). But I can live with either semantics. |
if len(h.ExcludeLinesRegexp) > 0 { | ||
if MatchRegExps(h.ExcludeLinesRegexp, text) { | ||
// drop line | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using continue here is righteous dangerous. When line is dropped, file offset is broken + backoff algorithm is not properly reset.
Maybe move generation of FileEvent and forwarding into it's own function. The check for generating/sending the event will be made in this extra function.
a5bf779
to
006d8d9
Compare
One potential solution to not have to modify the harvester would be to move the functionality to the spooler. The advantage of this would be that excluding / including would apply to all harvester types (at the moment there are only 2 and not well separated). The disadvantage is that the data would be "discarded" on step later. |
@@ -47,6 +47,8 @@ type ProspectorConfig struct { | |||
IgnoreOlderDuration time.Duration | |||
ScanFrequency string `yaml:"scan_frequency"` | |||
ScanFrequencyDuration time.Duration | |||
ExcludeLines []string `yaml:"exclude_lines"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably belongs into the HarvesterConfig. Same for next line.
Think about the above again, moving it to the spooler will not be an option, as the config is harvester specific. |
filebeat = self.start_filebeat() | ||
|
||
self.wait_until( | ||
lambda: self.log_contains( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More recently we started to prefer using conditions like self.output_has(lines=40)
. Seems to result in less flaky tests.
The current implementation looks clean enough to me. @monicasarbu may I ask for a system test that uses both the include and exclude options? Whatever behavior we have in that case, we have to make sure it doesn't change in the future. |
event.SetFieldsUnderRoot(h.Config.FieldsUnderRoot) | ||
h.SpoolerChan <- event // ship the new event downstream | ||
// Eat line | ||
h.Offset += int64(bytesRead) // Update offset if complete line has been processed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line appears in both branches. we can move it out after the if-else and remove else branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I had the impression that h.Offset is used in SetFieldsUnderRoot. I'll fix it. Thanks!
All in all LGTM |
05ef634
to
ce4b039
Compare
Done with all the comments. Thank you everyone! |
@@ -30,6 +30,14 @@ filebeat: | |||
# * stdin: Reads the standard in | |||
input_type: log | |||
|
|||
# Exclude lines. A list of regular expressions to match. It drops the lines that are | |||
# matching any regular expression from the list. By default, no lines are dropped. | |||
# exclude_lines: ["^DBG"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One tiny detail: We always have not space after # for the actual config. This make it possible to differentiate it from the comment and make uncommenting easier. Same below.
@monicasarbu I left some additional comments. |
- exclude_lines option accepts a list of regular expressions to match. It drops all the lines that are matching any of the regular expressions. - include_lines option accepts a list of regular expressions to match. It exports only the lines that are matching any of the regular expressions. Example: - include_lines: ["^ERR", "^WARN"] Results in exporting only the lines that start with "ERR" or "WARN". If both options are present, the include_lines is executed first.
b1e331a
to
98094aa
Compare
@monicasarbu LGTM. Can you rebase again before merging? There seems to be a conflict. |
expressions, the line is dropped and not exported
expressions, the line is exported
Example:
Results in exporting only the lines that start with "ERR" or "WARN".
Note: First executes the include_lines and then exclude_lines.
This fixes the issue: https://github.com/elastic/filebeat/issues/78