-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce simple string matcher #2433
Conversation
79a0120
to
ae31a99
Compare
Updated matcher with support for handling dates (only numeric) at beginning of logs and alternative prefixes. Updated benchmarks:
|
@urso |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some minor comments.
logContentLevel, | ||
} | ||
|
||
var mixedContent = makeContent("mixed", `Lorem ipsum dolor sit amet, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice to see this extensive test suite. I kind of expect that we will hit some cases we haven't thought of but this will be easily fixible by just adding a test with the exception and find the issue.
{ | ||
`^\s*$`, | ||
typeOf((*emptyWhiteStringMatcher)(nil)), | ||
// []string{"", " ", "\t"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this line be removed?
@@ -0,0 +1,204 @@ | |||
package match |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are kind of two solutions here:
- We fix it automatically for the user what this code is doing
- We log a warning to the that his regexp is not optimal and that he could optimize it when doing x
- Doing a combination of both
I would opt for version 3. But will any user ever have a look at it an optimize the regexp query if we tell him it is not optimal? Main thing about doing 1 is that this could introduce potential errors. Means the regexp we use internally is not the regexp someone has in the config file. So it would be good to at least have some debug message with the now used regexp.
Hm... my editor is using |
Added alternative literals use case (e.g.
|
@urso do you prefer squash and merge or to do rebase yourself? |
let me rebase first and get a proper commit message. |
Provide match.Matcher and match.ExactMatcher using regular expressions for matching use-case only. The matchers compile a regular expression into a Matcher, which only provides the Match functionality. This gives us a chance to optimize/replace some common cases used for matching: - replace capture-groups by non-capturing groups - remove leading/trailing `.*` expressions (Match already searches for sub-string matching the regex) - replace simple literal searches with `==` and `strings.Contains` and `strings.startsWith` - replace regex for alternative literals (e.g. `DEBUG|INFO|ERROR`) with strings.Contains over set of literals - optimized empty-lines checks If input regular expression can not be matched to a simple case, regexp.Regexp will be used. The `ExactMatcher` will embedd `<regex>` into `^<regex>$` by default. Note: Matcher does currently not split simple cases. e.g. `abc.*def` or `abc.def` will still fallback to regexp.Regexp.
f7f6b7c
to
c683077
Compare
Match further optimizes special regular expressions, trying to replace
special regular patterns with more performance string searches.
benchmark results:
run 1:
run 2:
In
BenchmarkBeginningDate
benchmark the regular expression can not be optimized (yet?), resulting inRegex
andMatch
tests, both executing full regular expression.