[pkg/ottl] Support for grok patterns #32593
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Which grok library do you recommend? Do you know how grok performs compared to regex?
We currently use grok processing on our Java backend. There's also https://github.com/logrusorgru/grokky as an option. In my ideal world, Elastic would have its own Go grok library; I hope we will be able to prioritize this soon to make it a reality.
@TylerHelmuth, is there a path you prefer? Or, if you're OK with an unmaintained library, I'd prefer working on
I definitely don't like willingly taking a dependency on an unmaintained library, but I don't know how much effort it would take to create one for grok.
I've spent some time on this since last week and have something that we will host in an Elastic repository; I will use that one.
Assigning to @michalpristas at his request.
**Description:** Added a converter to OTTL for parsing grok patterns.

**Link to tracking Issue:** #32593

**Testing:** Added unit tests and an e2e test. For manual testing, use this config:

```yaml
receivers:
  filelog:
    include: [ demo.log ]
    start_at: beginning

exporters:
  debug:
    verbosity: detailed
    sampling_initial: 10000
    sampling_thereafter: 10000

processors:
  transform:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - merge_maps(attributes, ExtractGrokPatterns(body, "%{WOOHOO}", true, ["WOOHOO=%{ELB_URI} otel"]), "insert")

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [transform]
      exporters:
        - debug
```

Add this line to `demo.log`:

```
http://user:[email protected]:80/path?query=string otel
```

The output should contain these attributes:

```
Attributes:
     -> log.file.name: Str(demo.log)
     -> url.username: Str(user)
     -> url.domain: Str(example.com)
     -> url.port: Int(80)
     -> url.path: Str(/path)
     -> url.query: Str(query=string)
     -> url.scheme: Str(http)
```

For the default set of patterns, check: http://user:[email protected]:80/path?query=string

This implementation uses the complete set defined in this directory: https://github.com/elastic/go-grok/tree/main/patterns

`%{ELB_URI}` comes from the [AWS set](https://github.com/elastic/go-grok/blob/main/patterns/aws.go) and is equivalent to
`((?P<url.scheme>[A-Za-z][A-Za-z0-9+\.-]+)://(?:(?P<url.username>([a-zA-Z0-9._-]+))(?::[^@]*)?@)?(?:((?P<url.domain>(?:((?:(((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?)|((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))))|(\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b))))(?::(?P<url.port>\b[1-9][0-9]*\b))?))?(?:((?P<url.path>(/[A-Za-z0-9$.+!*'(){},~:;=@#%&_\-]+)+)(?:\?(?P<url.query>[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]<>]*))?))?)` **Documentation:** updated ottl/readme --------- Co-authored-by: Tyler Helmuth <[email protected]> Co-authored-by: Evan Bradley <[email protected]>
Closed via #34037.
Component(s)
pkg/ottl
Is your feature request related to a problem? Please describe.
This is just a copy of what I wrote as a comment in another issue, as I thought we were discussing it there.
Why should we support grok?
Grok, if you ask me, is much more readable and very common among our users.
What I have in mind also includes custom pattern definitions, so you could do something like this
with an `ExtractGrokPattern` signature like `ExtractGrokPattern(source, pattern, custom_patterns)`, where `custom_patterns` is a map, and the input string
my beagle is BLUE
you could do
and this would result in
While this example is not very realistic, the nginx example from our pipeline shows the beauty of it.
This pattern is complex, and writing it as a plain regex would be ugly.
Describe the solution you'd like
ExtractGrokPattern(source, pattern, custom_patterns)
on top of `ExtractPatterns`, to give users an option. Grok uses regex anyway, but it provides a better experience.
Describe alternatives you've considered
No response
Additional context
No response