Is your feature request related to a problem? Please describe.
As a user of a pipeline with many grok processors and patterns, it is difficult for me to debug the performance of my grok processors. The only metric is `grokProcessingTime`, and it is shared (aggregated) across all grok processor instances. The only way to know which Events are spending a lot of time in grok is when a grok match times out and the Event is tagged with `tags_on_timeout`. However, there can still be very slow patterns that never hit the timeout, and these could be optimized to improve performance.
Describe the solution you'd like
An option to add metadata to Events that contains debug information about grok matching for each Event.
When the `include_performance_metadata` flag is set to `true`, the grok processor can add metadata fields to the Event. To start, these metadata fields could be:
```
_total_grok_processing_time: 2500 # in milliseconds
_total_grok_patterns_attempted: 10 # the number of individual patterns this Event attempted to match on
```
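As a rough illustration of what the two proposed fields would measure, here is a minimal Java sketch of one grok processor timing its own pass over its patterns for a single Event. It assumes break-on-match behavior (stop at the first matching pattern) and models each grok pattern as a `Predicate<String>`; `GrokPassTimer`, `PassStats`, and `measure` are hypothetical names for illustration, not part of Data Prepper.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: how one grok processor could measure its own work for a
// single Event. Each Predicate<String> stands in for a compiled grok pattern;
// Data Prepper's real matching API is not modeled here.
class GrokPassTimer {
    // Result of one processor's pass over its patterns for one Event.
    record PassStats(long elapsedMs, int patternsAttempted, int matchedIndex) {}

    // Tries patterns in order and stops at the first match (assumed break-on-match),
    // counting every attempt, including the one that matched.
    static PassStats measure(String message, List<Predicate<String>> patterns) {
        long start = System.nanoTime();
        int attempted = 0;
        int matchedIndex = -1; // -1 means no pattern matched
        for (int i = 0; i < patterns.size(); i++) {
            attempted++;
            if (patterns.get(i).test(message)) {
                matchedIndex = i;
                break;
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return new PassStats(elapsedMs, attempted, matchedIndex);
    }

    public static void main(String[] args) {
        List<Predicate<String>> patterns = List.of(
                msg -> msg.startsWith("ERROR"), // stand-in for PATTERN_1
                msg -> msg.contains("GET"));    // stand-in for PATTERN_2
        PassStats stats = measure("GET /index.html 200", patterns);
        System.out.println(stats.patternsAttempted() + " " + stats.matchedIndex()); // prints "2 1"
    }
}
```

Timing the whole loop rather than each pattern individually keeps the cost to two clock reads per processor per Event, which fits the goal of keeping the feature cheap when enabled.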
These same metadata fields will be shared between all grok processors. For example, given this configuration:
```yaml
- grok:
    include_performance_metadata: true
    match:
      log:
        - "%{PATTERN_1}" # mismatch after 1000 ms
        - "%{PATTERN_2}" # matches after 1000 ms
- grok:
    include_performance_metadata: true
    match:
      log:
        - "%{PATTERN_3}" # mismatch after 1000 ms
        - "%{PATTERN_4}" # mismatch after 1000 ms
```
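If an Event takes the path indicated by the comments, the resulting metadata fields would be `_total_grok_processing_time: 4000` and `_total_grok_patterns_attempted: 4` (2000 ms and 2 patterns from each grok processor).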
This metadata can then be used with the `getMetadata` function of Data Prepper expressions as needed (such as copying it over to the Event with `add_entries`).
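Because the same fields are shared by every grok processor, each processor has to add to the totals left by earlier processors rather than overwrite them. A minimal sketch of that accumulation follows, with a plain `Map<String, Long>` standing in for the Event's metadata (an assumption; this is not Data Prepper's `EventMetadata` API) and hypothetical names `GrokMetadataAccumulator` and `accumulate`. The `main` method replays the two-processor example with its fixed 1000 ms per pattern.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a plain Map stands in for an Event's metadata; this is not
// Data Prepper's EventMetadata API. Each grok processor adds its own totals to
// whatever earlier grok processors already recorded, so the fields are shared.
class GrokMetadataAccumulator {
    static final String TIME_KEY = "_total_grok_processing_time";
    static final String ATTEMPTS_KEY = "_total_grok_patterns_attempted";

    // Adds one processor's elapsed time (ms) and attempt count to the running totals.
    // Does nothing when that processor does not have include_performance_metadata set.
    static void accumulate(Map<String, Long> metadata,
                           boolean includePerformanceMetadata,
                           long elapsedMs,
                           long patternsAttempted) {
        if (!includePerformanceMetadata) {
            return;
        }
        // merge() inserts the value if the key is absent, otherwise sums with the existing total.
        metadata.merge(TIME_KEY, elapsedMs, Long::sum);
        metadata.merge(ATTEMPTS_KEY, patternsAttempted, Long::sum);
    }

    public static void main(String[] args) {
        Map<String, Long> metadata = new HashMap<>();
        // First grok: PATTERN_1 mismatches (1000 ms), PATTERN_2 matches (1000 ms).
        accumulate(metadata, true, 1000 + 1000, 2);
        // Second grok: PATTERN_3 and PATTERN_4 both mismatch (1000 ms each).
        accumulate(metadata, true, 1000 + 1000, 2);
        System.out.println(metadata.get(TIME_KEY));     // prints 4000
        System.out.println(metadata.get(ATTEMPTS_KEY)); // prints 4
    }
}
```

Gating on the flag inside `accumulate` means a processor without the option simply leaves the totals from other processors untouched.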
Describe alternatives you've considered (Optional)
Add this metadata to Events by default, without the need to configure the `include_performance_metadata` parameter. While the overhead would be minimal, this change could add memory usage unnecessarily.
Another alternative is to keep the parameter, default it to `true`, and allow users to disable it if requested.
graytaylor0 changed the title from "Add support for tracking individual Events grok processing time" to "Add support for tracking performance of individual Events in the grok processor" on Feb 28, 2024.