Add support for tracking performance of individual Events in the grok processor #4196

Closed
graytaylor0 opened this issue Feb 28, 2024 · 1 comment · Fixed by #4197
Labels: ease-of-use (Improving the ease-of-use for an existing feature), enhancement (New feature or request)

graytaylor0 commented Feb 28, 2024

Is your feature request related to a problem? Please describe.
As a user of a pipeline with many grok processors and patterns, I find it difficult to debug the performance of my grok processors. The only metric is grokProcessingTime, and it is shared/aggregated across all grok processor instances. The only way to know which Events are spending a lot of time in grok is when the grok match times out and tags the Event with tags_on_timeout. However, there can still be very slow patterns that do not hit the timeout and that could be optimized to improve performance.

Describe the solution you'd like
An option to create metadata on Events that contains important debug information related to grok matching for this Event.

- grok:
     performance_metadata: true # defaults to false
     match:
       log:
          - '%{PATTERN_1}'
          - '%{PATTERN_2}'

When the performance_metadata flag is set to true, the grok processor can add metadata fields to the Event. To start, these metadata fields can be:

_total_grok_processing_time: 2500 # in milliseconds
_total_grok_patterns_attempted: 10 # the number of individual patterns this Event attempted to match on

These same metadata fields will be shared between all grok processors, so the values accumulate as an Event passes through each one. Given this configuration:

- grok:
     performance_metadata: true
     match:
       log:
          - '%{PATTERN_1}' # mismatch after 1000 ms
          - '%{PATTERN_2}' # matches after 1000 ms
- grok:
     performance_metadata: true
     match:
       log:
          - '%{PATTERN_3}' # mismatch after 1000 ms
          - '%{PATTERN_4}' # mismatch after 1000 ms

If an Event takes the path indicated by the comments, it attempts four patterns at 1000 ms each, so the metadata fields would end up as:

_total_grok_processing_time: 4000
_total_grok_patterns_attempted: 4
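
The accumulation described above can be sketched in a few lines of Python. This is only an illustrative model, not the grok processor's actual implementation: the `Event` class, the `run_grok` helper, the simulated per-pattern durations, and the assumption that a processor stops at its first matching pattern are all hypothetical. Only the two metadata key names come from this proposal.

```python
from dataclasses import dataclass, field

# Metadata keys proposed in this issue.
TIME_KEY = "_total_grok_processing_time"
COUNT_KEY = "_total_grok_patterns_attempted"


@dataclass
class Event:
    """Hypothetical stand-in for an Event with a metadata map."""
    data: dict
    metadata: dict = field(default_factory=dict)


def run_grok(event, patterns, performance_metadata=False):
    """Simulate one grok processor.

    patterns is a list of (name, simulated_ms, matches) tuples tried in order.
    Durations are simulated; nothing is actually timed or matched here.
    """
    elapsed_ms = 0
    attempted = 0
    for _name, simulated_ms, matches in patterns:
        attempted += 1
        elapsed_ms += simulated_ms
        if matches:
            break  # assumption: stop at the first matching pattern
    if performance_metadata:
        # Add to any totals left by earlier grok processors.
        event.metadata[TIME_KEY] = event.metadata.get(TIME_KEY, 0) + elapsed_ms
        event.metadata[COUNT_KEY] = event.metadata.get(COUNT_KEY, 0) + attempted
    return event


event = Event(data={"log": "example line"})
run_grok(event, [("PATTERN_1", 1000, False), ("PATTERN_2", 1000, True)],
         performance_metadata=True)
run_grok(event, [("PATTERN_3", 1000, False), ("PATTERN_4", 1000, False)],
         performance_metadata=True)
print(event.metadata)
```

With both processors enabled, the totals come out to 4000 ms and 4 patterns attempted, matching the values listed above. A processor run with the flag left at false would contribute nothing to either total.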

This metadata can then be used with the getMetadata function of Data Prepper expressions as needed, such as copying it over to the Event with add_entries:

- add_entries:
     entries:
        - add_when: 'getMetadata("_total_grok_processing_time") != null'
          key: "grok_processing_time"
          value_expression: 'getMetadata("_total_grok_processing_time")'
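
As a rough illustration of the intent of that add_entries entry, the Python sketch below copies a metadata value into the Event data only when the metadata is present. The function name and the plain-dict representation of Events are hypothetical, and it does not reproduce how Data Prepper evaluates expressions.

```python
def copy_metadata_to_event(event_data, metadata, metadata_key, target_key):
    """Copy metadata[metadata_key] into event_data[target_key] when it is set.

    Mirrors the add_when / value_expression pair in the config: a missing
    (None) value means the entry is skipped.
    """
    value = metadata.get(metadata_key)
    if value is not None:
        event_data[target_key] = value
    return event_data


# An Event that went through grok with performance metadata enabled.
with_metadata = copy_metadata_to_event(
    {"log": "example line"},
    {"_total_grok_processing_time": 4000},
    "_total_grok_processing_time",
    "grok_processing_time",
)

# An Event with no grok metadata is left unchanged.
without_metadata = copy_metadata_to_event(
    {"log": "example line"},
    {},
    "_total_grok_processing_time",
    "grok_processing_time",
)
```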

Describe alternatives you've considered (Optional)
Add this metadata to Events by default, without requiring the performance_metadata parameter to be configured. While the overhead is minimal, this could use memory unnecessarily for users who do not need the metadata.

Another alternative is to keep the parameter but default it to true, allowing users to disable it if needed.

graytaylor0 added the enhancement and ease-of-use labels and removed the untriaged label Feb 28, 2024
graytaylor0 changed the title from "Add support for tracking individual Events grok processing time" to "Add support for tracking performance of individual Events in the grok processor" Feb 28, 2024
graytaylor0 self-assigned this Feb 28, 2024
github-project-automation bot moved this from Unplanned to Done in Data Prepper Tracking Board Feb 29, 2024
dlvenable commented

See #4230 for the change to add _total_grok_processing_time

dlvenable added this to the v2.7 milestone Mar 6, 2024