Adds the EventKey and EventKeyFactory. #4627

Merged: 2 commits, Jun 17, 2024

@dlvenable dlvenable commented Jun 14, 2024

Description

This PR adds an EventKey model for accessing data within the Event model. This allows for faster validations and for caching internal details between calls.

Some significant changes:

  • Adds overloads for all operations in Event that require a key to also use the EventKey.
  • Implements the existing operations in JacksonEvent that use the String key to create an EventKey and call that.
  • Adds a new JacksonEventKey implementation which works with JacksonEvent.
  • Moves validations from JacksonEvent into JacksonEventKey to consolidate them.
  • Adds a concept of EventAction to perform validations at creation time. This way, plugins can validate keys for the action they will take.
  • Makes the EventKeyFactory usable to the @DataPrepperPluginConstructor constructors.
  • Adds a TestEventKeyFactory and consolidates some code with the existing TestEventFactory.
  • Includes integration testing in ProcessorPipelineIT.

See #4628 for an example of usage.
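
To illustrate the idea behind the overloads, here is a minimal, self-contained sketch. It is not the Data Prepper code: `SimpleEventKey` and `SimpleEvent` are hypothetical stand-ins for `EventKey`/`JacksonEventKey` and `JacksonEvent`, the map-backed storage is an assumption, and so is the `/` path separator. It only shows a key that is validated and parsed once, plus String overloads that build a key and delegate to the key-based operations.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EventKeySketch {

    /** Hypothetical stand-in for an EventKey: validated and parsed once, then reused. */
    static final class SimpleEventKey {
        private final String key;
        private final List<String> pathSegments; // cached internal detail

        SimpleEventKey(final String key) {
            if (key == null) {
                throw new IllegalArgumentException("key cannot be null");
            }
            this.key = key;
            // Split "a/b/c" once so later lookups do not re-parse the string.
            final List<String> segments = new ArrayList<>();
            for (final String segment : key.split("/")) {
                if (!segment.isEmpty()) {
                    segments.add(segment);
                }
            }
            this.pathSegments = Collections.unmodifiableList(segments);
        }

        String getKey() {
            return key;
        }

        List<String> getPathSegments() {
            return pathSegments;
        }
    }

    /** Hypothetical map-backed event offering both String and key-object overloads. */
    static final class SimpleEvent {
        private final Map<String, Object> data = new HashMap<>();

        // String overloads create a key and delegate to the key-based operations.
        Object get(final String key) {
            return get(new SimpleEventKey(key));
        }

        void put(final String key, final Object value) {
            put(new SimpleEventKey(key), value);
        }

        Object get(final SimpleEventKey key) {
            Object current = data;
            for (final String segment : key.getPathSegments()) {
                if (!(current instanceof Map)) {
                    return null;
                }
                current = ((Map<?, ?>) current).get(segment);
            }
            return current;
        }

        @SuppressWarnings("unchecked")
        void put(final SimpleEventKey key, final Object value) {
            final List<String> segments = key.getPathSegments();
            if (segments.isEmpty()) {
                throw new IllegalArgumentException("cannot put at the root");
            }
            Map<String, Object> current = data;
            // Walk (and create) intermediate maps; a non-map intermediate value fails the cast.
            for (int i = 0; i < segments.size() - 1; i++) {
                current = (Map<String, Object>) current.computeIfAbsent(
                        segments.get(i), k -> new HashMap<String, Object>());
            }
            current.put(segments.get(segments.size() - 1), value);
        }
    }

    public static void main(final String[] args) {
        // Build the key once (for example, when a plugin is constructed) ...
        final SimpleEventKey statusKey = new SimpleEventKey("http/status");
        final SimpleEvent event = new SimpleEvent();
        // ... and reuse it for every event without re-validating or re-parsing.
        event.put(statusKey, 200);
        System.out.println(event.get(statusKey));
        System.out.println(event.get("http/status"));
    }
}
```

In this sketch, a caller that holds a `SimpleEventKey` skips the per-call parsing that the String overloads still pay, which is the kind of saving the description attributes to caching internal details between calls.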

Issues Resolved

Resolves #1916.

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

*
* @since 2.9
*/
enum EventAction {
Member

What kind of benefit do we get from this? If I have a key that is for reading and give it ALL, is the downside that the processor could unintentionally write to the key?

Member Author

I wasn't originally planning on adding this. But, I realized that we have different validations for PUT/DELETE than we do for GET. Namely, we don't support an empty string for PUT/DELETE. So the primary advantage is to check that the key is valid for the operation when getting the key, rather than waiting until we process it on an event.
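
A minimal sketch of that creation-time check, under stated assumptions: `ValidatedKey` and its `create` method are hypothetical names, not the `EventKeyFactory` API, and defaulting to `ALL` when no action is passed is an assumption made for the example. The only rule it encodes is the one described above: an empty key is acceptable for GET but not for PUT or DELETE.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Objects;
import java.util.Set;

public class EventActionSketch {

    /** Actions a key may be used for; ALL covers every other action. */
    enum EventAction { GET, PUT, DELETE, ALL }

    /** Hypothetical key that is validated against its intended actions when created. */
    static final class ValidatedKey {
        private final String key;
        private final Set<EventAction> actions;

        private ValidatedKey(final String key, final Set<EventAction> actions) {
            this.key = key;
            this.actions = actions;
        }

        static ValidatedKey create(final String key, final EventAction... actions) {
            Objects.requireNonNull(key, "key cannot be null");
            // Assumption for this sketch: no explicit action means ALL.
            final Set<EventAction> requested = actions.length == 0
                    ? EnumSet.of(EventAction.ALL)
                    : EnumSet.copyOf(Arrays.asList(actions));
            final boolean mutates = requested.contains(EventAction.ALL)
                    || requested.contains(EventAction.PUT)
                    || requested.contains(EventAction.DELETE);
            // The empty key is readable, but it cannot be a PUT or DELETE target.
            if (mutates && key.isEmpty()) {
                throw new IllegalArgumentException(
                        "An empty key is not valid for PUT or DELETE.");
            }
            return new ValidatedKey(key, requested);
        }

        boolean supports(final EventAction action) {
            return actions.contains(EventAction.ALL) || actions.contains(action);
        }

        String getKey() {
            return key;
        }
    }

    public static void main(final String[] args) {
        // Valid: an empty key used only for reading.
        final ValidatedKey readKey = ValidatedKey.create("", EventAction.GET);
        System.out.println("GET supported: " + readKey.supports(EventAction.GET));

        // Rejected at creation time rather than when an event is processed.
        try {
            ValidatedKey.create("", EventAction.PUT);
        } catch (final IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With this shape, a plugin that requests only `GET` can accept an empty key, while one that requests `ALL` fails fast on the same input when the key is created.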

…tAction is supported once.

Signed-off-by: David Venable <[email protected]>
@dlvenable
Member Author

I created a small performance evaluation using rename_keys, since it is a very lightweight processor. I updated rename_keys to use the new approach and added a rename_keys_old processor that uses the old approach.

Pipeline:

http-user-agent:
  workers: 1
  delay: 100
  source:
    http:
      max_request_length: 25mb

  buffer:
    bounded_blocking:
      buffer_size: 500000
      batch_size: 10000


  processor:
    - rename_keys_old:
        entries:
          - from_key: log
            to_key: old_log
    - rename_keys:
        entries:
          - from_key: old_log
            to_key: new_log

  sink:
    - stdout:
        tags_target_key: tags

Running for about 3 minutes, I found the old approach took about 45% of the CPU time and the new approach took 21% of the CPU time.

The flame graph is available here:

performance-test-evaluation

Code reference:

dlvenable@961922e

@dlvenable dlvenable merged commit 52d2f0e into opensearch-project:main Jun 17, 2024
41 of 47 checks passed
@dlvenable dlvenable deleted the 1916-event-key branch June 19, 2024 22:21
Development

Successfully merging this pull request may close these issues.

Support an EventKey object
3 participants