Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fluentbit/processor: Update the parser to reserve data while avoiding duplication of keys #653

Merged
merged 5 commits into from
Jun 9, 2022

Conversation

ridwanmsharif
Copy link
Contributor

@ridwanmsharif ridwanmsharif commented Jun 6, 2022

Fixes b/226372668.

This change does the following:

  • Adds a nest LUA component to nest all fields (except the parse key) under a temporary internal field
  • Parse the payload using the parse key while using Reserve_data set to "True"
  • Adds a merge LUA component to merge the parsed payload with the original. It will overwrite all fields except for the logName (which will be preserved from the original) and the labels which will be merged with the parsed payload. See details below. Then delete the temporary internal field.
  • The above is now applied to all parsers used in the Ops Agent
  • Update the mongodb parser to use the same logic as above
  • Integration test for the above changes

Logic for merging labels:

  • All labels in the original payload are kept as long as the parsed payload doesn't conflict with the same key
  • If a parsed payload has a label with the same key as the original, then it overwrites the original
  • All labels in the parsed payload are kept

See full results for the perf benchmarks here: https://screenshot.googleplex.com/5sZHUPnyjMvDXnz

To make sense of the results use the following legend: file -> file receiver with no parsing file_parse_raw -> file + parser file_parse -> file + parse with some fields that exist before parsing (added using another filter thats why there is some perf impact) file_parse_lua -> file + parse + some existing fields + 2 lua filters to deal with deduplication (using the slowest path: some original fields already present and needs merging)

@ridwanmsharif ridwanmsharif marked this pull request as ready for review June 6, 2022 21:03
@ridwanmsharif ridwanmsharif force-pushed the ridwanmsharif/parse_deduplication branch 8 times, most recently from 5332fc2 to d35d573 Compare June 7, 2022 03:59
confgenerator/fluentbit/parser_deduplication.go Outdated Show resolved Hide resolved
integration_test/ops_agent_test.go Show resolved Hide resolved
integration_test/ops_agent_test.go Outdated Show resolved Hide resolved
@ridwanmsharif ridwanmsharif force-pushed the ridwanmsharif/parse_deduplication branch 5 times, most recently from 8c28ae5 to 66d3720 Compare June 7, 2022 20:44
Copy link
Contributor

@qingling128 qingling128 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ridwanmsharif ridwanmsharif force-pushed the ridwanmsharif/parse_deduplication branch from 66d3720 to 2cce6fb Compare June 8, 2022 04:16
@ridwanmsharif ridwanmsharif added the kokoro:force-run Forces kokoro to run integration tests on a CL label Jun 8, 2022
@ridwanmsharif ridwanmsharif force-pushed the ridwanmsharif/parse_deduplication branch from 2cce6fb to 87fdd5f Compare June 8, 2022 20:15
@stackdriver-instrumentation-release stackdriver-instrumentation-release removed the kokoro:force-run Forces kokoro to run integration tests on a CL label Jun 8, 2022
@ridwanmsharif ridwanmsharif added the kokoro:force-run Forces kokoro to run integration tests on a CL label Jun 9, 2022
@stackdriver-instrumentation-release stackdriver-instrumentation-release removed the kokoro:force-run Forces kokoro to run integration tests on a CL label Jun 9, 2022
@ridwanmsharif ridwanmsharif force-pushed the ridwanmsharif/parse_deduplication branch from 87fdd5f to 83a8adc Compare June 9, 2022 02:01
@ridwanmsharif ridwanmsharif added the kokoro:force-run Forces kokoro to run integration tests on a CL label Jun 9, 2022
@stackdriver-instrumentation-release stackdriver-instrumentation-release removed the kokoro:force-run Forces kokoro to run integration tests on a CL label Jun 9, 2022
@ridwanmsharif
Copy link
Contributor Author

The integration test added in the PR passes on Windows. The only failures are unrelated (and a known issue) and are being addressed in b/235115328. Merging this PR now.

@ridwanmsharif ridwanmsharif merged commit 7eb301b into master Jun 9, 2022
@ridwanmsharif ridwanmsharif deleted the ridwanmsharif/parse_deduplication branch June 9, 2022 14:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants