[pkg/stanza] Configurable behaviour of recombine operator when entries don't match is_first_entry
#31653
Comments
Previous behavior was intentional: open-telemetry/opentelemetry-log-collection#415 |
This request might sound weird in the general, abstract sense of how the The world is not as abstract and simple. As linked by @sumo-drosiek in the above comment, there is a specific scenario where different behavior might be desirable. Users that run the Otel collector in Kubernetes clusters and use the Filelog receiver with the I'm in favor of making this behavior configurable. It would be good to come up with a name for this configuration option that is descriptive, compelling and short enough at the same time. Brainstorming some proposals:
Happy to hear others' opinions. |
Apologies for overlooking open-telemetry/opentelemetry-log-collection#415. I was working to clean up this operator and couldn't find a rationale for the behavior. Reviewing the original issue, it seems the justification was basically to handle an error scenario more gracefully. However, I think this is an undesirable behavior in another case, where the file just happens to start with some non-matching lines (maybe was rotated partway through writing a multiline log) but still should be recombined up until the first match.
This is a clever use of the behavior but not one which the operator was ever intended to support. That said, I can appreciate how most of the parsers and transformers have an I'm generally in support of supporting all of the above use cases while making sure the default behavior is intuitive and safe. What do you think about the following proposal, @kkujawa-sumo, @sumo-drosiek, @astencel-sumo? Instead of a flag which switches between the previous and current behavior (i.e. all or nothing), we add a setting, called something like
|
Thanks @djaglowski for digging into this. Your proposal of One more question to pin down the behavior. Is the |
I was imagining it would apply repeatedly, until |
The proposition with If I understand correctly we will see following behavior of the case 1
output:
case 2
input:
output:
case 3
output:
|
When it reaches |
@h0cheung It depends; some users would like to use one pipeline for different log formats. |
You can assign me this issue ;) |
…ize to recombine operator (#32144) **Description:** Add a new `max_unmatched_batch_size` config parameter to configure the maximum number of consecutive entries that will be combined into a single entry before the match occurs **Link to tracking Issue:** #31653 **Testing:** unit tests, manual tests **Documentation:** Add description of the new config option
Closed by #32144 |
Reopened due to #32154 |
…ize to recombine operator (#32168) **Description:** Add a new max_unmatched_batch_size config parameter to configure the maximum number of consecutive entries that will be combined into a single entry before the match occurs **Link to tracking Issue:** #31653 **Testing:** unit tests, manual tests **Documentation:** Add description of the new config option Changes from #32144 with improvements in tests
@djaglowski @kkujawa-sumo Can we close it? |
Yes, this can be closed. The new option was added in #32168 |
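The effect of `max_unmatched_batch_size` described above can be illustrated with a small, self-contained model. This is only a sketch of the semantics discussed in this thread, not the pkg/stanza implementation: the `recombine` function, its parameters, and the sample lines are hypothetical, and it assumes that a value of 0 means "no limit" and that only an `is_first_entry` condition is in use.

```python
from typing import Callable, List


def recombine(
    lines: List[str],
    is_first_entry: Callable[[str], bool],
    max_unmatched_batch_size: int = 100,
    combine_with: str = "\n",
) -> List[str]:
    """Simplified model of recombining lines with an is_first_entry condition.

    Lines seen before the first match are flushed in groups of at most
    max_unmatched_batch_size entries (0 is assumed to mean "no limit").
    After a match, non-matching lines are appended to the matched entry.
    """
    out: List[str] = []
    batch: List[str] = []
    matched = False  # True once a line has matched is_first_entry

    def flush() -> None:
        # Emit the current batch as one combined entry, if it is non-empty.
        if batch:
            out.append(combine_with.join(batch))
            batch.clear()

    for line in lines:
        if is_first_entry(line):
            # A new multiline entry starts: emit whatever was collected so far.
            flush()
            matched = True
        batch.append(line)
        # The cap applies only while no match has been seen yet.
        if (
            not matched
            and max_unmatched_batch_size > 0
            and len(batch) >= max_unmatched_batch_size
        ):
            flush()
    flush()
    return out


# Hypothetical input: three plain lines, then two entries that start with "START".
sample = ["a", "b", "c", "START 1", "  cont", "START 2"]


def starts_entry(line: str) -> bool:
    return line.startswith("START")


unchanged = recombine(sample, starts_entry, max_unmatched_batch_size=1)
merged = recombine(sample, starts_entry, max_unmatched_batch_size=0)
pairs = recombine(sample, starts_entry, max_unmatched_batch_size=2)
```

In this model, a value of 1 leaves unmatched lines as separate entries (the behaviour the issue calls `unchanged`), 0 combines all of them into one entry (`merged`), and intermediate values combine them in groups of that size until the first match occurs.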
Component(s)
pkg/stanza
Is your feature request related to a problem? Please describe.
The change introduced in #30797 causes different behaviour when entries don't match `is_first_entry`, please see this line.
The previous behaviour was convenient: a user could have one pipeline for both multiline and non-multiline logs, and non-multiline logs were left unchanged (with the consequence that in some circumstances multiline logs could be sent without recombination). Now non-multiline logs are merged.
Describe the solution you'd like
Configurable behaviour of recombine operator when entries don't match `is_first_entry`, possible options: `merged`, `unchanged`.
Describe alternatives you've considered
Two pipelines for log collection, one for multiline logs and another for non-multiline logs, but this requires knowing each log's format in advance and selecting the right pipeline for each log file.
Additional context
No response