[pkg/ottl] setting higher contexts in OTTL can result in unexpected transformations #32080
At a glance, I believe the issue is caused by the fact that OTTL operates on a hierarchical data model which groups multiple log records by their shared resource attributes. In my opinion, when setting a value on the resource that comes from an individual log, this should produce a new resource with the updated value and move the log record into that resource. An example, with much unnecessary detail removed:

```yaml
# input
logs:
  - resource:
      attributes:
        foo: bar
    records:
      - attributes:
          some.key: left
        body: Hello one
      - attributes:
          some.key: right
        body: Hello two

# Current result if we run set(resource.attributes["num"], attributes["some.key"])
logs:
  - resource:
      attributes:
        foo: bar
        num: right # was "left" for one iteration, then was overwritten
    records:
      - attributes:
          some.key: left
        body: Hello one
      - attributes:
          some.key: right
        body: Hello two

# Correct result (in my opinion)
logs:
  - resource:
      attributes:
        foo: bar
        num: left
    records:
      - attributes:
          some.key: left
        body: Hello one
  - resource:
      attributes:
        foo: bar
        num: right
    records:
      - attributes:
          some.key: right
        body: Hello two
```

In other words, the fact that our…
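For concreteness, the "current result" above is what you would get from a minimal transformprocessor configuration like the following (a sketch, not tied to any particular collector version):

```yaml
processors:
  transform:
    log_statements:
      - context: log
        statements:
          # Runs once per log record; each iteration overwrites the shared
          # resource attribute, so the last record in the resource wins.
          - set(resource.attributes["num"], attributes["some.key"])
```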
Thank you for the explanation! Based on my understanding, then, we should see many fewer issues such as these if we do not try to use OTTL to modify resource attributes in processors. Would you expect that to work around this behavior caused by the pdata structure grouping logs by resources?
@djaglowski is correct. When using the logs context in the transformprocessor, it cycles through each log, executing each statement in order. When you use a value from a log to set a property in that log's scope or resource, all the logs for that scope/resource will try to do the same unless you add some clever where clauses. As @djaglowski showed, if two logs in a resource have different values, the last (or first, depending on Where clauses) will win - you can't get both values in the same resource attribute key unless you concat.
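To make the "clever where clauses" and concat options concrete, a hedged sketch reusing the attribute names from the earlier example (exact behavior with unset values may vary):

```yaml
processors:
  transform:
    log_statements:
      - context: log
        statements:
          # Deterministic: only promote the value from records matching a known value,
          # instead of letting the last record win.
          - set(resource.attributes["num"], attributes["some.key"]) where attributes["some.key"] == "left"
          # Alternative: accumulate both values into one resource attribute.
          # Still runs once per record, so record order determines the concatenation order.
          - set(resource.attributes["nums"], Concat([resource.attributes["nums"], attributes["some.key"]], ","))
```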
I'm not sure we want this to be the default behavior either. It is totally valid for all the spans/datapoints/log records to share an attribute that the user wants set on the resource. If we always created a new resource each time we set from a "lower" context, then we could end up with duplicate resources across the slice. I also believe that if… This has been known, but confusing/limiting, behavior since OTTL's inception. I think we'll need a more complex solution than always splitting it out into its own Resource/Scope/Log record combo. I see a couple of options:
@mut3, since you're using the filelog receiver to collect this data, we highly recommend doing as much work during collection as you can. I recommend using the move operator to set the resource attributes instead of the transformprocessor, since you have the opportunity to do it via the receiver. In general, the fewer components in your pipeline, the better. If you must use the transformprocessor for this data, I believe you could hack together a solution like this if you know the exact keys of all your resource attributes:
I highly recommend using the filelog receiver instead of doing that approach lol
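For reference, a rough sketch of the receiver-side approach (the stanza `move` operator, with keys borrowed from the issue report below; not the exact snippet discussed above):

```yaml
receivers:
  filelog:
    include: [ /var/log/pods/*/*/*.log ]
    operators:
      # Each entry is processed on its own at collection time, so promoting
      # attributes to the resource here cannot mix values across records.
      - type: move
        from: attributes["pod_name"]
        to: resource["k8s.pod.name"]
      - type: move
        from: attributes["namespace"]
        to: resource["k8s.namespace.name"]
```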
/cc @evan-bradley
Thank you! We moved all our… Maybe there should be some warnings about modifying…
That's fair. What I should have said is…
It's a known issue, but I think it's fundamentally a valid one which should eventually be fixed. The hierarchical representation (grouping logs/metrics/traces by resource and scope) is a direct reflection of pdata but not necessarily prescribed by the project's data model. In my opinion, OTTL should not be designed for pdata but rather around it. That is, when data is moved or copied between contexts, it should behave logically as if the lower-level context has its own copy of the higher-level context(s). How the end result is represented is an implementation detail. Put another way, OTTL is treating resources and scopes as references, but users need to work with them as values.
Hi. I am pretty sure I just got bit by this issue when trying to copy Span attributes into Resource attributes. There's more context in the Wikimedia bug tracker but the short version is:
I'll see if I can use the groupbyattrs processor to work around this. But this is really confusing, and entirely undocumented. The transformprocessor documentation even suggests that you can do something like this, in the Contexts section of its README.md:
From an OTTL end user's perspective, that sentence is only true for very specific meanings of the words "associated" and "the" 😅
If all the spans in the resource don't have the same… As for a change to OTTL's behavior, @djaglowski's idea of… I'm also open to documenting this behavior. Since it is a write-only problem, I believe it only affects the transformprocessor.
Here we attempt to use the groupbyattrs processor to re-group traces by the span-level attribute `service.name`. This processor is designed to create a new top-level Resource when necessary, so hopefully this works around the issue, if we are indeed encountering open-telemetry/opentelemetry-collector-contrib#32080.

Bug: T363407
Change-Id: I43ab98cf02ed712fc087335315a57638019a15a0
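The workaround referenced in that change boils down to a groupbyattrs configuration along these lines (a sketch):

```yaml
processors:
  groupbyattrs:
    # Spans whose "service.name" span attribute differs are re-grouped under
    # separate resources, with the attribute promoted to a resource attribute.
    keys:
      - service.name
```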
I think there is a contradiction here. The design cannot consider modifications to a resource or scope to be "only within the TransformContext" of a log record when we know those changes are directly modifying the TransformContexts of other log records as well.
I was implying that the framework is made to act on a singular TransformContext, inclusive of a resource/scope, so if a transformation needed to go find a different resource to act upon it wouldn't have it. I'm sure it is solvable, but I'd want to make sure that we don't lose:
If we can keep those things intact, then we get the best of both worlds: users can correctly transform fields from "higher" contexts in a fast and efficient way. It is possible some ideas from #30800 would help here.
Here's a possible solution which would leave the data corruption issue in place but also provide a decent workaround for it. In short, we could add a new… Alternatively, perhaps…

Example

Input

Say we have the following resources:

```yaml
resources:
  - attributes: { name: AAA }
    records:
      - attributes: { host: Red }
        body: First record
      - attributes: { host: Blue }
        body: Second record
      - attributes: { host: Red }
        body: Third record
      - attributes: { host: Blue }
        body: Fourth record
  - attributes: { name: BBBBBB }
    records:
      - attributes: { host: Red }
        body: First record
      - attributes: { host: Blue }
        body: Second record
      - attributes: { host: Red }
        body: Third record
      - attributes: { host: Blue }
        body: Fourth record
```

Flatten

A flatten operation would give each record its own copy of the resource:

```yaml
resources:
  - attributes: { name: AAA }
    records:
      - attributes: { host: Red }
        body: First record
  - attributes: { name: AAA }
    records:
      - attributes: { host: Blue }
        body: Second record
  - attributes: { name: AAA }
    records:
      - attributes: { host: Red }
        body: Third record
  - attributes: { name: AAA }
    records:
      - attributes: { host: Blue }
        body: Fourth record
  - attributes: { name: BBBBBB }
    records:
      - attributes: { host: Red }
        body: First record
  - attributes: { name: BBBBBB }
    records:
      - attributes: { host: Blue }
        body: Second record
  - attributes: { name: BBBBBB }
    records:
      - attributes: { host: Red }
        body: Third record
  - attributes: { name: BBBBBB }
    records:
      - attributes: { host: Blue }
        body: Fourth record
```

Transform

The user can then apply a transformation, e.g. moving a log record attribute to a resource attribute (see the statement sketch after this example):

```yaml
resources:
  - attributes: { name: AAA, host: Red }
    records:
      - attributes: {}
        body: First record
  - attributes: { name: AAA, host: Blue }
    records:
      - attributes: {}
        body: Second record
  - attributes: { name: AAA, host: Red }
    records:
      - attributes: {}
        body: Third record
  - attributes: { name: AAA, host: Blue }
    records:
      - attributes: {}
        body: Fourth record
  - attributes: { name: BBBBBB, host: Red }
    records:
      - attributes: {}
        body: First record
  - attributes: { name: BBBBBB, host: Blue }
    records:
      - attributes: {}
        body: Second record
  - attributes: { name: BBBBBB, host: Red }
    records:
      - attributes: {}
        body: Third record
  - attributes: { name: BBBBBB, host: Blue }
    records:
      - attributes: {}
        body: Fourth record
```

Unflatten

Then an unflatten operation would regroup records whose resources are now identical:

```yaml
resources:
  - attributes: { name: AAA, host: Red }
    records:
      - attributes: {}
        body: First record
      - attributes: {}
        body: Third record
  - attributes: { name: AAA, host: Blue }
    records:
      - attributes: {}
        body: Second record
      - attributes: {}
        body: Fourth record
  - attributes: { name: BBBBBB, host: Red }
    records:
      - attributes: {}
        body: First record
      - attributes: {}
        body: Third record
  - attributes: { name: BBBBBB, host: Blue }
    records:
      - attributes: {}
        body: Second record
      - attributes: {}
        body: Fourth record
```

Order & Reversibility

Since… Additionally, for any transformation…
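For reference, the Transform step above could be an ordinary statement in the log context; once each record sits in its own flattened resource, it no longer clobbers values belonging to other records (a sketch):

```yaml
processors:
  transform:
    log_statements:
      - context: log
        statements:
          # Promote the record-level attribute onto the (now per-record) resource.
          - set(resource.attributes["host"], attributes["host"])
          - delete_key(attributes, "host")
```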
I see how this would solve the issue at hand, but I'd like to avoid the extra steps if possible, unless flattening/unflattening achieves a goal outside of moving log records to a new resource object when modifying the resource from the… If we're looking at creating new contexts, could we create contexts that handle the flattening/unflattening/copying/etc. operations for us? We could split the current… The… There would need to be a lot of internal machinery to support these operations, and we would need to decide upon and document what comprises a resource object's identity, but I think this would solve this issue in a way that's both backwards-compatible and feels native to OTTL.
This is a first step towards a [Flatten/Unflatten workaround](#32080) for OTTL's data corruption issue; specifically, this would support Unflatten. The workaround was discussed in a recent collector SIG and it sounded like it would be acceptable if available behind a feature gate and only for the transform processor. If this is accepted, I'll work on a Flatten utility next, then integrate them into the transform processor to prove it end-to-end. Finally, I'll implement similar features for metrics and traces.

This PR adds an internal utility package which simplifies grouping logs by resource and scope. I'm proposing this initially in `internal/pdatautil`, but the functionality could eventually be merged into `pkg/pdatautil` if we find it useful elsewhere.
This PR proposes a feature gate which would enable the Flatten/Unflatten behavior described [here](#32080 (comment)). This was discussed in the Collector SIG recently and it was mentioned that a feature gate would be a reasonable way to implement this. One immediate question: should this be purely a feature gate, or should there be a config option on the processor which fails `Validate` if the feature gate is not set?

---------

Co-authored-by: Curtis Robert <[email protected]>
Co-authored-by: Evan Bradley <[email protected]>
Co-authored-by: Tyler Helmuth <[email protected]>
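A hedged illustration of what the gated behavior might look like to an end user; the gate and option names below are assumptions for the sake of the example, not settled API:

```yaml
# Assumed invocation: the collector's --feature-gates flag enables the new behavior, e.g.
#   otelcol-contrib --config config.yaml --feature-gates=transform.flatten.logs
processors:
  transform:
    # Assumed config option name; per the open question above, Validate could
    # fail if this is set while the feature gate is disabled.
    flatten_data: true
    log_statements:
      - context: log
        statements:
          - set(resource.attributes["host"], attributes["host"])
```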
The… For the sake of configuration simplicity, and compatibility with the current config, it would be nice if…
A related question here is exactly how flattening behaves for traces and metrics. For traces, I could see a need for every span to have its own copy of resource & scope. Or, you may need every span event to have its own copy of resource, scope, and span. For metrics, would it ever be required to give each metric its own copy of resource & scope? Is it ever needed for each data point? Maybe there's a general rule that it is possible to flatten down to any context?
Ya, this is what I had in mind. Given… And then, like the current implementation, after transformations occur the data would be regrouped. For spanevent and datapoint this flattening would be pretty inefficient. But it would also be an advanced, opt-in situation.
@TylerHelmuth, I have run into a few use cases lately that reminded me that…
|
@djaglowski I think there have been enough requests for these features that I am open to more fine-grained control via functions. We also have the proper testing framework in place to ensure they work as expected. I am curious whether Flatten and Unflatten should instead be editors that work directly on the data. The idea of using Flatten to set something in the cache or an attribute doesn't feel quite right.
I agree with this.
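A purely hypothetical sketch of the function-based alternative; `FlattenData()` and `UnflattenData()` do not exist in OTTL today and are named here only for illustration (and to avoid confusion with the existing `flatten` map editor):

```yaml
processors:
  transform:
    log_statements:
      - context: log
        statements:
          # Hypothetical editor: detach this record into its own resource/scope copy.
          - FlattenData()
          - set(resource.attributes["host"], attributes["host"])
          # Hypothetical editor: regroup records whose resource/scope are now identical.
          - UnflattenData()
```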
Removing…
Component(s)
pkg/ottl, pkg/stanza, processor/transform, receiver/filelog
What happened?
Description
We have a pipeline that uses a filelog input with some operators that set certain fields in attributes. We then have a transform processor that reads those fields to `set()` some resource attributes. The values that get set in the resource attributes of the log appear to be from a different log, as if the transform is reading from one log and writing to another.

Steps to Reproduce
I do not currently have an MVC for this issue, but I will include a stripped and sanitized config that contains the core logic/entities around the bug we are seeing.
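The sanitized configuration itself is not reproduced here, so the following is only a rough, hedged sketch of the shape described above (attribute keys taken from the Expected/Actual Result sections; the parsing operator and file paths are invented placeholders):

```yaml
receivers:
  filelog:
    include: [ /var/log/pods/*/*/*.log ]
    include_file_path: true
    operators:
      # Placeholder: something here populates per-record attributes such as
      # "pod_name" and "namespace" (details of the real config are omitted).
      - type: regex_parser
        regex: '^/var/log/pods/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_'
        parse_from: attributes["log.file.path"]

processors:
  transform:
    log_statements:
      - context: log
        statements:
          # Statements like these are where the cross-record mixing shows up.
          - set(resource.attributes["k8s.namespace.name"], attributes["namespace"])
          - set(resource.attributes["k8s.pod.name"], attributes["pod_name"])
```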
Configuration
Expected Result
Logs where `attributes["pod_name"] == resource.attributes["k8s.pod.name"]` and `resource.attributes["k8s.namespace.name"] == attributes["namespace"]`. This is the case for most of the logs emitted from our pipeline.
Actual Result
For some small percentage of our logs, the values of

- `resource.attributes["k8s.container.name"]`
- `resource.attributes["k8s.namespace.name"]`
- `resource.attributes["k8s.pod.name"]`
- `resource.attributes["k8s.container.restart_count"]`
- `resource.attributes["k8s.pod.uid"]`

do not match their `attributes` counterparts, but instead appear to come from a different log event. The fields appear to be internally consistent, as if they all came from the same other log event. For example, if `resource.attributes["k8s.pod.name"]` is wrong and holds the name of some other pod, then `resource.attributes["k8s.namespace.name"]` will have the namespace of that other pod.

Here is a sanitized example log JSON pulled from our Elasticsearch:
Sanitized incorrect log
Collector version
v0.96.0
Environment information
Environment
OS: Amazon Linux 2023 (Amazon's v1.25.16-eks ami)
Compiler (if manually compiled): Collector Helm Chart v0.84.0
OpenTelemetry Collector configuration
See Steps to Reproduce
Log output
No response
Additional context
We never experience this issue when these same moves to resource attributes are done via operators on the filelog receiver:
All operator solution