-
Notifications
You must be signed in to change notification settings - Fork 896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Log record mutations are visible in next registered processors #4065
Comments
I will check if it is possible to follow the pattern in Go even though it's pitfalls I outlined #4010 (comment). My main goal is to keep a possibility to have zero allocations when the log records are not modified (as this the the usual use case). If we succeed to find a satisfying design for Go (and Rust), we could update the specification to make this requirement explicit. CC @open-telemetry/rust-maintainers @open-telemetry/cpp-maintainers @open-telemetry/technical-committee |
If a log processor's mutations were not visible to the next processor, then there would be no (or very little) point of Other evidence that the "mutations are not visible to subsequent processors" is the wrong interpretation:
Wondering if you can elaborate on this. I can't seem to find the LogRecordProcessor interface definition in c++ (I don't know c++ so I'm very naive navigating the code). Do log record processors call subsequent processors in c++? Is it possible to write a log processor which extracts and includes baggage attributes on log records in c++? |
The modified record can be passed to a decorated processor. This is how mutations can be currently done in C++, Rust, and Go. E.g. Notice that currently there is nothing in the spec that allows creating a new/custom ReadableLogRecord.
Sure. Regarding C++ my information is based on #4010 (comment). |
There's no mention of a decorated processor in the spec, so it sounds like C++, Rust and Go invented new concepts. To be clear, I support an effort to evolve the spec to include accommodations which allow for higher performance implementations. However, I feel strongly that the current language of the spec does imply that log record mutations are visible to the subsequent processors. The design juggles several priorities, including performance and conceptual symmetry with prior art in otel (the trace signal in particular). A design featuring chaining or decorating processors might pave a clear path for a high performance implementation, it does so at the expense of conceptual symmetry. My rust, c++, and go skills are pretty poor, but there may be creative language-specific ways to keep the door open to zero allocations while also staying within the spirit of the spec. For example, a processor might have a method indicating whether or not it mutates data. If none of the processors mutate data, or if there is only one processor registered, then the SDK could use an alternative implementation of ReadWriteLogRecord which avoids allocations. |
Actually, it may occur that it won't be a problem for Go. I am in progress of evaluating it. I remember from my previous prototypes for most cases the Go escape analysis was doing good job. Reference https://github.com/open-telemetry/opentelemetry-go/blob/main/log/DESIGN.md#passing-record-as-pointer-to-loggeremit Rust Logs SDK is also not stable and there was a discussion whether to use the same pattern as for span processors. See #4010 (comment) It may be a good compromise to follow span processor pattern to achieve better consistency across languages as well as with tracing. |
For sake of transparency I want to share some thoughts based on feedback I received from @MrAlias. The specification does not require log records to be concurrent safe (notice that the spans are required to be concurrent safe by the spec here). What happens when a subsequent processor modifies a log record that was passed to a batch processor? Does the batch processor need to make a deep copy of processed records? How will we make sure that users will not modify the log record asynchronously? In Go, I thought about adding a comment. However, commenting something like this is not going to be a safe solution. There are going to be many user use cases that will violate this and having a comment is not an adequate design choice. We could make log record concurrent safe, but this would decrease the performance even further. |
From the spec). A processor which asynchronously modifies the ReadWriteLogRecord passed to LogRecordProcessor#onEmit is violating the spec and the behavior is undefined. In practice, some processors will need to make changes asynchronously. These implementations should make a local copy of the log record. I imagine an implementation could take stronger measures to prevent this by doing something to lock the log record after a log record processor has returned, which may be at the expense of performance (not sure because I haven't tried). This appears to be one of those areas of the spec where we have a possible foot gun. And personally, I'm fine with it. We should have a well paved path that works out of the box for most of the things that users want to do. But there will be advanced users who want to step off that path, and they should have the ability to do things and take responsibility for those things. Its the "we're all adults here" argument. |
A processor is synchronously modifying a record which is retained by another processor which retains is and only makes asynchronous reads. This could happen if e.g. one registers (e.g. by mistake) a "retracting processor" after a batch processor. I think that the specification should say something like:
Currently the spec makes it easy to have implementations with race conditions. |
I'm supportive of clarifying the language. But it wouldn't effectively change any implementation would it? At the end of the day, the user is still responsible for not making asynchronous changes to the record. This is just providing better language explaining "why". |
The user can make a synchronous change and another processor makes an asynchronous read. This leads to a race condition.
E.g. you are not making a clone of record here: https://github.com/open-telemetry/opentelemetry-java/blob/d9cef81e298f93b2379aa87d96edacf834fd7fcc/sdk/logs/src/main/java/io/opentelemetry/sdk/logs/export/BatchLogRecordProcessor.java#L92 and the next registered processor can make log record modifications that would lead to a race condition. |
Keep in mind that a user defined processor is not constrained by this specification. Saying they should or shouldn't do something is a meaningless statement. We can only make prescriptive statements about OTel processor behaviors. Having designs the syntactically restrict or promote good user use or experience is the way you influence 3rd-party use cases. |
Nah, we're always going to make that clone in otel java. We have different representations for SDK processors vs. exporters. But it could save a non-stable implementation from needing to do something like that. Although I would argue that an implementation doesn't need to make that distinction today - they can safely at the record to the batch processor queue. Any issues with the record being mutated at the same time as the exporter is serializing it is the fault of the processor which violated the spec. |
If we want consistency with tracing (over performance) for log processors then we should consider requiring the SDK's log record record to be concurrent safe. EDIT: For reference open-telemetry/opentelemetry-go#5478 |
Leaving some thoughts why I still think that requiring log record mutations to be visible in next registered processors may be a performance+design footgun and why it may be undesired in some languages.
We should answer what is more important. Performance or symmetry with tracing?
In my opinion, there can be application which emits "tons" of logs and performance is more important. Especially that having a similar design to tracing involves overhead that would not be possible to overcome. Some users are already complaining that tracing and metrics involves to many heap allocations (e.g. here) and adding allocations to logs would bring the problem to the next level. Reference: open-telemetry/opentelemetry-go#5478
More advanced users may want to squeeze as much performance as possible. From https://opentelemetry.io/community/mission/#we-value-_performance_:
I think that in Go ecosystem having a design which allows zero-heap allocation logging (especially in the hot path) is important. Reference: https://go.dev/blog/slog#performance A "decorator-based" design would be Go idiomatic as it would follow the design of the standard Go structures logging library: https://pkg.go.dev/log/slog#Handler. Creating pipelines for OTel Go Logs SDK and
This is how I believe it can be done in Go: open-telemetry/opentelemetry-go#5470
In Go, representing a record as interface (to give a possibility to have several implementation) would already decrease the performance (slower calls and more heap allocations; more; I was doing some benchmarks months ago but I hope you will just trust me). To sum up, I lean towards favoring logging performance and symmetry with structured logging package from the Go standard library over symmetry with OTel tracing design. |
SIG meeting summary We have a common agreement that for logs the performance is extremely important as we might expect more load than for e.g. tracing. It looks that we are leaning towards changing the spec so that SDKs can require chaining processors for making mutations. Processors are responsible for calling the next in the chain which allows the records to stay on the stack. This is also how C++, Rust, Go Logs SDKs are currently designed. |
I will be unavailable for about next 3 weeks. I try to make some PR once I am back. Thanks a lot for your feedback and support. |
## Why Fixes #4010 (read the issue for more details) The SDK SHOULD offer a way to make independent log processing. I find it as a missing feature that should be covered by by the specification. ## What This PR tries to achieve it in the following way: - SDK SHOULD provide an isolating processor allowing independent log processing pipelines. Source: #4010 (comment) This approach should be solving the issue at least for Java, .NET, Python, JavaScript, Go. This proposal should not make any existing log SDKs not compliant with the specification. I think that the same approach could be also adopted in tracing for span processors (but I have not yet investigated it in depth). ## Remarks I created a separate issue for specifying whether (by default) a change to a record made in a processor is propagated to next registered processor. See #4065
## Why Fixes open-telemetry#4010 (read the issue for more details) The SDK SHOULD offer a way to make independent log processing. I find it as a missing feature that should be covered by by the specification. ## What This PR tries to achieve it in the following way: - SDK SHOULD provide an isolating processor allowing independent log processing pipelines. Source: open-telemetry#4010 (comment) This approach should be solving the issue at least for Java, .NET, Python, JavaScript, Go. This proposal should not make any existing log SDKs not compliant with the specification. I think that the same approach could be also adopted in tracing for span processors (but I have not yet investigated it in depth). ## Remarks I created a separate issue for specifying whether (by default) a change to a record made in a processor is propagated to next registered processor. See open-telemetry#4065
…telemetry#4067) ### Current description Closes open-telemetry#4065
What are you trying to achieve?
Define if log record mutations done in record processor are visible in next registered processors.
From #4010 (comment):
Additional context.
The undocumented assumption was the log processor will follow the pattern that mutations will be visible to the next processor to be consistent with span processors. However, the C++ Logs SDK is not designed this way (log records are passed as a copy to subsequent registered processors). Other Logs SDKs follow the pattern or are not stable.
The text was updated successfully, but these errors were encountered: