Add BeforeEnd to have a callback where the span is still writeable #1089

gugahoa · 2020-10-13T17:12:48Z

What are you trying to achieve?
Set an attribute to all spans in a trace

What did you expect to see?
Writeable spans on SpanProcessor's OnEnd(Span)

Additional context.
What I tried to implement is a feature called trace field in Honeycomb (reference: AddFieldToTrace)

For that I first tried to use SpanProcessor, but soon found out that OnEnd(Span) is not writeable.

The solution I arrived was to write a custom exporter which was able to read from a thread-local storage (at the time of the implementation, Baggage had yet to be spec'ed out) to list all attributes related to a Trace and insert into the span before exporting.

From what I understood from the SpanProcessor specification, that's the place where I should have placed this logic, as it's a transformation that should happen before the span is sent to any exporter

The text was updated successfully, but these errors were encountered:

tsloughter · 2020-10-13T21:15:43Z

/cc @arminru

garthk · 2020-10-13T21:20:20Z

Required for dynamic resource-like attributes desired in every span but computed at run-time because they're not constant, eg. instance uptime and recent CPU utilisation. Injecting them in the exporter gives results that become less accurate during the spikes we're most interested in.

arminru · 2020-10-14T07:54:23Z

IIRC it was discussed to make spans writeable in OnEnd but this was left out because users will not always have full control over the order in which span processors are registered and in turn invoked, which could result in unreliable behavior.
From your use case it sounds like you would not need any attributes that are added/overwritten in OnEnd visible to any other span processor, right? If you only need to have those changes reflected once passed to exporters, this shouldn't be an issue then.

@carlosalberto you seemed to remember some of that discussion as well in the meeting yesterday. Do you know the rationale for that decision?

iNikem · 2020-10-14T07:55:36Z

Why these attributes cannot be set in onStart, which already has writable span?

arminru · 2020-10-14T08:13:33Z

Reading the spec again, I noticed that it explicitly says that spans are already marked as ended at the time they are passed to SpanProcessor.OnEnd:

https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/sdk.md#onendspan

OnEnd is called after a span is ended (i.e., the end timestamp is already set).

https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/api.md#end

End: Implementations SHOULD ignore all subsequent calls to End and any other Span methods, i.e. the Span becomes non-recording by being ended (there might be exceptions when Tracer is streaming events and has no mutable state associated with the Span).

We would also have to change either of those requirements if we wanted to make spans writeable in OnEnd and we would need to understand the consequences well.

Oberon00 · 2020-10-14T09:25:18Z

Exactly. The difficult to get right (but specified!) order (exporting may happen before the other OnEnd processor) is another problem, but the main problem is that spans are already ended, as @arminru said. This was already discussed in #669 (comment).

There are two ways we could go forward here:

Use existing mechanisms in your implementation to modify the "SpanData", i.e. the readable representation of the span that the exporter gets, instead of the Span itself. EDIT: E.g. Java has https://github.com/open-telemetry/opentelemetry-java/blob/v0.9.1/sdk_extensions/tracing_incubator/src/main/java/io/opentelemetry/sdk/extensions/incubator/trace/data/SpanDataBuilder.java
We may introduce a new SpanProcessor hook BeforeEnd.

We definitely can't pass a writable span to OnEnd.

tsloughter · 2020-10-14T13:37:38Z

Why not define it as your Span Processor must assume it gets the Span in the form it is ended with and not expect to have any changes from previous processors visible?

tsloughter · 2020-10-14T13:44:12Z

@iNikem my understanding is that it is because OnStart can't be used to implement the Honeycomb feature https://godoc.org/github.com/honeycombio/beeline-go#AddFieldToTrace -- it would only be able to add the attribute to spans started after this call, but it is supposed to be able to add them to all local spans in the trace, even if already started.

I'd be fine with a BeforeEnd span processor, but don't really like adding more hooks when as far as I can tell the only issue is to define that OnEnd processors can't depend on each other and that if you have multiple processors writing the same attribute you can't know which will actually be added last -- so a warning to use unique attribute keys.

Oberon00 · 2020-10-14T13:50:35Z

No, the main issue is that in OnEnd the span is already ended, and for an ended span, SetAttribute should be a no-op. Changing that seems liable to opening a can of worms.

tsloughter · 2020-10-14T14:18:17Z

Ah, ok. Yea, I wouldn't want to change the fact that ended means read-only but instead move the "ending" to after the processors. But this isn't possible since one of the processors is what takes the "ended" span and passes it to the exporter... I didn't like exporting being part of processors because of this.

And probably too late to argue that sending to the exporter should be separate and unrelated to processors, leaving OnEnd processors to act like "BeforeEnd" :(

Oberon00 · 2020-10-14T14:35:22Z

Maybe we would even need a special state where the span is semi-ended, i.e. it has IsEnded==true and has an end timestamp, but is still recording.

iNikem · 2020-10-14T14:45:50Z

@iNikem my understanding is that it is because OnStart can't be used to implement the Honeycomb feature https://godoc.org/github.com/honeycombio/beeline-go#AddFieldToTrace -- it would only be able to add the attribute to spans started after this call, but it is supposed to be able to add them to all local spans in the trace, even if already started.

I don't understand. If you have a SpanProcessor registered, it will receive all spans created in the given process. And so it will be able to add attributes to all spans. You don't "do a call" manually. All spans are handed to you.

tsloughter · 2020-10-14T14:59:12Z

@iNikem it only receives a write-able span for OnStart, meaning if spans are already started in the trace you will not be handed those spans at a point that you have the attributes and can write to the span.

iNikem · 2020-10-14T15:01:56Z

It seems I still don't understand your use-case: what are these attributes that are not available right away but become available only when some spans were already started?

And even more: how does onEnd differ from onStart in this regard? I fail to see how you hoped to achieve your goal with onEnd that you cannot do in onStart?

tsloughter · 2020-10-14T15:05:55Z

It seems I still don't understand your use-case: what are these attributes that are not available right away but become available only when some spans were already started?

An HTTP user id parsed out of headers is the best example I can think of, but actual honeycomb users (@garthk @gugahoa ?) probably have more. The initial request span is started right away, before user code parses out headers they care about, a child span is started when the user's logic is called at which point they parse out the user-id and want to put it on all spans in that trace but the span created when the HTTP handler got the request has already run OnStart so the only time it could be added is OnEnd -- without manually going through the stack and adding the attribute.

seh · 2020-10-14T15:22:03Z

That's a good characterization, @tsloughter. This has been a "normal" way of using Honeycomb's Beeline library for a long time. It buffers the trace span tree in memory, and can distribute attributes (called "fields" there) up and down the span tree before sending the spans to the server. It stores a set of fields on the trace itself, in a field on the trace.Trace type called "traceLevelFields." You can see the transcription from that field down to each span just before sending each here.

bogdandrutu · 2020-10-14T16:04:50Z

I am curios why do you need to still add events/attributes to the Span when you can do that directly on the SpanData that we will export? What is the advantage to change the Span vs modifying the data that will be exported directly?

Oberon00 · 2020-10-14T16:06:24Z

Agree with @bogdandrutu. I'm not sure which implemenation language you are using but in Java there is https://github.com/open-telemetry/opentelemetry-java/blob/v0.9.1/sdk_extensions/tracing_incubator/src/main/java/io/opentelemetry/sdk/extensions/incubator/trace/data/SpanDataBuilder.java

Unfortunately, #158 is not resolved so this is all a bit vague in the spec.

tsloughter · 2020-10-14T16:21:23Z

"that we will export"

Where is there a pluggable place to modify SpanData before export?

If you are saying OnEnd Processor can modify SpanData then that would be fine.

iNikem · 2020-10-14T16:25:36Z

If you are saying OnEnd Processor can modify SpanData then that would be fine.

Processor can certainly modify SpanData

Oberon00 · 2020-10-14T16:27:47Z

@tsloughter

Where is there a pluggable place to modify SpanData before export?

The exporter interface. You can make your own exporter that wraps an existing one and does the modification.

Oberon00 · 2020-10-14T16:29:11Z

@iNikem

Processor can certainly modify SpanData

True, but in order for the "modification" (SpanData is immutable at least in Java, so it actually creates an updated copy) to be picked up by an exporter, that exporter must be invoked in the same OnEnd.

gugahoa · 2020-10-14T16:36:00Z

@bogdandrutu

I am curios why do you need to still add events/attributes to the Span when you can do that directly on the SpanData that we will export? What is the advantage to change the Span vs modifying the data that will be exported directly?

That would mean writing my own processor which will export to the configured exporters, but I wouldn't want to write my own processor for that and instead it would be preferable to rely on tried-and-true processors such as the BatchProcessor

@Oberon00

The exporter interface. You can make your own exporter that wraps an existing one and does the modification.

I'm not sure this translates well to Erlang, but I need to try it out first to see how it goes

tsloughter · 2020-10-14T16:36:14Z

The exporter interface. You can make your own exporter that wraps an existing one and does the modification.

That is what we hoped not to have to do and I believe is already the way people have been doing it.

But I guess if its a wrapper that can take any exporter it and be a new exporter that acts on the attributes and then calls the wrapped exporter with the modified batch it wouldn't be too bad.

It would certainly be more efficient to update "on end", but it sounds like a wrapper exporter is the way to go -- assuming there aren't issues with the lifetime of the shared attributes on a trace, will have to talk to those using it.

tsloughter · 2020-10-14T16:38:37Z

I'm not sure this translates well to Erlang, but I need to try it out first to see how it goes

It does. We have an exporter behaviour, otel_exporter. You can create a new exporter that simply goes through all the spans and instead of exporting them it updates their attributes in the table and then calls the export/2 function of the wrapped exporter.

And has to call the init/1 of the wrapped exporter when its init/1 is called at startup.

jack-berg · 2024-04-09T14:50:22Z

Reading through the comments, the arguments not to do this seem to be:

Its already possible to do this in a span exporter
Doing this in a span processor requires paying attention to span processor ordering

But asking users to do this in span exporter is inconvenient and unintuitive - this is clearly a "processing" type of task so why would a user expect to have to do it in an exporter?

And yes, by doing this in a processor you have to pay attention to processor registration order, but that's expected and matches common sense expectations for processors. Notice that nobody expects to ignore processor ordering configuration in the collector. File configuration makes the registration order of processors easy to configure.

This is the type of thing that is easy to do and fills an obvious conceptual flaw in the SDK design, where we've defined processors which can only process in a very constrained manner. Let's just fix it with the obvious / simple solution, which is to add a new optional beforeEnd method to SpanProcessor.

tsloughter · 2024-04-09T15:36:46Z

@jack-berg the end result I think was the replace the decorators feature` that was never defined.

jack-berg · 2024-04-09T15:38:31Z

Why introduce a new concept / SDK extension point instead of enhancing an existing? Limiting SDK extension points reduces cognitive load for users.

tsloughter · 2024-04-09T15:46:35Z

@jack-berg to make it possible to do similar to what this wants to do in OnStart as well.

tsloughter · 2024-04-09T15:47:49Z

Or no, that wasn't the reason... I swear there was a good reason though! hah

pellared · 2024-04-09T16:01:14Z

Loose naming idea:
In .NET there is a pattern to name callbacks like OnEnd (called just after End) and OnEnding (called just before End). We could name it OnEnding (instead of BeforeEnd).

Making a writeable spans on SpanProcessor's OnEnd(Span) for Go will be breaking change as OnEnd accepts a ReadOnlySpan interface. Reference: https://pkg.go.dev/go.opentelemetry.io/otel/sdk/trace#SpanProcessor

jack-berg · 2024-04-09T16:09:28Z

Discussed in the 4/9/2024 Spec SIG meeting. There was agreement that we should look into solutions to accommodate a variety of use cases that require modifying spans near the end, including:

Capturing expensive stack traces only if some condition is true
Removing unwanted attributes that were added by instrumentation
Schema transformation

Two broad families of solutions were suggested:

Modify the span processor, e.g.
- Add new beforeEnd operation where the span is still mutable
- Evolve the onEnd operation to allow span to be mutated
Codify the notion of a decorator, where the SDK provides tooling or an additional extension point to allow span processors to decorate / modify the ReadableSpan given the SpanProcessor#onEnd

@jackshirazi / @JonasKunz are either (or anyone else) interested / able to own this?

JonasKunz · 2024-04-11T07:18:04Z

I'm happy to own this.

jack-berg · 2024-04-16T20:42:06Z

Hi @JonasKunz - you've been assigned to the issue. Please reach out to me if needed. Thanks!

JonasKunz · 2024-04-17T13:40:10Z

I'd like to share my thoughts on the proposed families of solutions.

Modify the span processor, e.g.

Add new beforeEnd operation where the span is still mutable

Evolve the onEnd operation to allow span to be mutated

I think evolving the existing onEnd callback isn't the best solution here, because it technically is a breaking change:
Currently, span processors can assume that the span they are operating on in onEnd does not change during this call.
If the span becomes mutable, this is not the case anymore.

For the beforeEnd (or as suggested onEnding) callback this shouldn't be a problem in terms of backwards compatibility.
However implementors of beforeEnd would need to be aware of the fact that the span they are operating on might be changing from another thread, because in contrast to onStart the span has already escaped into the wild.

Here is a draft implementation showing how this new callback could be implemented in Java.

Codify the notion of a decorator, where the SDK provides tooling or an additional extension point to allow span processors to decorate / modify the ReadableSpan given the SpanProcessor#onEnd

IIRC the main concern here from @jack-berg was that this introduction of a new component type might introduce too much complexity for SDK users and I share that concern.

However, having a pipeline-like structure where one stage can pass a wrapped, immutable span to the next stage is a lot more flexible than just the beforeEnd callback:

It allows to filter spans
- Example use-case: We only want to export database spans which have a latency above a given threshold
- Can be implemented by not invoking the next pipeline stage
- Currently can be achieved by wrapping exporters, but this invalidates the batching performed previously in the BatchingSpanProcessor
Buffer spans and export them after some delayed notification:
- First example use-case: Adding cloud provider resource attributes (e.g. AWS). These resource attributes require costly operations (e.g. network calls) to be discovered. This currently requires a tradeoff: Either not add those attributes or delay the application startup until the costly operation has finished. The capability to buffer spans would open up a third option: Buffer the spans in a span processor, apply the new resource attributes when they are discovered and only after that pass the spans to the next stage
- Second example use-case: Correlating spans with external profiling data. In our distro we add the ids of stack-traces recorded by the eBPF profiler to spans. This data arrives with a slight delay, so the spans need to be delayed and not immediately sent too.

I think we can get this flexibility without introducing an entirely new component and therefore with a lot less complexity.
Here is a backwards-compatible implementation showing how this could be done at least in Java.

The idea is to add a new parameter to onEnd: This parameter is a function which needs to be called with a span as argument. This function will invoke the next span processor in the "pipeline". This is a well known pattern, e.g. from Java Servlet Filters.

At least in Java, this functionality is very easy and cleanly addable in a backwards compatible way as shown by my implementation. However, I don't know if this is also the case for the other languages.

An alternative way to achieve the same capability would be to add a new boolean initOnEndChaining(Function nextStage) function to SpanProcessor. This function would be invoked by the SDK at initialization time in order to detect whether the SpanProcessor operates in chaining mode (it will call the next processor in line itself) or not (the SDK will be responsible for invoking the span processor after this processor).

This alternative solution is not quiet as elegant as my provided Java solution with the onEnd, but I think it should work in all languages.

It would be great to get your feedback here! As you might have noticed, I would prefer the decoration/chaining approach over beforeEnd, but by enhancing SpanProcessors to do it instead of a new component.

jack-berg · 2024-04-22T22:51:04Z

I think evolving the existing onEnd callback isn't the best solution here, because it technically is a breaking change:
Currently, span processors can assume that the span they are operating on in onEnd does not change during this call.
If the span becomes mutable, this is not the case anymore.

I agree but, but would also add that whether or not the onEnd method can be evolved to accept a new mutable span type as an argument will vary based what is allowed in a language and how it is implemented. I think in practice several languages will have to do something quirky to accommodate a change like this, like introduce a new onEnd method with a slightly different name, or a new SpanProcessor interface with a different name, or wait for a major version bump of the SDK.

However, having a pipeline-like structure where one stage can pass a wrapped, immutable span to the next stage is a lot more flexible than just the beforeEnd callback:

The idea that a processor isn't able to generically mutate AND filter data makes it feel misnamed. With that said, the concept of a sampler at least partially overlaps with the filter use case. I wonder if its better to keep things separate, and instead of introducing a filter-chain concept in SpanProcessor, extend Sampler so there is a way to filter spans out on span end. Obviously any filtering that is done after the start of a span introduces the potential for broken spans, since downstream spans make sampling decisions based on the sampling status of their parent, which no longer reflects whether the parent span actually made it to the backend. Still, this use case comes up enough that its worth considering supporting and warning users to make sure they know the consequences.

Can anyone involved in the original trace SDK recall / comment on the reasoning why SpanProcessor wasn't initially designed with chaining in mind? The fact that it wasn't and therefore prevents the SpanProcessor filter use case seems deliberate rather than an oversight.

…4024) Fixes open-telemetry#1089. In addition to the comments on the issue, this was discussed in the spec SIG Meeting on 2024/23/04: * The filtering use-case explained in [this comment](open-telemetry#1089 (comment)) should rather be solved by the upcoming samplerV2 instead of `SpanProcessor`s due to better conceptual fit * The buffering use-case also explained in [this comment](open-telemetry#1089 (comment)) seems to be not relevant enough to influence the design decision * Apparently there was a discussion around building the `SpanProcessor`s in a chaining fashion during the initial SDK spec design and it was actively decided against it. However, no one could recall the reason why.

gugahoa added the spec:trace Related to the specification/trace directory label Oct 13, 2020

carlosalberto added the area:sdk Related to the SDK label Oct 13, 2020

arminru added area:api Cross language API specification issue enhancement New feature or request and removed area:api Cross language API specification issue labels Oct 14, 2020

dyladan assigned JonasKunz and jack-berg Apr 16, 2024

dyladan added triage:accepted:ready-with-sponsor Ready to be implemented and has a specification sponsor assigned and removed triage:accepted:needs-sponsor Ready to be implemented, but does not yet have a specification sponsor labels Apr 16, 2024

trask added this to 🔭 Main Backlog Apr 16, 2024

trask moved this to Spec - Accepted in 🔭 Main Backlog Apr 16, 2024

trask moved this from Spec - Accepted to Spec - In Progress in 🔭 Main Backlog Apr 16, 2024

JonasKunz mentioned this issue Apr 30, 2024

Add ability for SpanProcessor to mutate spans on end #4024

Merged

5 tasks

lmolkova mentioned this issue May 8, 2024

HTTP Client spans - Span Status for 4xx errors - Change from MUST to SHOULD open-telemetry/semantic-conventions#1003

Closed

vastin mentioned this issue May 24, 2024

Implemented Plugin.cs to enable Application Singals Instrumentation. aws-observability/aws-otel-dotnet-instrumentation#20

Merged

jmacd mentioned this issue May 29, 2024

Add isolating log record processor #4062

Merged

svrnm mentioned this issue Jul 8, 2024

Clarify how Application Developers are supposed to set the Span status when using Instrumentation Libraries #4131

Open

jack-berg mentioned this issue Jul 10, 2024

Log record mutations are visible in next registered processors #4067

Merged

carlosalberto closed this as completed in #4024 Jul 25, 2024

carlosalberto closed this as completed in c55650e Jul 25, 2024

danielgblanco moved this from Spec - In Progress to Completed Projects in 🔭 Main Backlog Jul 29, 2024

danielgblanco moved this from Completed Projects to Spec - Closed in 🔭 Main Backlog Jul 29, 2024

Add BeforeEnd to have a callback where the span is still writeable #1089

Add BeforeEnd to have a callback where the span is still writeable #1089

Comments

gugahoa commented Oct 13, 2020 • edited Loading

tsloughter commented Oct 13, 2020

garthk commented Oct 13, 2020

arminru commented Oct 14, 2020

iNikem commented Oct 14, 2020

arminru commented Oct 14, 2020

Oberon00 commented Oct 14, 2020 • edited Loading

tsloughter commented Oct 14, 2020

tsloughter commented Oct 14, 2020

Oberon00 commented Oct 14, 2020

tsloughter commented Oct 14, 2020

Oberon00 commented Oct 14, 2020

iNikem commented Oct 14, 2020

tsloughter commented Oct 14, 2020

iNikem commented Oct 14, 2020

tsloughter commented Oct 14, 2020

seh commented Oct 14, 2020

bogdandrutu commented Oct 14, 2020

Oberon00 commented Oct 14, 2020

tsloughter commented Oct 14, 2020

iNikem commented Oct 14, 2020

Oberon00 commented Oct 14, 2020 • edited Loading

Oberon00 commented Oct 14, 2020

gugahoa commented Oct 14, 2020

tsloughter commented Oct 14, 2020

tsloughter commented Oct 14, 2020

jack-berg commented Apr 9, 2024

tsloughter commented Apr 9, 2024

jack-berg commented Apr 9, 2024

tsloughter commented Apr 9, 2024

tsloughter commented Apr 9, 2024

pellared commented Apr 9, 2024 • edited Loading

jack-berg commented Apr 9, 2024

JonasKunz commented Apr 11, 2024

jack-berg commented Apr 16, 2024

JonasKunz commented Apr 17, 2024 • edited Loading

jack-berg commented Apr 22, 2024

gugahoa commented Oct 13, 2020 •

edited

Loading

Oberon00 commented Oct 14, 2020 •

edited

Loading

Oberon00 commented Oct 14, 2020 •

edited

Loading

pellared commented Apr 9, 2024 •

edited

Loading

JonasKunz commented Apr 17, 2024 •

edited

Loading