Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Continue a span (client/server Span sharing; concept supported by Zipkin) #1007

Closed
marcingrzejszczak opened this issue Sep 25, 2020 · 27 comments
Labels
area:api Cross language API specification issue area:span-relationships Related to span relationships release:after-ga Not required before GA release, and not going to work on before GA spec:trace Related to the specification/trace directory triage:rejected:declined

Comments

@marcingrzejszczak
Copy link

Following the discussion during the recent OpenTelemetry meeting I was supposed to file an issue about an option for a span to be continued. So that instead of always creating a new child span, we can actually continue with an exiting one.

@marcingrzejszczak marcingrzejszczak added the spec:trace Related to the specification/trace directory label Sep 25, 2020
@iNikem
Copy link
Contributor

iNikem commented Sep 25, 2020

How does "continuing the span" differ from "not closing the span"?

@Oberon00 Oberon00 added the area:api Cross language API specification issue label Sep 25, 2020
@Oberon00
Copy link
Member

Can you show an example of a case where you would like to "continue" a span and both not closing it or starting a child span of it (which, BTW, is also possible after ending it) do not work?

@Oberon00 Oberon00 added the needs discussion Need more information before all suitable labels can be applied label Sep 25, 2020
@Oberon00
Copy link
Member

Oberon00 commented Sep 25, 2020

This was already discussed, and it was decided not to support this, see #359 (and #367)

@carlosalberto
Copy link
Contributor

@marcingrzejszczak Besides what @Oberon00 mentioned, I think it makes more sense to add this in the Zipkin exporter (to properly support the merge-able Spans concept of Zipkin, that is). This also would need to be specified of course ;)

@andrewhsu andrewhsu added the release:after-ga Not required before GA release, and not going to work on before GA label Sep 25, 2020
@iNikem
Copy link
Contributor

iNikem commented Sep 27, 2020

@Oberon00 @carlosalberto do we want to keep this ticket open? If we have already discussed this and decided against it, why not close it?

@Oberon00
Copy link
Member

@marcingrzejszczak Please re-open if there is anything else to discuss / any other argument or use case.

@marcingrzejszczak
Copy link
Author

This was already discussed, and it was decided not to support this, see #359 (and #367)

I understand that it was discussed, however Sleuth & Brave users are a substantial number of users and I understand that we can come back to the discussion table to reason about this yet again. As I've seen here #367 the voice of Zipkin maintainers seemed to be completely ignored.

This is an important point in deciding on whether Sleuth should implement OTel. If the argument there is that "it has already been discussed" and the issue gets closed then it will be difficult to come to any agreement here. Like I mentioned, Sleuth can consider implementing OTel only if the compatibility for the users remains unbroken and from the usage perspective things seem to work in relatively the same way.

@Oberon00 Oberon00 reopened this Sep 28, 2020
@Oberon00 Oberon00 changed the title Continue a span Continue a span (client/server Span sharing; concept supported by Zipkin) Sep 28, 2020
@Oberon00
Copy link
Member

the voice of Zipkin maintainers seemed to be completely ignored

Can you point to a comment by a Zipkin maintainer? Or do you mean that we did not consult with them?

@Oberon00 Oberon00 removed the needs discussion Need more information before all suitable labels can be applied label Sep 28, 2020
@marcingrzejszczak
Copy link
Author

image

@Oberon00
Copy link
Member

^ This is #367 (comment) by @adriancole.

@anuraaga
Copy link
Contributor

I think it makes more sense to add this in the Zipkin exporter

Just about the technical aspect, I'm not sure how to implement this in the exporter, at least in a complete way. I can imagine putting the remote span ID in trace state and replacing in the exporter, but any logic in the app, most worryingly logs injection, will either not work or also have to be aware of this conversion. This is similar to #896 - when it comes to IDs, I don't think there is any complete solution that does not happen at the ID generation step itself.

I think it wouldn't be a breaking change to add a shared tracing config post-GA in general. But to unblock sleuth, it makes sense to me to expedite it for Java as a first step.

@tsloughter
Copy link
Member

Doesn't this require passing the parent id along with the span id when propagating? This isn't supposed by the W3C TraceContext, right? I'd rather not have to support a feature that only exists in B3 headers. Has this feature been discussed in the TraceContext SIG as well and for some reason decided against?

@codefromthecrypt
Copy link

"only exists in B3 headers" represents the bulk of traced traffic :)

@codefromthecrypt
Copy link

Incidentally, it is system specific the impact of not propagating the parent ID when sharing a span. However, it is best experience to do it https://github.com/openzipkin/b3-propagation/blob/master/README.md#why-is-parentspanid-propagated

I think this issue arose because @bogdandrutu and @marcingrzejszczak are trying to change sleuth so that it can seamlessly use open telemetry instead of brave. Some users even add parentID to their logging contexts. By deciding to not support this, it would hurt either adoption or the experience of people trying it out. up to you!

@anuraaga
Copy link
Contributor

anuraaga commented Oct 3, 2020

Doesn't this require passing the parent id along with the span id when propagating? This isn't supposed by the W3C TraceContext, right?

I think this is the sort of information that can go in TraceState for w3c, backend-specific propgated state. So it should work ok for b3 and w3c but not others. I think it's safe to say Zipkin users will be using one of these two so I wouldn't worry about it from a propagation perspective.

@anuraaga
Copy link
Contributor

anuraaga commented Oct 3, 2020

Going to comment on this from the other issue here to prevent two threads for what I think is the same topic.

#1004 (comment)

I interpret this to mean if two spans share a trace id, then they must have separate span ids. It's possible I'm reading into this too deeply. Although, I could see this being a requirement for some tracing backends. It wouldn't be easy to work around duplicate span ids for the same trace unless a backend has explicit support for it.

Yeah - we definitely wouldn't want to default to only sharing span IDs it would need to be an option. All the Zipkin instrumentation have an option to disable the sharing behavior for cases where the traces are exported to other backends like Stackdriver because as you point out, many backends would not be able to deal with it.

But since many users do rely on this behavior, and if I'm not mistaken, having an option for this in tracing config is not difficult to implement (if true, don't generate a new ID), I want to consider this if it helps existing users, like Sleuth's users.

@Oberon00
Copy link
Member

Oberon00 commented Oct 3, 2020

Could this be solved by #896? Like if you could install an span ID generator that gets as input the same as the sampler (actually span ID generators could be implemented as an optional sampler result part), it could decide that if parent.IsRemote && toBeCreated.Kind == Server then reuse parent span ID.

@bogdandrutu
Copy link
Member

I am willing to understand the impact that this decision to not support sharing the same span id has to the current Zipkin/B3 users:

  • If I understand correctly B3 propagation allows to not send the parent span ID. So we are compatible with that, but there may be hidden impacts that are not clear to me. The link https://github.com/openzipkin/b3-propagation/blob/master/README.md#why-is-parentspanid-propagated explains the behavior of sharing the same span id not the benefit of this, also the link may need to be updated because B3 supports no parent header.
  • On the Zipkin export protocol, there is support to send or not spans that share the same span id. So here we are also compatible, but there may also be hidden impacts that are not clear. For example there may be backends that don't support this, do we know any that does not support this option?

My preference is if this is not a hard requirement (we can confirm that we are breaking someone) to not support this, because as a new project supporting less edge cases/ system specific requirements makes our life easier in the future.

@tsloughter
Copy link
Member

I'm sure this question has been asked a lot but is there material I can read that explains the benefit of using the same span id? Even better if there is also some material on why other implementations of tracing did not do this?

The explanation under "why-is-parentspanid-propagated" says it is so the span can be put in the correct spot in the trace tree but it doesn't give an example or explanation of how this happens when using the client side as the parent.

But all that is because I'm curious, it isn't important to deciding this issue as far as I can tell. Since if neither side (B3 and W3C/OTel) intends to change then the issue is to ensure OTel spec does not somehow block this feature from being implemented.

Using tracestate and an alternate ID generator would only work if the generators were passed the extracted context. So I think an alternate StartSpan would be required, in an alternate Tracer SDK implementation and the API need only not define a restriction that the StartSpan function must return a span id that isn't the same as the remote parent.

@bogdandrutu
Copy link
Member

Also @marcingrzejszczak we need to look at this issue from 2 different angles:

  1. If user uses OTel API + OTel SDK (implementation). I think in this case is fine to not support this.
  2. If user uses OTel API + Brave implementation of the OTel API. We should try to support this if necessary. I will focus on this for Sleuth initially because that will give the backwards compatible story for you. Am I correct?

@Oberon00
Copy link
Member

Oberon00 commented Oct 3, 2020

I think an alternate StartSpan would be required

Why? The current StartSpan already has the parent information available before it has to generate an ID.

@Oberon00 Oberon00 added the area:span-relationships Related to span relationships label Oct 6, 2020
@kowshikn
Copy link

kowshikn commented Jan 8, 2021

I see this issue is still open. Can OpenTelemetry provide the guidance for Zipkin/B3 propagation?

@ericmustin
Copy link
Contributor

Echoing #1007 (comment)

I'm experiencing issues representing traces with duplicate span ids well, when using the zipkin receiver to receive traces from legacy zipkin b3 instrumentation (managed envoy).

It would be good to understand where the state of things are on this issue, or, given the current priorities around 1.0, when we might expect discussion here to continue (or whether this issue should be closed).

From a vendor perspective, it would be helpful to know whether there's any work for the collector that was being considered to handle duplicate spans, or whether we should try to manage this with exporter specific (dropping or merging client/server spans within the same batch that contain duplicate span ids).

@anuraaga
Copy link
Contributor

anuraaga commented Mar 9, 2021

@ericmustin Zipkin instrumentation has had an option to disable shared trace IDs for a long time, mostly added to support backends that don't support them. If you're using a zipkin receiver with a backend that doesn't support shared IDs, I think you'd have to flip this flag in the instrumentation. I mentioned a bit but since IDs are propagated, they generally are not something that can be handled in exporters / collector.

This issue is mostly about whether OpenTelemetry instrumentation should support shared trace IDs, mostly for compatibility when Zipkin is used as a backend, not the other way around I think.

@ericmustin
Copy link
Contributor

I see. Thanks @anuraaga. In this case it's the zipkin envoy tracing extension https://github.com/envoyproxy/envoy/tree/main/source/extensions/tracers/zipkin, which does have a shared_span_context flag but unfortunately this isn't exposed by the service mesh tool configuring it (kuma)

I think the reasonable approach here for the use case I'm seeing is to file an issue with Kuma, it does seem like this being on by default in zipkin will just continue to impact Otel users as they migrate from zipkin systems

@austinlparker
Copy link
Member

OpenTelemetry does not plan to extend our current model to support shared span IDs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:api Cross language API specification issue area:span-relationships Related to span relationships release:after-ga Not required before GA release, and not going to work on before GA spec:trace Related to the specification/trace directory triage:rejected:declined
Projects
None yet
Development

No branches or pull requests