Representing an asynchronous span in Zipkin #1189

Closed
tennenbaum opened this issue Jul 17, 2016 · 12 comments
Labels: enhancement, model (Modeling of traces)

Comments

@tennenbaum

For instance, suppose we want a span to represent the latency between producing a message to a queue and consuming it from the queue (e.g. when Kafka is the queue). In this case the producer call finishes before the consumer call starts, and we would like to represent the duration between producing and consuming.

We could use the core RPC annotations in this case to represent the producer call ("cs" at the start, "cr" at the end) and the consumer call ("sr" at the start, "ss" at the end). However, because both client annotations occur before the server annotations, looking at the Zipkin code (in particular zipkin.internal.CorrectForClockSkew#apply), this will cause Zipkin to detect clock skew where there is none.
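To make the clock-skew concern concrete, here is a simplified Java sketch. It is not Zipkin's CorrectForClockSkew implementation; it assumes a heuristic that expects the server interval [sr, ss] to nest inside the client interval [cs, cr] and, when it doesn't, centers the server side within the client side. The class and method names are hypothetical.

```java
/**
 * Simplified illustration (NOT Zipkin's actual CorrectForClockSkew code) of why
 * producer/consumer timestamps recorded as cs/cr and sr/ss can look like clock skew.
 * Timestamps are microseconds, as in the Zipkin v1 model.
 */
final class SkewSketch {
    private SkewSketch() {}

    /** In a normal RPC the server interval [sr, ss] nests inside the client interval [cs, cr]. */
    static boolean serverNestedInClient(long cs, long cr, long sr, long ss) {
        return cs <= sr && ss <= cr;
    }

    /**
     * Shift that an assumed midpoint-centering heuristic would apply to the server side:
     * it splits the leftover client time evenly as network latency, so sr "should" land at
     * cs + (clientDuration - serverDuration) / 2. Returns 0 when the server interval is
     * already nested, i.e. no skew is assumed.
     */
    static long naiveSkewShift(long cs, long cr, long sr, long ss) {
        if (serverNestedInClient(cs, cr, sr, ss)) return 0L;
        long latency = ((cr - cs) - (ss - sr)) / 2;
        return sr - (cs + latency);
    }

    public static void main(String[] args) {
        // Ordinary RPC: cs=0, sr=5, ss=15, cr=20 -> nested, nothing to correct.
        System.out.println(naiveSkewShift(0, 20, 5, 15)); // 0
        // Queue: the producer finished at cr=10 and the consumer picked the message up at sr=50.
        System.out.println(naiveSkewShift(0, 10, 50, 60)); // 50
    }
}
```

With the producer finishing at cr=10 and the consumer starting at sr=50, a heuristic like this treats the 50 microseconds the message spent in the queue as skew and shifts it away, which is exactly the distortion described above.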

Is there a better way of representing this type of call using the Zipkin data model?

@eirslett
Contributor

The Zipkin data model doesn't support async spans like tracing Kafka messages. It would need to be redesigned. (There is some ongoing work on a v2 Zipkin data model, I think.)

@eirslett
Contributor

One thing you could do to represent it is to start a new trace, with a new trace ID, on the consumer side, and then add a binary annotation with an ID that correlates the two traces. However, there is no UI or other tooling support for that yet.
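A minimal sketch of that correlation, assuming a simplified span type and a hypothetical binary annotation key `producer.trace_id` (Zipkin defines no such key): the consumer starts a new trace and records the producer's trace ID so the two traces can be joined later by whatever tooling reads that annotation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch of "start a new trace on the consumer side": the consumer span gets a fresh
 * trace ID plus a binary annotation pointing back at the producer's trace.
 * The Span type and the LINK_KEY name are hypothetical, not Zipkin APIs.
 */
final class LinkedTraceSketch {
    static final String LINK_KEY = "producer.trace_id"; // hypothetical key name

    static final class Span {
        final long traceId;
        final long id;
        final Map<String, String> binaryAnnotations = new LinkedHashMap<>();

        Span(long traceId, long id) {
            this.traceId = traceId;
            this.id = id;
        }
    }

    /** Root span of a new trace on the consumer, correlated to the producer's trace. */
    static Span consumerRoot(Span producer, long newTraceId, long newSpanId) {
        Span consumer = new Span(newTraceId, newSpanId);
        // 64-bit IDs rendered as 16 lower-hex characters.
        consumer.binaryAnnotations.put(LINK_KEY, String.format("%016x", producer.traceId));
        return consumer;
    }

    public static void main(String[] args) {
        Span producer = new Span(0x48485a3953bb6124L, 0x48485a3953bb6124L);
        Span consumer = consumerRoot(producer, 0x1aL, 0x1aL);
        System.out.println(consumer.binaryAnnotations); // {producer.trace_id=48485a3953bb6124}
    }
}
```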

@tennenbaum
Author

tennenbaum commented Jul 17, 2016

Thanks, I will look at the v2 discussion. I would like to include the asynchronous span in the same trace. An alternative approach I can see (for v1 code) is to submit non-core (custom) annotations when producing and consuming, using the same spanId. It appears that Zipkin would not attempt to adjust for clock skew when the core annotations are not present. It also seems that it would derive the duration of the span from the difference between the newest and oldest timestamps of the custom annotations. I am not sure whether this will play nicely with the rest of the tooling, though (I'm very new to this code base). Any thoughts on that?
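The duration derivation described here can be sketched as follows, assuming a simplified annotation record (not Zipkin's class) and hypothetical annotation values: the oldest timestamp becomes the span's start, and the newest minus the oldest becomes its duration.

```java
import java.util.List;

/**
 * Sketch of deriving a span's timestamp and duration from arbitrary (custom) annotations:
 * oldest timestamp = start, newest - oldest = duration. Simplified model, not Zipkin's classes.
 */
final class AnnotationDurationSketch {
    record Annotation(long timestampMicros, String value) {}

    /** Returns {timestamp, duration} in microseconds; requires at least one annotation. */
    static long[] timestampAndDuration(List<Annotation> annotations) {
        if (annotations.isEmpty()) throw new IllegalArgumentException("no annotations");
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (Annotation a : annotations) {
            min = Math.min(min, a.timestampMicros());
            max = Math.max(max, a.timestampMicros());
        }
        return new long[] {min, max - min};
    }

    public static void main(String[] args) {
        // Hypothetical custom annotation values for "produced" and "consumed".
        long[] result = timestampAndDuration(List.of(
            new Annotation(1_000_000L, "message.produced"),
            new Annotation(1_250_000L, "message.consumed")));
        System.out.println(result[0] + " " + result[1]); // 1000000 250000
    }
}
```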

Note: Implied in the above is that we are sending the in-band tracing information (e.g. spanId) as metadata in the queued message.
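A minimal sketch of carrying that in-band context in message metadata. It assumes the queue offers a string key/value metadata map (modeled here as a plain `Map`) and borrows the B3 header names used for HTTP propagation; the class and record are hypothetical, and a real Kafka integration (such as the message-key approach mentioned later in this thread) would differ.

```java
import java.util.Map;

/**
 * Sketch: carry trace context in a message's metadata so the consumer can join the
 * same trace. Header names follow the B3 convention; the Map carrier is a stand-in
 * for whatever metadata the queue supports (assumption).
 */
final class MessageContextSketch {
    record TraceContext(long traceId, long spanId, Long parentId) {}

    /** Producer side: write the IDs as 16-character lower-hex strings. */
    static void inject(TraceContext ctx, Map<String, String> metadata) {
        metadata.put("X-B3-TraceId", hex(ctx.traceId()));
        metadata.put("X-B3-SpanId", hex(ctx.spanId()));
        if (ctx.parentId() != null) metadata.put("X-B3-ParentSpanId", hex(ctx.parentId()));
    }

    /** Consumer side: returns null if the message carries no trace context. */
    static TraceContext extract(Map<String, String> metadata) {
        String traceId = metadata.get("X-B3-TraceId");
        String spanId = metadata.get("X-B3-SpanId");
        if (traceId == null || spanId == null) return null;
        String parent = metadata.get("X-B3-ParentSpanId");
        return new TraceContext(parse(traceId), parse(spanId), parent == null ? null : parse(parent));
    }

    private static String hex(long id) {
        return String.format("%016x", id);
    }

    // parseUnsignedLong accepts the full 64-bit range, including IDs with the high bit set.
    private static long parse(String hex) {
        return Long.parseUnsignedLong(hex, 16);
    }
}
```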

@codefromthecrypt
Member

codefromthecrypt commented Jul 18, 2016 via email

@codefromthecrypt
Member

@AndrewWang996 had a question about this in Brave. Basically: how do you deal with a one-way system? He was concerned about half-open spans ("cs" and "sr" only), although I don't know what the specific issue with that was. My guess is the dependencies view, or maybe the lack of a duration.

  • Zipkin doesn't care about these annotations except in a few scenarios:
    • clock skew correction (e.g. the receive happens before the send)
    • duration calculation (only relevant when Span.timestamp and Span.duration aren't set)
    • dependencies view (to show the link between services)

I mentioned similar things to what has been said here:

  • You can treat the act of sending the message as "cs", "cr" and the act of receiving as "sr", "ss". For example, close the span after you've sent the message (ignoring that no response was received).
  • You can close the span with something else (e.g. Brave's LocalTracer), which adds a span duration without using RPC annotations. You should add "ca" (producer) and "sa" (consumer) on the span so that the dependencies graph can show the link.
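For the second option, a sketch assuming simplified span and endpoint records (not Brave's or Zipkin's types): the span's timestamp and duration are set directly when it is closed, and "ca"/"sa" name the producer and consumer services. The `dependencyEdge` helper only illustrates the assumed producer-to-consumer link; it is not how Zipkin's dependency linker is implemented, and the service names are made up.

```java
/**
 * Sketch: record timestamp/duration explicitly (as a local span would) and attach
 * "ca" (producer) and "sa" (consumer) address annotations. All types are hypothetical.
 */
final class AddressAnnotationSketch {
    record Endpoint(String serviceName) {}

    record Span(long timestampMicros, long durationMicros, Endpoint ca, Endpoint sa) {}

    /** Close the span explicitly: the duration comes from the two timestamps, not from RPC annotations. */
    static Span close(long startMicros, long endMicros, String producer, String consumer) {
        return new Span(startMicros, endMicros - startMicros, new Endpoint(producer), new Endpoint(consumer));
    }

    /** Assumed edge direction for a dependencies view: producer ("ca") -> consumer ("sa"). */
    static String dependencyEdge(Span span) {
        if (span.ca() == null || span.sa() == null) return null; // nothing to link
        return span.ca().serviceName() + " -> " + span.sa().serviceName();
    }

    public static void main(String[] args) {
        Span s = close(1_000L, 1_400L, "order-service", "billing-service");
        System.out.println(s.durationMicros() + " " + dependencyEdge(s)); // 400 order-service -> billing-service
    }
}
```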

One thing we could do here is investigate clock skew correction when no RPC spans exist in a trace. That could be split out as a separate issue, but I think it would only work if "ca" and "sa" are logged.

@AndrewWang996

Measuring the latency between producing to and consuming from Kafka queues is precisely what I'm trying to do, although unlike @tennenbaum, I only considered adding the "cs" annotation upon produce and "sr" upon consume without worrying about how long it takes the message to be produced and consumed.

My problem is that I was attempting to use the ClientRequestInterceptor + ServerRequestInterceptor mentioned in Brave's 3.0.0 API, but it seems that this only submits the span if either ["cs", "cr"] or ["sr", "ss"] are handled in the Interceptor.handle(Adapter) methods. I only needed to handle "cs" and "sr" with the ClientRequestInterceptor and ServerRequestInterceptor, but in order for Brave to submit the spans, I had to make a dummy adapter and submit "ss" as well, not to close the span, but just so that Brave knew to submit it. This is suboptimal, and I'm not even worrying about dependencies or duration.

@codefromthecrypt
Member

codefromthecrypt commented Jul 25, 2016 via email

@codefromthecrypt
Member

codefromthecrypt commented Jul 26, 2016

PS: I spent all my battery on this topic on the flight :)

So I think there are a few ways to skin this cat. I tried a couple with Kafka (stuffing the span ID into the message key, and using callbacks to close the producer span):

  • represent sending to the broker as cs/cr and receiving from the broker as sr/ss (in the same span)
    • time in transport is the interval between cr and sr (noting you can also log "ws" and "wr" to show wire send attempts)
    • the plus is that you can do this with all instrumentation
    • the minus is that the "ss" and "cr" parts are forged (i.e. you are really capturing the overhead of send and receipt)
  • make annotations "mp" (message produced) and "mc" (message consumed) in the same span (the annotation names don't matter, just that there is a pair)
    • time in transport is the interval between these (noting you can also log "ws" and "wr" to show wire send attempts)
    • the plus is that it doesn't conflate with client/server annotations
    • the minus is that it requires you to flush an incomplete span on both sides (not all instrumentation allows this)
  • make a separate span for the consumer side
    • time in transport is the interval between the end of the producer span and the beginning of the consumer span
    • the plus is that the diagram more clearly separates the producer from the consumer (they are in different spans)
    • this means you propagate the "next span id" to the other side
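The three options differ mainly in where the transport time is read from. A simplified Java sketch of that calculation, assuming annotation timestamps in microseconds keyed by annotation value, with hypothetical class and method names:

```java
import java.util.Map;

/**
 * Sketch: "time in transport" under each of the three representations.
 * Annotation maps and span fields are simplified stand-ins for the Zipkin v1 model.
 * Missing annotations cause a NullPointerException on unboxing; real code should check.
 */
final class TransportTimeSketch {
    /** Option 1: one span with cs/cr (send to broker) and sr/ss (receive from broker). */
    static long rpcStyle(Map<String, Long> annotations) {
        return annotations.get("sr") - annotations.get("cr");
    }

    /** Option 2: one span with a custom pair, "mp" (produced) and "mc" (consumed). */
    static long messageStyle(Map<String, Long> annotations) {
        return annotations.get("mc") - annotations.get("mp");
    }

    /** Option 3: separate producer and consumer spans; gap between end of one and start of the other. */
    static long separateSpans(long producerTimestamp, long producerDuration, long consumerTimestamp) {
        return consumerTimestamp - (producerTimestamp + producerDuration);
    }

    public static void main(String[] args) {
        System.out.println(rpcStyle(Map.of("cs", 0L, "cr", 5L, "sr", 45L, "ss", 50L))); // 40
        System.out.println(messageStyle(Map.of("mp", 5L, "mc", 45L)));                   // 40
        System.out.println(separateSpans(0L, 5L, 45L));                                  // 40
    }
}
```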

I can share my Kafka code (written with Brave) if anyone is interested.

@codefromthecrypt
Member

Here's the Kafka spike: openzipkin/brave#212

@codefromthecrypt
Member

Let's see if we can nail down a design for the Zipkin v1 model here: #1243

@codefromthecrypt
Member

Related issue: multiple parents, a.k.a. linked traces: #1244

@codefromthecrypt codefromthecrypt added enhancement model Modeling of traces labels Oct 23, 2018
@jorgheymans
Contributor

We can safely consider async span representation done and dusted since Zipkin v2.


5 participants