Representing an asynchronous span in Zipkin #1189

tennenbaum · 2016-07-17T15:19:14Z

Suppose we want a span to represent the latency between producing a message to a queue and consuming it from the queue (e.g. if Kafka is the queue). In this case, the producer call will finish before the consumer call starts. We would like to represent the duration between producing and consuming.

We could use the core RPC annotations in this case to represent the producer call (cs at the start, cr at the end) and the consumer call (sr at the start, ss at the end). However, both client annotations will occur before the server annotations, and judging from the Zipkin code (in particular zipkin.internal.CorrectForClockSkew#apply), this will cause Zipkin to detect clock skew where there is none.
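To make the problem concrete, here is a deliberately simplified, hypothetical check (`NaiveSkewCheck` is an illustration, not the actual algorithm in zipkin.internal.CorrectForClockSkew). It only encodes the intuition that in a synchronous RPC the server window (sr..ss) sits inside the client window (cs..cr). An async produce/consume with correct clocks violates that intuition, so any check along these lines flags it as skew.

```java
import java.util.Map;

/**
 * Hypothetical, simplified skew heuristic for illustration only.
 * It is NOT Zipkin's CorrectForClockSkew implementation.
 */
final class NaiveSkewCheck {
  private NaiveSkewCheck() {}

  /** Returns true when the server annotations fall outside the client window. */
  static boolean looksSkewed(Map<String, Long> timestampsMicros) {
    Long cs = timestampsMicros.get("cs");
    Long cr = timestampsMicros.get("cr");
    Long sr = timestampsMicros.get("sr");
    Long ss = timestampsMicros.get("ss");
    if (cs == null || cr == null || sr == null || ss == null) {
      return false; // without all four core annotations there is nothing to compare
    }
    // In a synchronous RPC the server works while the client waits,
    // so sr..ss is expected to lie within cs..cr.
    return sr < cs || ss > cr;
  }
}
```

With cs=0, cr=5000, sr=100000, ss=110000 (the producer returns long before the consumer picks the message up), `looksSkewed` returns true even though nothing is wrong with the clocks.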

Is there a better way of representing this type of call using the Zipkin data model?

eirslett · 2016-07-17T15:44:03Z

The Zipkin data model doesn't support async spans, such as those needed for tracing Kafka messages. It would need to be redesigned. (I think there's some work going on around a v2 Zipkin data model.)

eirslett · 2016-07-17T15:45:22Z

One thing you could do to represent it is start a new trace, with a new trace ID, on the consumer side, and then add a binary annotation with an ID that correlates the two traces. However, we have no UI or other tooling support for that yet.
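A minimal sketch of that workaround, assuming a hand-rolled span type: `SimpleSpan`, `LinkedTraces` and the tag key `producer.trace_id` are all hypothetical (neither Zipkin nor Brave defines them). The consumer starts a fresh root span and records the producer's trace ID as a binary annotation so the two traces can be joined by searching on that tag.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical, minimal stand-in for a v1 span; not a real Zipkin or Brave type. */
final class SimpleSpan {
  final long traceId;
  final long spanId;
  final String name;
  final Map<String, String> binaryAnnotations = new LinkedHashMap<>();

  SimpleSpan(long traceId, long spanId, String name) {
    this.traceId = traceId;
    this.spanId = spanId;
    this.name = name;
  }
}

final class LinkedTraces {
  /** Hypothetical tag key; Zipkin defines no standard key for this link. */
  static final String PRODUCER_TRACE_KEY = "producer.trace_id";

  private LinkedTraces() {}

  /** Starts a brand-new consumer trace and records which producer trace it came from. */
  static SimpleSpan startConsumerTrace(long producerTraceId, String name) {
    long newTraceId = ThreadLocalRandom.current().nextLong();
    // Root span: reuse the trace ID as the span ID, a convention some tracers follow.
    SimpleSpan consumer = new SimpleSpan(newTraceId, newTraceId, name);
    // Store the producer trace ID as 16 lower-hex characters, the usual rendering of v1 IDs.
    consumer.binaryAnnotations.put(PRODUCER_TRACE_KEY, String.format("%016x", producerTraceId));
    return consumer;
  }
}
```

The cost of this design is the one noted above: the UI shows two unrelated traces, and joining them is left to whoever queries the tag.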

tennenbaum · 2016-07-17T15:58:42Z

Thanks, I will look at the v2 discussion. I would like to include the asynchronous span in the same trace. An alternative approach I can see (for the v1 code) is to submit non-core (custom) annotations when producing and consuming (using the same spanId). It appears that Zipkin does not attempt to adjust for clock skew when the core annotations are not present. It also seems that it would derive the span's duration from the difference between the newest and oldest timestamps of the custom annotations. I am not sure whether this will play nicely with the rest of the tooling, though (I'm very new to this code base). Any thoughts?

Note: Implied in the above is that we are sending the in-band tracing information (e.g. spanId) as metadata in the queued message.
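A sketch of carrying the IDs in the queued message, assuming the queue supports per-message metadata as a string map (otherwise the values would have to be packed into the key or payload). The header names are borrowed from Zipkin's B3 HTTP propagation headers; using them as message metadata, and the class `MessageTraceContext`, are assumptions for illustration.

```java
import java.util.Map;

/** Hypothetical helper for propagating trace identifiers through message metadata. */
final class MessageTraceContext {
  static final String TRACE_ID = "X-B3-TraceId";
  static final String SPAN_ID = "X-B3-SpanId";

  final long traceId;
  final long spanId;

  MessageTraceContext(long traceId, long spanId) {
    this.traceId = traceId;
    this.spanId = spanId;
  }

  /** Producer side: write the IDs as lower-hex strings into the message metadata. */
  void injectInto(Map<String, String> metadata) {
    metadata.put(TRACE_ID, Long.toHexString(traceId));
    metadata.put(SPAN_ID, Long.toHexString(spanId));
  }

  /** Consumer side: read the IDs back, or return null if the message was not traced. */
  static MessageTraceContext extractFrom(Map<String, String> metadata) {
    String traceId = metadata.get(TRACE_ID);
    String spanId = metadata.get(SPAN_ID);
    if (traceId == null || spanId == null) {
      return null;
    }
    // parseUnsignedLong round-trips the unsigned hex that toHexString produces for negative longs.
    return new MessageTraceContext(
        Long.parseUnsignedLong(traceId, 16), Long.parseUnsignedLong(spanId, 16));
  }
}
```

The consumer then reports its annotations under the extracted spanId, so both halves land in the same span.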

codefromthecrypt · 2016-07-18T00:59:21Z

To address the clock skew issue, pretend the producer action is a client action and the consumer action is a server one. These represent the boundaries of your two services, with Kafka etc. in the middle. The problem is that there is Kafka in the middle :) That said, clock skew correction would shift the timestamps anyway, since regardless of what's in between, the consumer shouldn't receive a message before it is produced. Another (more dramatic) approach to clock skew could be to keep track of things on the instrumentation side. For example, if you know the time on the Kafka server, you can shift your timestamps to that clock before reporting. This is different from NTP, as you'd shift only for the purposes of sending trace data, as opposed to adjusting the clock of the whole VM/instance.
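A minimal sketch of that instrumentation-side adjustment, assuming the host already knows its offset from a reference clock such as the broker's (how that offset is measured is out of scope here). `ReportingClock` is a hypothetical name; the point is that only the reported timestamps are shifted, never the host clock.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that re-expresses reported timestamps in a reference clock. */
final class ReportingClock {
  private final long offsetMicros; // reference clock minus local clock, assumed known

  ReportingClock(long offsetMicros) {
    this.offsetMicros = offsetMicros;
  }

  long toReference(long localMicros) {
    return localMicros + offsetMicros;
  }

  /** Returns a copy of the annotation timestamps shifted into the reference clock. */
  Map<String, Long> shiftAll(Map<String, Long> localTimestamps) {
    Map<String, Long> shifted = new LinkedHashMap<>();
    for (Map.Entry<String, Long> e : localTimestamps.entrySet()) {
      shifted.put(e.getKey(), toReference(e.getValue()));
    }
    return shifted;
  }
}
```

For example, a consumer whose clock runs 3 seconds ahead of the broker would use an offset of -3,000,000 microseconds before reporting its "sr"/"ss" timestamps.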

codefromthecrypt · 2016-07-21T00:51:22Z

@AndrewWang996 had a question about this in Brave: basically, how do you deal with a one-way system? He was concerned about half-open spans ("cs" and "sr" only), although I don't know what the specific issue with that was. I'm guessing the dependencies view, or maybe the lack of duration?

Zipkin doesn't care about these annotations except in a few scenarios:
- clock skew correction (e.g. the receive happens before the send)
- duration calculation (only relevant when Span.timestamp and Span.duration aren't set)
- dependencies view (to show the link between services)
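The duration rule above (and the earlier observation that duration falls back to the newest minus the oldest annotation timestamp) can be sketched as follows. `DurationFallback` is a hypothetical, simplified illustration, not Zipkin's code: an explicit Span.duration wins, otherwise the span covers its annotations.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.OptionalLong;

/** Hypothetical, simplified duration fallback; not Zipkin's implementation. */
final class DurationFallback {
  private DurationFallback() {}

  static OptionalLong duration(Long explicitDurationMicros, Collection<Long> annotationTimestamps) {
    if (explicitDurationMicros != null) {
      return OptionalLong.of(explicitDurationMicros); // an explicit duration always wins
    }
    if (annotationTimestamps.size() < 2) {
      return OptionalLong.empty(); // a single point in time has no duration
    }
    long oldest = Collections.min(annotationTimestamps);
    long newest = Collections.max(annotationTimestamps);
    return OptionalLong.of(newest - oldest);
  }
}
```

Because the fallback only looks at timestamps, it works the same for core annotations and custom pairs such as a produce/consume pair.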

I suggested similar things there:

- You can treat the act of sending the message as "cs", "cr" and the act of receiving it as "sr", "ss". For example, close the span after you've sent the message (ignoring that no response was received).
- You can close the span with something else (e.g. Brave's LocalTracer), which adds span duration without using RPC annotations. You should add "ca" (producer) and "sa" (consumer) to the span, so that the dependencies graph can show the link.
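A sketch of what the second option records, using a hypothetical `MessagingSpan` type (not Brave's or Zipkin's API): the span carries an explicit timestamp and duration, so no RPC annotations are needed for duration, while "ca" and "sa" name the producer and consumer services so the dependency graph has both ends. Real v1 address annotations carry a full endpoint; reducing them to a service name is a simplification.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical v1-style span record with explicit timing and address annotations. */
final class MessagingSpan {
  final long timestampMicros;
  final long durationMicros;
  // Address annotation key ("ca" or "sa") -> service name of that endpoint (simplified).
  final Map<String, String> addresses = new LinkedHashMap<>();

  MessagingSpan(long timestampMicros, long durationMicros) {
    if (durationMicros < 0) {
      throw new IllegalArgumentException("duration must be non-negative");
    }
    this.timestampMicros = timestampMicros;
    this.durationMicros = durationMicros;
  }

  static MessagingSpan producedThenConsumed(
      String producerService, long producedAtMicros,
      String consumerService, long consumedAtMicros) {
    MessagingSpan span = new MessagingSpan(producedAtMicros, consumedAtMicros - producedAtMicros);
    span.addresses.put("ca", producerService); // client address: the producer
    span.addresses.put("sa", consumerService); // server address: the consumer
    return span;
  }
}
```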

One thing we could do here is investigate clock skew correction when a trace contains no RPC spans. That could be split out into a separate issue, but I think it would only work if "ca" and "sa" are logged.

AndrewWang996 · 2016-07-25T21:40:37Z

Measuring the latency between producing to and consuming from Kafka queues is precisely what I'm trying to do, although unlike @tennenbaum, I only considered adding the "cs" annotation upon produce and "sr" upon consume without worrying about how long it takes the message to be produced and consumed.

My problem is that I was attempting to use the ClientRequestInterceptor and ServerRequestInterceptor from Brave's 3.0.0 API, but they seem to submit the span only if either ["cs", "cr"] or ["sr", "ss"] are handled in the Interceptor.handle(Adapter) methods. I only needed to handle "cs" with the ClientRequestInterceptor and "sr" with the ServerRequestInterceptor, but for Brave to submit the spans, I had to make a dummy adapter and submit "ss" as well, not to close the span, but just so that Brave would submit it. This is suboptimal, and I'm not even worrying about dependencies or duration.

codefromthecrypt · 2016-07-25T23:08:07Z

If your goal is to measure latency of async spans, I think we can find something to work. If your goal is to make a span with only "cs" and "sr" in it, I'm not interested in helping as it will produce bugs elsewhere we'd have to clean up. The core RPC annotations are made to be used together, so you'd be better off being more flexible about this point.

codefromthecrypt · 2016-07-26T21:42:18Z

P.S. I spent all my battery on this topic on the flight :)

So I think there are a few ways to skin this cat. I tried a couple with Kafka (stuffing the SpanId into the message key, and using callbacks to close the producer span):

1. Represent sending to the broker as cs/cr and receiving from the broker as sr/ss (in the same span).
   - Time in transport is the interval between cr and sr (you can also log "ws"/"wr" to show wire send attempts).
   - Plus: you can do this with all instrumentation.
   - Minus: the "ss" and "cr" parts are forged (you are really capturing the overhead of the send and the receipt).
2. Make annotations "mp" (message produced) and "mc" (message consumed) in the same span (the names don't matter, just that there is a pair).
   - Time in transport is the interval between these (again, you can also log "ws"/"wr" to show wire send attempts).
   - Plus: it doesn't conflate with client/server annotations.
   - Minus: it requires you to flush an incomplete span on both sides (not all instrumentation allows this).
3. Make a separate span for the consumer side.
   - Time in transport is the interval between the end of the producer span and the beginning of the consumer span.
   - Plus: the diagram more clearly separates the producer from the consumer (they are in different spans).
   - This means you propagate the "next span id" to the other side.
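For comparison, here is how "time in transport" falls out of each of the three representations, given timestamps in microseconds. `TransportTime` and its method names are hypothetical; "mp"/"mc" follow the naming in the comment, and option 3 assumes each span has v1 timestamp/duration fields set.

```java
import java.util.Map;

/** Hypothetical helpers computing time in transport for the three representations. */
final class TransportTime {
  private TransportTime() {}

  /** Option 1: one span with cs/cr on the producer side and sr/ss on the consumer side. */
  static long fromRpcAnnotations(Map<String, Long> span) {
    return require(span, "sr") - require(span, "cr");
  }

  /** Option 2: one span with a custom "mp"/"mc" annotation pair. */
  static long fromMessageAnnotations(Map<String, Long> span) {
    return require(span, "mc") - require(span, "mp");
  }

  /** Option 3: separate producer and consumer spans; the gap between them is the transport. */
  static long fromSeparateSpans(long producerTimestamp, long producerDuration, long consumerTimestamp) {
    return consumerTimestamp - (producerTimestamp + producerDuration);
  }

  private static long require(Map<String, Long> span, String key) {
    Long value = span.get(key);
    if (value == null) {
      throw new IllegalArgumentException("missing annotation: " + key);
    }
    return value;
  }
}
```

Options 1 and 3 yield the same number when the producer span ends at "cr" and the consumer span starts at "sr"; they differ in how the spans are drawn, not in what is measured.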

I can share my Kafka code (written with Brave) if anyone is interested.

codefromthecrypt · 2016-08-17T01:35:22Z

Here's the Kafka spike: openzipkin/brave#212

codefromthecrypt · 2016-08-17T01:51:27Z

Let's see if we can nail down a design for the Zipkin v1 model here: #1243

codefromthecrypt · 2016-08-17T05:42:50Z

Related issue: multiple parents, aka linked traces: #1244

jorgheymans · 2020-05-20T21:07:01Z

We can safely consider async span representation done (and eventually dusted) since Zipkin v2.

codefromthecrypt mentioned this issue Jul 21, 2016: does brave support async call? openzipkin/brave#164 (Closed)

codefromthecrypt mentioned this issue Jul 27, 2016: Add insight about span recording and propagation openzipkin/openzipkin.github.io#40 (Open)

codefromthecrypt mentioned this issue Aug 17, 2016: Support async spans #1243 (Closed)

codefromthecrypt added the enhancement and model (Modeling of traces) labels Oct 23, 2018

jorgheymans closed this as completed May 20, 2020
