
Zipkin collector performance issues #940

Closed · prat0318 opened this issue Feb 4, 2016 · 14 comments

prat0318 commented Feb 4, 2016

When consuming from a Kafka topic and writing to Cassandra, the Zipkin collector takes around 90 ms per trace, limiting its throughput to about 11 traces/s. With more and more services emitting traces, this becomes a bottleneck. It can be alleviated by increasing the number of Kafka partitions and raising KAFKA_STREAMS to match the partition count.
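For context, the arithmetic behind those numbers and how the partition/stream scaling plays out (a back-of-the-envelope sketch only, assuming the ~90 ms per trace holds and one consumer thread per partition):

```java
public class ThroughputEstimate {
  public static void main(String[] args) {
    double msPerTrace = 90.0;               // observed collector latency per trace
    double perThread = 1000.0 / msPerTrace; // ~11 traces/s for a single consumer thread
    int partitions = 8;                     // hypothetical partition count
    // With KAFKA_STREAMS raised to match the partition count, throughput scales
    // roughly linearly until Cassandra or the collector host becomes the bottleneck.
    System.out.printf("per-thread: %.1f traces/s, aggregate over %d partitions: %.1f traces/s%n",
        perThread, partitions, perThread * partitions);
  }
}
```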

But I still wanted to check where those 90 ms go. I initially suspected the Cassandra writes were the culprit, but surprisingly they are not.

When run as a single thread, the logs show:

**23:58:30.486** [pool-2-thread-1] DEBUG kafka.consumer.PartitionTopicInfo - **reset consume offset of zipkin.traces.iad1:0: fetched offset = 23203837: consumed offset = 23196445 to 23196445**
23:58:30.487 [pool-2-thread-1] DEBUG o.t.z.storage.cassandra.Repository - INSERT INTO traces(trace_id,ts,span_name,span) VALUES (-5168175092676477000,1454452032340000,8586993075614465152_-1083141446_-401043294,0x0a0af800) USING TTL 604800;
23:58:30.487 [pool-2-thread-1] DEBUG o.t.z.storage.cassandra.Repository - INSERT INTO service_name_index(service_name,bucket,ts,trace_id) VALUES (gloteast_1,:bucket,Tue Feb 02 22:27:12 UTC 2016,Wed Oct 16 13:52:03 UTC 163767618) USING TTL 259200;
23:58:30.488 [pool-2-thread-1] DEBUG o.t.z.storage.cassandra.Repository - INSERT INTO service_span_name_index(service_span_name,ts,trace_id) VALUES (lucy.polygloteast_1.post,1454452032340000,-5168175092676477000) USING TTL 259200;
23:58:30.488 [pool-2-thread-1] DEBUG o.t.z.storage.cassandra.Repository - INSERT INTO annotations_index(annotation,bucket,ts,trace_id) VALUES (gloteast_1:cs,:bucket,Tue Feb 02 22:27:12 UTC 2016,-5168175092676477000) USING TTL 259200;
23:58:30.488 [pool-2-thread-1] DEBUG o.t.z.storage.cassandra.Repository - INSERT INTO annotations_index(annotation,bucket,ts,trace_id) VALUES (gloteast_1:cr,:bucket,Tue Feb 02 22:27:12 UTC 2016,-5168175092676477000) USING TTL 259200;
23:58:30.489 [pool-2-thread-1] DEBUG o.t.z.storage.cassandra.Repository - INSERT INTO annotations_index(annotation,bucket,ts,trace_id) VALUES (ygloteast_1:http.uri.client:/ersey/composite_query,:bucket,Tue Feb 02 22:27:12 UTC 2016,-5168175092676477000) USING TTL 259200;
23:58:30.490 [pool-2-thread-1] DEBUG o.t.z.storage.cassandra.Repository - INSERT INTO annotations_index(annotation,bucket,ts,trace_id) VALUES (gloteast_1:http.uri.client,:bucket,Tue Feb 02 22:27:12 UTC 2016,-5168175092676477000) USING TTL 259200;
23:58:30.490 [pool-2-thread-1] DEBUG o.t.z.storage.cassandra.Repository - INSERT INTO span_duration_index(service_name,span_name,bucket,duration,ts,trace_id) VALUES (gloteast_1,post,404014,11000,Tue Feb 02 22:27:12 UTC 2016,-5168175092676477000) USING TTL 259200;
**23:58:30.491** [pool-2-thread-1] DEBUG o.t.z.storage.cassandra.Repository - INSERT INTO span_duration_index(service_name,span_name,bucket,duration,ts,trace_id) VALUES (gloteast_1,,404014,11000,Tue Feb 02 22:27:12 UTC 2016,-5168175092676477000) USING TTL 259200;
**23:58:30.579** [pool-2-thread-1] DEBUG kafka.consumer.PartitionTopicInfo - **reset consume offset of zipkin.traces.iad1:0: fetched offset = 23203837: consumed offset = 23196446 to 23196446**

The interesting thing is the time gap between the last two lines. I am not sure what the collector is waiting on during those 88 ms. The gap still remains with 5 workers, though it doesn't get multiplied by 5.

I am not sure if this is a known issue, but I wanted to start a discussion on how we can debug the reason for that gap.

codefromthecrypt (Member) commented

I think one problem is that the kafka receiver is literally writing span-at-a-time, while the span stores accept spans in bulk. I'm presuming the span stores are optimized for bulk, but we haven't benchmarked this.
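To make the span-at-a-time vs. bulk distinction concrete, here is a rough sketch (hypothetical types only: `SpanConsumer` and the drain loop stand in for the real Kafka receiver and span store):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Hypothetical stand-in for a span store that accepts spans in bulk.
interface SpanConsumer {
  void accept(List<byte[]> serializedSpans);
}

class KafkaReceiverSketch {
  // Today: one storage call per Kafka message, i.e. many small writes.
  static void spanAtATime(BlockingQueue<byte[]> messages, SpanConsumer storage)
      throws InterruptedException {
    while (true) {
      storage.accept(Collections.singletonList(messages.take()));
    }
  }

  // Alternative: block for one span, then drain whatever else is already buffered
  // and hand the whole batch to the store in a single call.
  static void bulk(BlockingQueue<byte[]> messages, SpanConsumer storage)
      throws InterruptedException {
    List<byte[]> batch = new ArrayList<>();
    while (true) {
      batch.add(messages.take());
      messages.drainTo(batch, 999);
      storage.accept(new ArrayList<>(batch));
      batch.clear();
    }
  }
}
```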

Note there's another related discussion going on w/ @yurishkuro and @danchia. For example, Yuri had a suggestion about intermediating here. #961 (comment)

prat0318 (Author) commented

I am down for benchmarking any tweaks we have in mind, like batching CQL queries or optimizing the span stores for bulk writes.

I think even adding the ability for the collector to receive a trace-at-a-time could improve performance by decreasing the total number of Kafka messages. What do you think about that?

eirslett (Contributor) commented

It's difficult to implement trace-at-a-time, since different parts/spans of the same trace can potentially be processed by different collector instances. There's no "stickiness".

danchia commented Feb 12, 2016

Even within the same collector, because a trace is composed of many different spans from different machines, I don't think we can avoid multiple messages.

That said, if I'm not wrong the per-message overhead for Kafka is pretty low, and there are tunables to help with higher message volumes (batching on the producer side, and fetch size on the consumer side).
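For reference, these are the kinds of knobs meant above (illustrative values only; the property names are from the 0.8/0.9-era Kafka clients, so double-check them against whatever client version your collector ships with):

```java
import java.util.Properties;

class KafkaTuningSketch {
  // Producer side: trade a little latency for fewer, larger requests.
  static Properties producerTuning() {
    Properties p = new Properties();
    p.put("batch.size", "65536"); // bytes per batch (new producer)
    p.put("linger.ms", "50");     // wait up to 50 ms for a batch to fill
    return p;
  }

  // Consumer side (old high-level consumer): pull bigger chunks, commit offsets less often.
  static Properties consumerTuning() {
    Properties p = new Properties();
    p.put("fetch.message.max.bytes", "1048576");
    p.put("fetch.min.bytes", "65536");
    p.put("fetch.wait.max.ms", "500");
    p.put("auto.commit.interval.ms", "10000"); // how often offsets are written to ZooKeeper
    return p;
  }
}
```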

prat0318 (Author) commented

@eirslett I meant the collector should be able to handle a bundle of different spans of the same trace arriving together.
Currently, my tracer sends spans 1-3 and 4-6 one at a time:


                                     +------+
                                     |      |
                      TraceId:1      |      |
                      Spans: 1-3     |      |               +------------+
      +------------------------------>      +--------------->            |
      |            |----------------->      |---------------> Collector  |
      |  Tracer 1  +----------------->      +--------------->            |
      |            |                 |      |               +-^^^--------+
      +------------+                 |Kafka |                 |-|
                                     |      |                 |-|
                                     |      |                 |-|
                       TraceId:1     |      |                 |-|
      +------------+   Spans: 4-6    |      |                 |-|
      |            +----------------->      +-------------------|
      |  Tracer 2  |----------------->      |-------------------|
      |            +----------------->      +-------------------+
      +------------+                 +------+

Instead, provide a facility to bundle up spans 1-3 and 4-6 together, and let the collector separate them out:

                               +------+
                               |      |
                TraceId:1      |      |
                Spans: 1-3     |      |               +------------+
+------------+----------------->      +--------------->            |
|            |                 |      |               | Collector  |
|  Tracer 1  |                 |      |               |            |
|            |                 |      |               +-^----------+
+------------+                 |Kafka |                 |
                               |      |                 |
                               |      |                 |
                 TraceId:1     |      |                 |
+------------+   Spans: 4-6    |      |                 |
|            +-----------------+      +------------------
|  Tracer 2  |                 |      |
|            |                 |      |
+------------+                 +------+

I say this because we see each span, whether it carries a lot of data or a little, take around the same ~90 ms. If @adriancole is correct and the limit is the Kafka receiver, this should be a good perf win for us, as our tracer could bundle the spans up and send them together.
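To be clear about what "bundle" means here, a hypothetical wire format (not Zipkin's actual encoding): a count-prefixed list of serialized spans packed into one Kafka message, with the matching split on the collector side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class SpanBundleCodec {
  // Tracer side: pack several serialized spans into one message payload.
  static byte[] encode(List<byte[]> spans) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buf);
    out.writeInt(spans.size());
    for (byte[] span : spans) {
      out.writeInt(span.length);
      out.write(span);
    }
    return buf.toByteArray();
  }

  // Collector side: split the bundle back into individual spans before handing them to storage.
  static List<byte[]> decode(byte[] message) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(message));
    int count = in.readInt();
    List<byte[]> spans = new ArrayList<>(count);
    for (int i = 0; i < count; i++) {
      byte[] span = new byte[in.readInt()];
      in.readFully(span);
      spans.add(span);
    }
    return spans;
  }
}
```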

danchia commented Feb 12, 2016

@prat0318 Do you have your kafka consumer settings handy? A long time ago the defaults in zipkin were not very good, causing offset updates to ZooKeeper very often (which was super expensive).

prat0318 (Author) commented

@danchia I am using all of Zipkin's defaults. I think the offset update happens every 10 s by default (if I am not wrong). Please let me know if there is a way to see the consumer config from outside.

eirslett (Contributor) commented

Aah! I see now what you mean.
A simple way to solve it is to queue up spans and send them to Kafka in bulk once every 5-10 seconds.
In a more complex implementation it should be possible for spans to bypass that queue if they are tagged with the X-Trace-Debug HTTP flag (I think that's what it's called), so that if you're a developer actively triggering traces for debugging with Zipkin, they appear in the UI almost instantly.
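Roughly what that could look like on the tracer side (a sketch only; `send` is a placeholder for however the real reporter publishes to Kafka, and the debug flag is whatever the tracer exposes):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class BufferedReporterSketch {
  private final LinkedBlockingQueue<byte[]> pending = new LinkedBlockingQueue<>(10_000);
  private final ScheduledExecutorService flusher = Executors.newSingleThreadScheduledExecutor();

  BufferedReporterSketch() {
    // Flush whatever has queued up as one bulk send every 5 seconds.
    flusher.scheduleAtFixedRate(this::flush, 5, 5, TimeUnit.SECONDS);
  }

  void report(byte[] serializedSpan, boolean debug) {
    if (debug) {
      send(Collections.singletonList(serializedSpan)); // debug spans bypass the queue
      return;
    }
    pending.offer(serializedSpan); // drop rather than block the application if the queue is full
  }

  private void flush() {
    List<byte[]> batch = new ArrayList<>();
    pending.drainTo(batch);
    if (!batch.isEmpty()) send(batch);
  }

  private void send(List<byte[]> spans) {
    // placeholder: encode the list and publish a single message to the Kafka topic
  }
}
```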

prat0318 (Author) commented

@eirslett but the bottleneck is on the collector end when it is reading the spans. It is currently constrained to receive one span message at a time. So I was thinking of making that span-message blob more like a list-of-span-messages blob.

codefromthecrypt (Member) commented Feb 13, 2016 via email

prat0318 (Author) commented

> SpanStore doesn't require the bundle of spans it receives to be in the same trace

That means it should not be hard to make changes on the collector end if we want to receive a bundle of spans, right?

> A few tracers do send bundles at a time, subject to either span count or bundle size.

Confused how they can do that if they use the same collector? Does the collector support this yet?

> Not sure if we try to read a list and only one is present..

I think that is an important point for keeping it backwards compatible. The collector should be able to accept both a list of spans and a single span correctly.

Overall, I am very excited to try out the idea of span bundles.
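One hypothetical way the collector could stay backwards compatible (this assumes a bundle payload is prefixed with a marker byte and a span count; the real answer would depend on the thrift encoding already on the wire):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CompatDecoderSketch {
  static final byte BUNDLE_MARKER = (byte) 0xB1; // hypothetical flag for a multi-span payload

  // Accept either a legacy single-span message or a marker-prefixed bundle of spans.
  static List<byte[]> spansOf(byte[] message) throws IOException {
    if (message.length == 0 || message[0] != BUNDLE_MARKER) {
      return Collections.singletonList(message); // legacy path: the whole payload is one span
    }
    DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(message, 1, message.length - 1));
    int count = in.readInt();
    List<byte[]> spans = new ArrayList<>(count);
    for (int i = 0; i < count; i++) {
      byte[] span = new byte[in.readInt()];
      in.readFully(span);
      spans.add(span);
    }
    return spans;
  }
}
```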

codefromthecrypt (Member) commented Feb 15, 2016 via email

codefromthecrypt (Member) commented

we should close this issue with a pull request updating zipkin/zipkin-collector/kafka/README.md with notes on how to achieve best performance. there are some notes here: https://docs.google.com/document/d/1Px44fjZ37gr05lV7UFo8AfrWZCcJHCuv58290XCbDaw/edit#bookmark=id.eeozlmh0fxr

jorgheymans (Contributor) commented

> we should close this issue with a pull request updating zipkin/zipkin-collector/kafka/README.md with notes on how to achieve best performance. there are some notes here: https://docs.google.com/document/d/1Px44fjZ37gr05lV7UFo8AfrWZCcJHCuv58290XCbDaw/edit#bookmark=id.eeozlmh0fxr

#3152 addressed this. If there are any other expert-level tweaks sites have had success with, we'd be glad to hear them!
