Unifying Jaeger clients design #1269

yurishkuro · 2018-12-28T23:36:06Z

There are many discrepancies in the Jaeger clients today about how their internal components are implemented. It would be good to come up with an abstract reference design and gradually ensure all clients follow the pattern. Specifically, the most divergence is in the implementation of span reporting. I think the following is the type system that we want to use going forward:

Reporter: an abstract interface for doing something with finished spans. Examples may include:
- logging span ID,
- no-op
- keeping spans in memory in a ring buffer (for inspection in z-pages equivalent)
- sending spans out of process (to agent or collector)
RemoteReporter implements Reporter: responsible for taking the finished spans off the hot path of the request, e.g. by putting them in a queue consumed by a background thread, or some other form of async processing.
- Requires a Sender to actually send the spans out of process.
- Guarantees that the Sender is invoked on a single thread
- Calls sender.append() on new span in the queue
- Calls sender.flush() when reporter.close() is called in order to fully drain the queue
- Emits metrics on the number of spans sent out (or failed)
Sender: an abstract interface with two methods, append and flush. Implementations may accept Encoder and Transport as dependencies.
- append() may buffer new spans for sending out later in a batch, or may flush them immediately (e.g. if its internal buffer is full or reached the max packet size). Returns the number of spans flushed, if any.
- flush() immediately sends all spans from the internal buffer, and returns the number of spans successfully send.
Transport is used by the Sender to send a batch of spans to the target destination, e.g. the agent or the collector. API of the Transport may be language-specific. It accepts the batch already encoded to a series of bytes, to allow for decoupling of the encoding format (Thrift, JSON, proto) from the communication channel (HTTP, UDP, Kafka).
Encoder converts a batch of spans into a sequence of bytes.
- TBD: UDP transport requires that the final size of the byte array representing the batch of spans does not exceed max packet size supported by the agent (default: 65000 bytes). If Sender is the one responsible for accumulating spans in a batch, then there needs to be an interface between Sender and Encoder to incrementally calculate the size of each encoded span or the total size of the batch. More fine-grained access to the encoder allows for more efficient implementation.

Sample Sender interface:

// Sender abstracts the method of sending spans out of process.
// Implementations are NOT required to be thread-safe; the RemoteReporter
// is expected to only call methods on the Sender from the same go-routine.
type Sender interface {
	// Append converts the span to the wire representation and adds it
	// to sender's internal buffer.  If the buffer exceeds its designated
	// size, the transport should call Flush() and return the number of spans
	// flushed, otherwise return 0. If error is returned, the returned number
	// of spans is treated as failed span, and reported to metrics accordingly.
	Append(span *Span) (int, error)

	// Flush submits the internal buffer to the remote server. It returns the
	// number of spans flushed. If error is returned, the returned number of
	// spans is treated as failed spans, and reported to metrics accordingly.
	Flush() (int, error)
}

The text was updated successfully, but these errors were encountered:

rmfitzpatrick · 2019-01-22T17:56:18Z

What about error conditions in Transport and Sender? Should a multiple return value be standardized for marshaling up errors like in go's Flush() and Append()? This wouldn't be idiomatic in python or java, where exceptions should be clearly thrown, but the metrics factories require a fair amount of awareness in the RemoteReporter that can't be easily obtained otherwise (e.g. keeping a counter of appended spans).

yurishkuro · 2019-02-18T20:09:23Z

I think returning multiple values is fine in Python, I've seen it done. In Java, we could throw a custom exception, or use an observer, or ... open to suggestions

Falco20019 · 2020-04-16T10:42:44Z

What's the status here? I stumbled over it again the last couple days when refactoring some stuff. In C#, we already have senders, but the Transport and Encoder levels are included in it. That lead to some duplication between gRPC's GrpcSender and Thrift's UdpSender (especially the max size and append logic). So I would be looking into it soon.

Encoder could be a bit problematic, as Thrift and gRPC both use specific implementations that include the encoding. So maybe it's easier to do this by another approach, similar to what Thrift does (where Transport is used by Protocol).

I would like an approach, where the data would be encapsulated in a way that allows us to write a more generic Sender.

The approach that I will look into next is having an Encoder that can generate Process, Batch and Span defined and using a common interface with getSize(). This way, I'm not as generic as having an Encoder that only knows about byte[], but am still able to create Transport instances for Thrift and gRPC that can be used by the same Sender.

Falco20019 · 2020-04-16T19:19:03Z

@yurishkuro I played around with this idea and got something working over at https://github.com/jaegertracing/jaeger-client-csharp/tree/Falco20019/encoder-transport

I ended up with Senders, Encoders and Transports. Senders are encapsulating the logic what happens at append and flush. The Encoder knows the Transport and only has a method to get an encode span. The Transport has a method to get the size of an object. The GRPC and Thrift Encoders are based on the SizedBatch encoder and implement an encoder that knows GrpcTransport or ThriftTransport (base class / interface for all transports working with this encoder). The encoder also is responsible for encoding the data from Span to IEncodedData which contains the size.

The Transport itself is responsible for the connection, for finding out the real data size (delegated from the encoder and enhanced if needed) and the real writing to the connection. I sadly found no more generic way that still allows us to use Thrift and GRPC with generated code.

What do you think about this approach? I like it to the extend that it abstracts the logic for the size stuff and append-flush-cycles away. But I don't like that I was not able to find a more generic and fitting way.

yurishkuro · 2020-04-17T19:57:33Z

I think the concept of Encoder works ok when dealing with encoding-agnostic transports like UDP or HTTP, but I am not sure how easy it is to use byte[] payloads (or any other fixed type) with gRPC, which usually uses strongly-typed auto-generated clients.

yurishkuro · 2022-01-13T15:40:13Z

Per #3362, we're sunsetting Jaeger clients.

yurishkuro added meta-issue An tracking issue that requires work in other repos client-libs labels Dec 28, 2018

yurishkuro mentioned this issue Dec 28, 2018

Initial Sender implementation jaegertracing/jaeger-client-python#208

Closed

Falco20019 mentioned this issue Apr 16, 2020

Shift from ISender to ITransports and IEncoders jaegertracing/jaeger-client-csharp#77

Closed

yurishkuro closed this as completed Jan 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unifying Jaeger clients design #1269

Unifying Jaeger clients design #1269

yurishkuro commented Dec 28, 2018

rmfitzpatrick commented Jan 22, 2019 •

edited

Loading

yurishkuro commented Feb 18, 2019

Falco20019 commented Apr 16, 2020 •

edited

Loading

Falco20019 commented Apr 16, 2020

yurishkuro commented Apr 17, 2020

yurishkuro commented Jan 13, 2022

Unifying Jaeger clients design #1269

Unifying Jaeger clients design #1269

Comments

yurishkuro commented Dec 28, 2018

rmfitzpatrick commented Jan 22, 2019 • edited Loading

yurishkuro commented Feb 18, 2019

Falco20019 commented Apr 16, 2020 • edited Loading

Falco20019 commented Apr 16, 2020

yurishkuro commented Apr 17, 2020

yurishkuro commented Jan 13, 2022

rmfitzpatrick commented Jan 22, 2019 •

edited

Loading

Falco20019 commented Apr 16, 2020 •

edited

Loading