Skip to content
This repository has been archived by the owner on Dec 4, 2023. It is now read-only.

Add support for distributed tracing #1212

Closed
calohmn opened this issue May 8, 2018 · 12 comments
Closed

Add support for distributed tracing #1212

calohmn opened this issue May 8, 2018 · 12 comments

Comments

@calohmn
Copy link
Contributor

calohmn commented May 8, 2018

In order to get an overview of the lifetime of a message going through EnMasse components, support for distributed Tracing should be added.
This applies to the router and broker components.

The OpenTracing standard [1] defines an API for distributed Tracing. Using this standard could facilitate using different (OpenTracing-compatible) Tracing systems, configurable via the address space.

Sending traces from router/broker to the Tracing System could be done via an intermediary component. This component could provide more control over where to direct the tracing data - i.e. direct trace data related to one specific address space to a specific Tracing system, even if router/broker are shared between address spaces.

However, to begin with (and until sharing of router/broker is implemented), this intermediary component could be omitted and traces sent directly to the tracing system from the router and broker.
As the OpenTracing implementation for doing this, Jaeger [2] looks like a good solution.
Also as there are clients for both Java (for the Artemis broker) and C++ (for the Qpid dispatch router).

To be done for this:

  1. Agree on tracing info stored in AMQP messages
    This is to propagate the SpanContext between components. Needed would be trace and span-id (see Jaeger propagation format) and possibly the EnMasse address as baggage item. I guess this would go into AMQP Message Annotations.
  2. Define format for configuring tracing provider on address space
    (See this config.yml for possible options)
  3. Instrument router and broker to use OpenTracing API
  4. Create intermediary that forwards trace data to configured trace provider (possibly done later)

[1] http://opentracing.io/
[2] https://jaegertracing.io/

@sophokles73
Copy link

sophokles73 commented Jun 6, 2018

This is to propagate the SpanContext between components. Needed would be trace and span-id (see Jaeger propagation format) and possibly the EnMasse address as baggage item. I guess this would go into AMQP Message Annotations.

I would actually like to propose to use the message's Delivery Annotations which are intended to be used to convey information from sending peer to receiving peer and are not required to be forwarded across all steps along the message flow. They seem to be more appropriate for the task at hand FMPOV.

BTW in Hono [1] we are currently working on OpenTracing instrumentation. We are currently also planning to use Jaeger as the tracing implementation.

[1] https://www.eclipse.org/hono

@calohmn
Copy link
Contributor Author

calohmn commented Jun 7, 2018

@sophokles73 FMPOV, these annotations should actually be propagated across every delivery step. This would be to support a scenario, where one intermediate step in the chain has no explicit support for these annotations, but later ones have. Think of Hono being connected to a (non-tracing aware) router with a connection to a tracing-aware receiver. In such a scenario, tracing annotations shouldn't get lost at the router step.

In that sense, I think the message annotations would fit better here:

The message-annotations section is used for properties of the message which are aimed at the infrastructure and SHOULD be propagated across every delivery step. Message annotations convey information about the message. Intermediaries MUST propagate the annotations unless the annotations are explicitly augmented or modified (e.g., by the use of the modified outcome).

@calohmn
Copy link
Contributor Author

calohmn commented Jun 12, 2018

@rgodfrey do you have an opinion on this (which kind of AMQP annotations to use, delivery or message)?

@rgodfrey
Copy link
Contributor

@calohmn @sophokles73 So my initial thought was that these would be these are delivery annotations because we might want to give the infrastructure (e.g. EnMasse) the choice of not forwarding the annotation to the consuming client.

On the other hand there are scenarios such as Carsten describes where we would want an intermediary to blindly forward the annotation even while it itself does not understand it. From a modelling point of view, if we think of the requirements of "tracing" a message then that does point to it being associated with the message rather than an individual delivery.

So, I'm not particularly religious about it, but my inclination now is that message annotations might be a better location, and that we may decide to "bend" the spec a little by allowing intermediaries to omit or modify the annotation on forwarding the message in some circumstances.

@sophokles73
Copy link

On the other hand there are scenarios such as Carsten describes where we would want an intermediary to blindly forward the annotation even while it itself does not understand it. From a modelling point of view, if we think of the requirements of "tracing" a message then that does point to it being associated with the message rather than an individual delivery.

The opposite is also true: I would not want an intermediary to leak the internal information passed from peer to peer in order to implement tracing to external application level message consumers. If a container does not know about tracing at all, then the tracing ends at this component. BTW this is the same as for HTTP based services that itself invokes other services. You wouldn't assume that the tracing context that has been passed into the service in the HTTP headers would be copied to the outbound requests' headers.

I also do not think that we are tracing the message itself. Instead, we are actually trying to trace the processing of the message, which includes the forwarding of the raw message to the consumer but also includes invocations of the device registry in order to assert the device's registration status and retrieving credentials in order to authenticate the device during connection establishment.

All in all, I still believe that an AMQP container should either understand what tracing is (and then extract the context from the delivery annotations) or otherwise ignore the information and prevent it from being forwarded downstream (which seems to be the intended behavior for message annotations). So, I am still in favor of using delivery annotations. This shouldn't be an issue for the Dispatch Router because you intend to implement explicit support for OpenTracing anyway ...

@calohmn
Copy link
Contributor Author

calohmn commented Jul 24, 2018

@sophokles73 I see your points here. And as the central component here, the dispatch router, should get Tracing support anyway, the scenario of having non-tracing aware intermediate components in the chain would be rather rare, I guess. So, delivery annotations would be fine with me.

@rgodfrey @grs Speaking of the dispatch router: In the opentracing repo, there is an C++ example, loading the tracer library dynamically:
https://github.com/opentracing/opentracing-cpp/blob/master/example/dynamic_load/dynamic_load-example.cpp
See also opentracing/opentracing-cpp#45. That approach is supported by Jaeger now as well.
Do you think that could be a starting point for the Qpid dispatch router implementation?

@grs
Copy link
Member

grs commented Jul 24, 2018

@calohmn yes, I think so, at least something like that. Keeping any dependencies optional I think will help.

@calohmn
Copy link
Contributor Author

calohmn commented Jul 25, 2018

@grs ok, I've created DISPATCH-1088 for that.

@calohmn
Copy link
Contributor Author

calohmn commented Aug 17, 2018

Issue for support in Artemis: ARTEMIS-2028

@calohmn
Copy link
Contributor Author

calohmn commented Oct 9, 2018

Concerning support in Artemis: discussion on the Artemis mailing list a while back has had the result, that a solution as a broker plugin is seen as the preferred way to implement this (possibly contributed to "OpenTracing-contrib").
Missing for that still is a way to set outgoing delivery annotations via a broker plugin - PR for that is #2256. (By default, Artemis removes all delivery annotations, hence also tracing data, when sending to consumers.)

@calohmn
Copy link
Contributor Author

calohmn commented Jan 29, 2019

The above mentioned Artemis PR #2256 has been merged.

@lulf
Copy link
Member

lulf commented Jan 29, 2019

@calohmn I see this is in the new 2.6.4. We should update to that once 0.26.0 is out (tomorrow).

@lulf lulf closed this as completed May 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants