Multiple parents aka Linked traces #1244
Comments
Do you mean that if a traced operation needs a queue, it should be considered two traces, linked to each other by trace ID? Another option is to treat a queue like an RPC and generate a client span and a server span. Which do you prefer? |
here's an example model of links https://github.com/googleapis/googleapis/blob/master/google/tracing/trace.proto#L163 |
Here is a convenience paste of notes on linked traces from a recent workshop. Notably, they aren't used primarily for async (discussion about async is here).
Linked traces from Tracing Workshop Notes
The most common use case is how to represent batching (like flushing to disk). Could this be multiple parents? In this case the batch is an independent operation that precedes what it is batching. This consolidating relationship exists, but is not strictly parent/child. The original purpose of links is similar in concept to the reference concept in OpenTracing. Look for this in BigTable client (soon :))
Later use cases of links included how to handle multi-tenancy. For example, how do you handle a request that crosses multiple organizations (e.g. a tenant and the cloud itself)? If you render a normal parent/child, a malicious client could send a constant trace ID and mess up all the traces in Google’s trace repository. Another abuse case is injecting traffic and observing the exit points, allowing the attacker to understand some internals of the architecture. Linking can obscure the “internal trace” while still allowing causality. In implementation, a trace is restarted on entering one network, but linked back to the requestor’s ID. The way this usually works is that you can see that there’s a link to another trace, but not necessarily see the data itself.
Another use case is long queries. How do you deal with pipelines that could take days to complete? Using links. If you encode a timestamp into a trace ID, you cannot determine at runtime if there’s a long trace in progress.
Sergey mentioned that some customers are quite large and may be tempted to use links any time they exit their network. Bogdan replied that overuse is possible, but he was not sure of a solution. Documentation needs to be good, with advice about when to use links and why, perhaps advising links as an edge case.
Sergey asked about sampling and links, particularly sampling based on trace IDs. Bogdan said that by propagating the sampling decision you don’t need to look at the value of the trace ID afterwards. By default, accept the decision from the caller, with some thought about abuse in the future (maybe throttling).
S: How does batching apply to sampling?
B: If one trace in the batch is sampled, the batch is sampled.
Some linking policies have to do with trust level, which is likely higher for paying users than anonymous ones.
(Trace, span, link, attributes) is an example format.
The reason for a trace ID is to identify all spans. Now, if you have 10 requests coming in with 10 different trace IDs, you can’t make them one.
One of the problems with tracing (for service owners) is how to know when a trace is complete (and ready for analysis). Use of links can help break up longer work into units that can be analyzed, links being the part that can combine “finished” work.
One guiding policy is to not put into the client what can be done on the server. At Google, they defer to the latter to reduce critical path overhead.
One approach to visualizing a batch is to chop the heads off the traces linked to the batch, so they appear as children of the batch (ignoring the spans in each trace preceding the batch). |
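To make the batching rule from the notes concrete, here is a minimal sketch. It assumes a batch is recorded as its own span that links to every trace it consolidates, and that the batch is sampled when any of those traces is sampled. `TraceContext`, `BatchSpan`, and `start_batch` are hypothetical names for illustration, not part of Zipkin or any tracer API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class TraceContext:
    """Hypothetical propagated context: a trace ID plus the caller's sampling decision."""
    trace_id: str
    sampled: bool


@dataclass
class BatchSpan:
    """Hypothetical span for a batch operation, carrying links instead of a single parent."""
    trace_id: str
    sampled: bool
    links: List[str] = field(default_factory=list)


def start_batch(batch_trace_id: str, contexts: List[TraceContext]) -> BatchSpan:
    """Create a batch span that links to every trace it consolidates."""
    # Rule from the notes: if one trace in the batch is sampled, the batch is sampled.
    sampled = any(ctx.sampled for ctx in contexts)
    # The batch is not a child of any single request, so each request is a link.
    links = [ctx.trace_id for ctx in contexts]
    return BatchSpan(trace_id=batch_trace_id, sampled=sampled, links=links)
```

Because the decision is derived from the propagated `sampled` flags, nothing here inspects the value of a trace ID, which matches the point about accepting the caller's decision.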
GDPR is a thing now and the right to be forgotten requires distributed long transactions across components. At Typeform we have the following use case.
In terms of observing traces there are two options we explored:
The convenience of having linked traces is not only that we can group traces (which we could do with a tag), but also that it gives us more meaningful traces: grouping all the effects of a single operation, when it comes to async, shifts the focus away from the child effects by themselves. So, could we consider this for the roadmap? |
@adriancole there are a lot of threads about these async span concepts. Is there one you’re tracking in particular, or somewhere we can read the latest on the plans in this space? |
This one focuses on limitations of normal parent/child relationship (as complicated by multiple origins or coalescing) less so on async programming. |
Apologies, my question was very poorly structured :-) My interest is mostly on attribution of work concerns involving fork/join and other such constructs like very delayed messages (do we just use 1 trace for that or relate another) that aren't well supported by the common tree based modeling of traces today. I think I am in the right issue, but used the wrong words. |
Hi,
So the traceId of the generated message will be the traceId of the last message aggregated. But it should carry the parentIds of all its parents (the IDs of the messages that were aggregated to produce this new message). This way we can trace where it comes from. Does that make sense? |
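A rough sketch of the aggregation idea described in this comment, under stated assumptions: each message carries a `trace_id` and a `span_id` that identifies it, and the aggregated message stores a list of parent IDs rather than a single one. The `Message` type and `aggregate` function are hypothetical and do not correspond to an existing Zipkin model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    """Hypothetical message envelope with tracing fields."""
    trace_id: str
    span_id: str
    parent_ids: List[str] = field(default_factory=list)


def aggregate(messages: List[Message], new_span_id: str) -> Message:
    """Build the message that results from aggregating several inputs."""
    if not messages:
        raise ValueError("cannot aggregate an empty list of messages")
    return Message(
        # As proposed above: reuse the trace ID of the last message aggregated.
        trace_id=messages[-1].trace_id,
        span_id=new_span_id,
        # Keep the ID of every input so the origin of the result stays traceable.
        parent_ids=[m.span_id for m in messages],
    )
```

Note that if the inputs come from different traces, a bare span ID may not be enough to find a parent, so a real format would probably need to store the trace ID alongside each parent ID.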
This is a story about traces that relate to each other. There are a number of scenarios, mostly messaging or scheduling in nature, where the next task is loosely related to its predecessor.
Here are some examples where things are related
During the tracing workshop, we discussed the idea of "linked traces". In this, we could add one or more links to other trace IDs which either caused the current trace, or were the cause of future traces.
You can imagine a scenario whereby creating a new root span allows you to add a trace id of its cause.
Ex 1. trace1 knows it is sending off a message for offline processing.
It is not on the critical path of the caller. The tracer generates a new root span for trace2 and adds a link to trace1. trace2 is propagated in message metadata, and the link of trace2 to trace1 is sent out-of-band to Zipkin. In this case trace2 hasn't started yet.
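Ex 1 could be sketched roughly as follows. Everything here is an assumption for illustration: `send_offline` stands in for the tracer hook on the producer side, `reported_links` stands in for out-of-band reporting to Zipkin, and the `trace_id` metadata key is made up.

```python
import secrets
from typing import Dict, List, Tuple

# Stand-in for out-of-band reporting: (new trace ID, trace ID that caused it).
reported_links: List[Tuple[str, str]] = []


def new_trace_id() -> str:
    """Generate a random 64-bit trace ID as 16 hex characters."""
    return secrets.token_hex(8)


def send_offline(current_trace_id: str, payload: str) -> Dict[str, str]:
    """Prepare a message for offline processing under a fresh, linked trace."""
    # A new root trace is allocated now, although its work has not started yet.
    next_trace_id = new_trace_id()
    # The link from the new trace back to its cause is reported separately.
    reported_links.append((next_trace_id, current_trace_id))
    # Only the new trace ID travels with the message.
    return {"payload": payload, "trace_id": next_trace_id}
```

The caller's trace is never propagated in the message itself, which keeps the offline work off the caller's critical path while the reported link preserves causality.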
Ex 2. trace3 is an operation that needs to read a message from a queue.
The read of a message from the queue closes trace2, but the tracer's context notices trace3 is the current trace. It adds a link of trace2 to itself (trace3) and proceeds.
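For Ex 2, a minimal consumer-side sketch, assuming the message metadata carries the producer's trace ID under a hypothetical `trace_id` key and that the tracer keeps a list of links on the current trace. `TraceState` and `on_message_read` are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TraceState:
    """Hypothetical view of the trace that is current in the tracer's context."""
    trace_id: str
    links: List[str] = field(default_factory=list)


def on_message_read(current: TraceState, metadata: Dict[str, str]) -> TraceState:
    """Link the trace found in message metadata to the current trace."""
    incoming = metadata.get("trace_id")
    # Skip messages without tracing metadata, self-links, and duplicate links.
    if incoming and incoming != current.trace_id and incoming not in current.links:
        current.links.append(incoming)
    return current
```

The current trace (trace3) is left as is and simply records trace2 as a link, rather than being re-parented under it.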
To support this, Span needs a links array, or we need to create a special binary annotation named "link" whose value is the linked trace ID. The storage layer needs to be able to search for traces by linked trace ID. Minimally, the UI needs to present these linked traces in the span detail view, and render hrefs accordingly.
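As a sketch of what the links array and the storage lookup might look like, assuming an in-memory index keyed by linked trace ID. The `links` field, `InMemoryStorage`, and `trace_ids_linked_to` are hypothetical and are not the actual Zipkin span model or storage API.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, List, Set


@dataclass
class Span:
    """Simplified span with a hypothetical links array of trace IDs."""
    trace_id: str
    id: str
    name: str
    links: List[str] = field(default_factory=list)


class InMemoryStorage:
    """Toy storage that indexes traces by the trace IDs they link to."""

    def __init__(self) -> None:
        self._by_link: DefaultDict[str, Set[str]] = defaultdict(set)

    def accept(self, span: Span) -> None:
        # Index the span's trace under every trace ID it links to.
        for linked in span.links:
            self._by_link[linked].add(span.trace_id)

    def trace_ids_linked_to(self, trace_id: str) -> List[str]:
        """Return the IDs of traces that carry a link to the given trace."""
        return sorted(self._by_link.get(trace_id, set()))
```

A UI could call something like `trace_ids_linked_to` from the span detail view to render hrefs to the related traces.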