Introduce metrics for span duration #142
Comments
While this may seem like a quick one, it is actually a pretty thick topic if our goal is to provide stable data behind trace analysis. For example, we should know which metrics we want to export, for what purpose, and how they are named and aggregated. The solution would be best documented in terms of how to solve a latency problem, e.g. how to navigate to key metrics and what users should expect to do with them. Even if the latter is punting to another tool that consumes an endpoint, we should explain in a README which tools can consume the data, since, for example, latency distribution data structures aren't universal.

I'd nudge a post on the Google group, because at least commercial tracing vendors may have solutions to this we can learn from, not to mention more experienced in-house tracers: https://groups.google.com/forum/#!forum/distributed-tracing

In the last distributed tracing workshop, there was a good bit of discussion on span metrics by @bogdandrutu, particularly around census tracing. Most of this was prior art in grpc, such as exported metrics, how they are accumulated, and how they are addressed and named. For example, it appears the path to metrics includes

During the workshop, we noticed some interesting tracing tie-ins. For example, metrics were stored in the same tag set as trace data (and closed within the same scope). Metrics tags were aggregatable dimensions (set with a metrics flag). Users could add custom metrics to a span as well. There's also some style practice that is helpful: for example, wrapping fan-outs in a local span gives you a more stable way of analyzing the critical path of its caller.

Moreover, if we are exporting span metrics, we likely need to think about and practice.. Tools like Google Cloud Trace offer analysis as simple as viewing average distribution over time, but they include other features, too.
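To make the tag tie-in above concrete, here is a minimal sketch of one way "metrics tags as aggregatable dimensions" could work. It is not how census or grpc implement it; the `Span` class, the `metric=True` flag, and the tag names are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical span: trace tags and metric dimensions share one tag set."""
    name: str
    duration_ms: float
    tags: dict = field(default_factory=dict)
    metric_tags: set = field(default_factory=set)  # keys flagged as dimensions

    def tag(self, key, value, metric=False):
        # Every tag is kept for the trace; only flagged keys become dimensions.
        self.tags[key] = value
        if metric:
            self.metric_tags.add(key)
        return self


def aggregate(spans):
    """Group durations by span name plus the metric-flagged tags only."""
    totals = defaultdict(lambda: {"count": 0, "sum_ms": 0.0})
    for span in spans:
        dims = tuple(sorted((k, span.tags[k]) for k in span.metric_tags))
        bucket = totals[(span.name, dims)]
        bucket["count"] += 1
        bucket["sum_ms"] += span.duration_ms
    return dict(totals)
```

In this sketch a high-cardinality tag such as a user id stays trace-only, so two spans that differ only in that tag still land in the same aggregate. That is one reason to require an explicit flag rather than treating every tag as a dimension.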
When we design this, which stories are we now able to tell, and how do you navigate from a trace view to the metrics system (or vice versa)?
A year has passed and I think that my opinion on this has changed. What users do with metrics can be very custom. That's why one can register a custom version of the
Each span can be treated as a potential source of metrics. We could start gathering histograms for each span's duration (or actually arbitrary metrics that the user wants to gather).
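As a rough sketch of the idea, assuming fixed bucket bounds and one histogram per span name (neither is specified in this issue, and `SpanMetrics` and `DurationHistogram` are hypothetical names, not part of any existing API):

```python
import bisect
import time
from collections import defaultdict
from contextlib import contextmanager

# Assumed bucket upper bounds in milliseconds; pick whatever suits your latencies.
BUCKET_BOUNDS_MS = [1, 5, 10, 50, 100, 500, 1000]


class DurationHistogram:
    """Fixed-bucket histogram; the last bucket counts values above all bounds."""

    def __init__(self, bounds=BUCKET_BOUNDS_MS):
        self.bounds = list(bounds)
        self.counts = [0] * (len(self.bounds) + 1)
        self.total = 0
        self.sum_ms = 0.0

    def record(self, duration_ms):
        # bisect_left places a value equal to a bound into that bound's bucket.
        self.counts[bisect.bisect_left(self.bounds, duration_ms)] += 1
        self.total += 1
        self.sum_ms += duration_ms


class SpanMetrics:
    """Keeps one duration histogram per span name."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so tests can use a fake clock
        self.histograms = defaultdict(DurationHistogram)

    @contextmanager
    def span(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Record even if the wrapped block raises.
            elapsed_ms = (self._clock() - start) * 1000.0
            self.histograms[name].record(elapsed_ms)
```

Usage would look like `with metrics.span("fetch-user"): ...`, after which `metrics.histograms["fetch-user"].counts` holds the bucket counts. Keying by span name alone is the simplest choice; arbitrary user metrics would need a separate recording call.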