Introduce metrics for span duration #142

Closed
marcingrzejszczak opened this issue Feb 4, 2016 · 2 comments

Comments

@marcingrzejszczak
Contributor

Each span can be treated as a potential source of metrics. We could start gathering histograms of each span's duration (or, more generally, arbitrary metrics that the user wants to gather).
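
To make the idea concrete, here is a minimal sketch of a fixed-bucket histogram for span durations. `SpanDurationHistogram`, its bucket bounds, and the millisecond unit are all assumptions made up for illustration; nothing like this exists in the project, and a real implementation would more likely delegate to a metrics library.

```java
import java.util.Arrays;

// Hypothetical helper, not part of any tracing library:
// counts span durations in fixed buckets.
class SpanDurationHistogram {
    // Inclusive upper bound of each bucket, in milliseconds (assumed unit).
    private final long[] upperBoundsMillis;
    // One counter per bucket, plus a final overflow bucket.
    private final long[] counts;

    SpanDurationHistogram(long... upperBoundsMillis) {
        this.upperBoundsMillis = upperBoundsMillis.clone();
        Arrays.sort(this.upperBoundsMillis);
        this.counts = new long[this.upperBoundsMillis.length + 1];
    }

    // Record one finished span's duration.
    synchronized void record(long durationMillis) {
        int i = 0;
        while (i < upperBoundsMillis.length && durationMillis > upperBoundsMillis[i]) {
            i++;
        }
        counts[i]++;
    }

    // Copy of the current bucket counts, safe to hand to a reporter.
    synchronized long[] snapshot() {
        return counts.clone();
    }

    public static void main(String[] args) {
        SpanDurationHistogram h = new SpanDurationHistogram(10, 100, 1000);
        for (long d : new long[] {3, 10, 42, 250, 5000}) {
            h.record(d);
        }
        System.out.println(Arrays.toString(h.snapshot())); // [2, 1, 1, 1]
    }
}
```

One histogram per span name would give the per-span view described above; how the buckets are chosen and exported is left open here.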

@codefromthecrypt
Contributor

While this may seem like a quick one, it is actually a pretty thick topic, if our goal is to provide stable data behind trace analysis.

For example, we should know which metrics we want to export, for what purpose, and how they are named and aggregated. The solution would be best documented in terms of how to solve a latency problem, e.g. how to navigate to key metrics and what users should expect to do with them. Even if the latter means punting to another tool that consumes an endpoint, we should explain in a README which tools can consume the data, since, for example, latency distribution data structures aren't universal.

I'd nudge a post on the google group, because at least commercial tracing vendors may have solutions to this we can learn from, not to mention more experienced in-house tracers https://groups.google.com/forum/#!forum/distributed-tracing

In the last distributed tracing workshop, there was a good bit of discussion on span metrics by @bogdandrutu particularly around census tracing.

Most of this was prior art in grpc, such as which metrics are exported, how they are accumulated, and how they are addressed and named. For example, it appears the path to metrics includes <fully qualified rpc service name>/<rpc function name> (and also notes about needing to define charset!)
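
As a rough illustration of that naming scheme, the sketch below joins a service name and a function name into a metric path. `RpcMetricPath` is a made-up helper, and restricting names to US-ASCII is only an assumption standing in for "define the charset"; the workshop notes do not say which charset was chosen.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds "<service name>/<function name>" metric paths.
class RpcMetricPath {
    static String of(String serviceName, String functionName) {
        requireAscii(serviceName);
        requireAscii(functionName);
        return serviceName + "/" + functionName;
    }

    // Assumed rule: names must be non-empty and encodable as US-ASCII.
    private static void requireAscii(String s) {
        if (s.isEmpty() || !StandardCharsets.US_ASCII.newEncoder().canEncode(s)) {
            throw new IllegalArgumentException("name must be non-empty US-ASCII: " + s);
        }
    }

    public static void main(String[] args) {
        // Example names are invented for this sketch.
        System.out.println(of("acme.orders.OrderService", "PlaceOrder"));
        // acme.orders.OrderService/PlaceOrder
    }
}
```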

During the workshop, we noticed some interesting tracing tie-ins. For example, metrics were stored in the same tag set as trace data (and closed within the same scope). Metrics tags were aggregatable dimensions (set with a metrics flag). Users could add custom metrics to a span, as well.

There's also some style practice that is helpful. For example, wrapping fan-outs in a local span gives you a more stable way of analyzing the critical path of its caller.

Moreover, if we are exporting span metrics, we likely need to think about how they are analyzed in practice. Tools like Google Cloud Trace offer analysis as simple as viewing average distribution over time, but they include other features, too. When we design this, which stories are we now able to tell, and how do you navigate from a trace view to the metrics system (or vice versa)?

@marcingrzejszczak
Contributor Author

A year has passed, and my opinion on this has changed. What users do with metrics can be very custom. That's why one can register a custom version of the SpanReporter bean, which is executed after the span gets closed. There the user can send the spans to any sort of metric-aggregating tool (Google Cloud Trace, Prometheus, Graphite, etc.). Also, all the data is already there in the span, so any analysis can be performed at that point. I'm closing the issue; if we come up with something better, I'll reopen it.
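
A minimal sketch of what such a reporter could look like. To keep it self-contained, `SpanSketch` and `SpanReporterSketch` are stand-in types invented here, not the real span and SpanReporter types, and the in-memory maps stand in for whatever metrics backend the user would actually forward to.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical reporter: aggregates the duration of closed spans per span name.
final class DurationMetricsReporter implements SpanReporterSketch {
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> totalMicros = new ConcurrentHashMap<>();

    // Called once per closed span; a real reporter would push to a metrics tool here.
    @Override
    public void report(SpanSketch span) {
        counts.computeIfAbsent(span.name, k -> new LongAdder()).increment();
        totalMicros.computeIfAbsent(span.name, k -> new LongAdder()).add(span.durationMicros);
    }

    long count(String name) {
        LongAdder c = counts.get(name);
        return c == null ? 0 : c.sum();
    }

    double meanMicros(String name) {
        long n = count(name);
        return n == 0 ? 0.0 : (double) totalMicros.get(name).sum() / n;
    }

    public static void main(String[] args) {
        DurationMetricsReporter reporter = new DurationMetricsReporter();
        reporter.report(new SpanSketch("http:/orders", 1200));
        reporter.report(new SpanSketch("http:/orders", 800));
        reporter.report(new SpanSketch("db:select", 300));
        System.out.println(reporter.count("http:/orders") + " spans, mean "
                + reporter.meanMicros("http:/orders") + " us");
    }
}

// Stand-in for a closed span; carries only the fields this sketch needs.
final class SpanSketch {
    final String name;
    final long durationMicros;

    SpanSketch(String name, long durationMicros) {
        this.name = name;
        this.durationMicros = durationMicros;
    }
}

// Stand-in for the reporter callback invoked after a span is closed.
interface SpanReporterSketch {
    void report(SpanSketch span);
}
```

In an application, the equivalent class would be registered as the custom reporter bean, so the tracing library itself stays free of any opinion about metric names or aggregation.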
