Introduce metrics for span duration #142

marcingrzejszczak · 2016-02-04T09:16:20Z

Each span can be treated as a potential source of metrics. We could start gathering histograms for each span's duration (or, more generally, arbitrary metrics that the user wants to gather).
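As a rough illustration of what gathering a histogram per span could look like, here is a minimal sketch. The class name `SpanDurationHistogram`, the bucket bounds, and keying by span name are all assumptions made for the example; none of this is existing Sleuth API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical fixed-bucket histogram of span durations, keyed by span name. */
class SpanDurationHistogram {

    // Assumed upper bounds (inclusive) of each bucket in milliseconds.
    // One extra open-ended bucket catches everything above the last bound.
    private static final long[] BOUNDS_MS = {10, 50, 100, 500, 1000};

    private final Map<String, AtomicLongArray> countsByName = new ConcurrentHashMap<>();

    /** Increments the bucket that the given duration falls into. */
    void record(String spanName, long durationMs) {
        AtomicLongArray counts = countsByName.computeIfAbsent(
                spanName, name -> new AtomicLongArray(BOUNDS_MS.length + 1));
        counts.incrementAndGet(bucketIndex(durationMs));
    }

    /** Returns the count in a bucket, or 0 if the span name was never recorded. */
    long count(String spanName, int bucket) {
        AtomicLongArray counts = countsByName.get(spanName);
        return counts == null ? 0 : counts.get(bucket);
    }

    /** Finds the first bucket whose upper bound is >= the duration. */
    static int bucketIndex(long durationMs) {
        for (int i = 0; i < BOUNDS_MS.length; i++) {
            if (durationMs <= BOUNDS_MS[i]) {
                return i;
            }
        }
        return BOUNDS_MS.length; // overflow bucket
    }
}
```

Fixed buckets keep recording cheap and mergeable across instances, at the cost of choosing the bounds up front.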

codefromthecrypt · 2016-02-06T04:05:00Z

While this may seem like a quick one, it is actually a pretty thick topic, if our goal is to provide stable data behind trace analysis.

For example, we should know which metrics we want to export, for what purpose, and how they are named and aggregated. The solution would be best documented in terms of how to solve a latency problem, e.g. how to navigate to key metrics and what users should expect to do with them. Even if the latter means punting to another tool that consumes an endpoint, we should explain in a README which tools can consume the data, since latency distribution data structures, for example, aren't universal.

I'd suggest posting on the Google group, because commercial tracing vendors, at least, may have solutions to this that we can learn from, not to mention more experienced in-house tracers: https://groups.google.com/forum/#!forum/distributed-tracing

In the last distributed tracing workshop, there was a good bit of discussion on span metrics by @bogdandrutu particularly around census tracing.

Most of this was prior art in gRPC, such as which metrics are exported, how they are accumulated, and how they are addressed and named. For example, it appears the path to metrics includes <fully qualified rpc service name>/<rpc function name> (and there were also notes about needing to define a charset!)
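To make the naming scheme concrete, here is a small sketch of building such a path. The class `RpcMetricName` is hypothetical, and the allowed charset (ASCII letters, digits, `.` and `_`) is purely an assumption for illustration, since the notes above only say that a charset needs to be defined.

```java
import java.util.regex.Pattern;

/** Hypothetical builder for "<fully qualified rpc service name>/<rpc function name>" paths. */
class RpcMetricName {

    // Assumed charset; the actual one was left undefined in the discussion.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9._]+");

    /** Joins service and function names with '/', rejecting characters outside the assumed charset. */
    static String of(String service, String function) {
        requireValid("service", service);
        requireValid("function", function);
        return service + "/" + function;
    }

    private static void requireValid(String label, String value) {
        if (value == null || !ALLOWED.matcher(value).matches()) {
            throw new IllegalArgumentException("Invalid " + label + " name: " + value);
        }
    }
}
```

For example, `RpcMetricName.of("com.example.OrderService", "GetOrder")` would yield `com.example.OrderService/GetOrder`, while a name containing `/` or whitespace would be rejected so the path stays unambiguous.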

During the workshop, we noticed some interesting tracing tie-ins. For example, metrics were stored in the same tag set as trace data (and closed within the same scope). Metrics tags were aggregatable dimensions (set with a metrics flag). Users could add custom metrics to a span, as well.

There's also some style practice that is helpful. For example, wrapping fan-outs in a local span gives you a more stable way of analyzing the critical path of the caller.

Moreover, if we are exporting span metrics, we likely need to think about how they are used in practice. Tools like Google Cloud Trace offer analysis as simple as viewing average distribution over time, but they include other features, too. When we design this, which stories are we now able to tell, and how do you navigate from a trace view to the metrics system (or vice versa)?

marcingrzejszczak · 2017-02-23T10:25:28Z

A year has passed and I think that my opinion on this has changed. What users do with metrics can be very custom. That's why one can register a custom version of the SpanReporter bean that is executed after the span gets closed. There the user can send the spans to any sort of metric-aggregating tool (Google Cloud Trace, Prometheus, Graphite etc.). Also, all the data is already there in the span, so any analysis can be performed at that point. I'm closing the issue, and if we come up with something better I'll reopen it.
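A minimal sketch of that idea follows. To keep it self-contained, `SimpleSpan`, `SimpleSpanReporter`, and `MetricSink` are local stand-ins invented for this example; they are not the actual Sleuth types, and the `span.duration.` metric prefix is likewise an assumption.

```java
/** Stand-in for a finished span; not the actual Sleuth Span class. */
interface SimpleSpan {
    String name();
    long durationMicros();
}

/** Stand-in for a reporter that is called after a span is closed. */
interface SimpleSpanReporter {
    void report(SimpleSpan span);
}

/** Hypothetical abstraction over a metrics backend (Prometheus, Graphite, ...). */
interface MetricSink {
    void recordTiming(String metric, long micros);
}

/** Simple immutable span implementation used for illustration. */
class FinishedSpan implements SimpleSpan {
    private final String name;
    private final long durationMicros;

    FinishedSpan(String name, long durationMicros) {
        this.name = name;
        this.durationMicros = durationMicros;
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public long durationMicros() {
        return durationMicros;
    }
}

/** Forwards each closed span's duration to the sink as a timing metric. */
class MetricForwardingReporter implements SimpleSpanReporter {
    private final MetricSink sink;

    MetricForwardingReporter(MetricSink sink) {
        this.sink = sink;
    }

    @Override
    public void report(SimpleSpan span) {
        // Assumed naming convention: a fixed prefix followed by the span name.
        sink.recordTiming("span.duration." + span.name(), span.durationMicros());
    }
}
```

Because the reporter only depends on the `MetricSink` abstraction, the choice of aggregation tool stays with the user, which is the point of the comment above.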

marcingrzejszczak added the enhancement label Feb 4, 2016

marcingrzejszczak added this to the 1.0.0 milestone Feb 4, 2016

dsyer modified the milestones: 1.0.0.M5, 1.0.0.RC1 Feb 5, 2016

marcingrzejszczak added the in progress label Feb 5, 2016

marcingrzejszczak added a commit that referenced this issue Feb 5, 2016

[#142] Initial implementation of span duration histogram

a8e8080

marcingrzejszczak self-assigned this Feb 5, 2016

marcingrzejszczak added help wanted backlog icebox and removed in progress backlog labels Feb 9, 2016

dsyer modified the milestone: 1.0.0.RC1 Feb 11, 2016

marcingrzejszczak closed this as completed Feb 23, 2017

marcingrzejszczak removed the icebox label Feb 23, 2017
