Add inbound transport as label to collector metrics #1446
I don't think we need a separate counter. We already have metrics partitioned by the span format:
jaeger_collector_traces_received_total{debug="false",format="jaeger",svc="other-services"} 0
jaeger_collector_traces_received_total{debug="false",format="unknown",svc="other-services"} 0
jaeger_collector_traces_received_total{debug="false",format="zipkin",svc="other-services"} 0
jaeger_collector_traces_received_total{debug="true",format="jaeger",svc="other-services"} 0
jaeger_collector_traces_received_total{debug="true",format="unknown",svc="other-services"} 0
jaeger_collector_traces_received_total{debug="true",format="zipkin",svc="other-services"} 0
I would simply add transport as another dimension here.
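To make the suggestion concrete, here is a minimal sketch of what the same series would look like with a `transport` label added. It uses only the standard library; `seriesName` is a hypothetical helper for illustration and not part of Jaeger, and the transport values `http` and `tchannel` are taken from the discussion in this thread rather than from the final code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesName is a hypothetical helper that renders a metric name and its
// labels in Prometheus exposition style, with label keys sorted.
func seriesName(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return name + "{" + strings.Join(parts, ",") + "}"
}

func main() {
	// Same metric as above, sliced by one more dimension (assumed values).
	for _, transport := range []string{"http", "tchannel"} {
		fmt.Println(seriesName("jaeger_collector_traces_received_total", map[string]string{
			"debug":     "false",
			"format":    "jaeger",
			"svc":       "other-services",
			"transport": transport,
		}))
	}
}
```

The metric name stays the same; only the label set grows, so existing queries that aggregate over all labels keep reporting the same totals.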
Also, this build error doesn't seem to happen when I build locally. Any idea why?
@yurishkuro There already seem to be too many dimensions here. Do we really want to add another on top of that? Ultimately we are only interested in knowing where spans are coming from.
Yes, we want another dimension there. Otherwise, we'll end up with two metrics reporting the same numbers.
Yes, add as another dimension. It's still the same metric, span count, no matter how we slice and dice it. Because format and transport are static dimensions, we need to avoid string concatenations and mutexes.
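One way to read the point about static dimensions: since the set of formats and transports is known at compile time, the counters can be allocated up front and indexed by constants, so the hot path needs neither a string key nor a lock. The sketch below illustrates that idea with the standard library only; the type and constant names (`spanCounts`, `FormatJaeger`, `TransportHTTP`, and so on) are hypothetical and do not reflect the collector's actual code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Format and Transport are hypothetical enums for the two static dimensions.
type Format int
type Transport int

const (
	FormatJaeger Format = iota
	FormatZipkin
	FormatUnknown
	numFormats
)

const (
	TransportHTTP Transport = iota
	TransportTChannel
	numTransports
)

// spanCounts holds one pre-allocated counter per (format, transport) pair.
type spanCounts struct {
	counts [numFormats][numTransports]int64
}

// Inc bumps the counter for the pair with a single atomic add:
// no map lookup, no string concatenation, no mutex.
func (c *spanCounts) Inc(f Format, t Transport) {
	atomic.AddInt64(&c.counts[f][t], 1)
}

// Get reads the current value for the pair.
func (c *spanCounts) Get(f Format, t Transport) int64 {
	return atomic.LoadInt64(&c.counts[f][t])
}

func main() {
	var c spanCounts
	c.Inc(FormatJaeger, TransportHTTP)
	c.Inc(FormatJaeger, TransportHTTP)
	c.Inc(FormatZipkin, TransportTChannel)
	fmt.Println(c.Get(FormatJaeger, TransportHTTP), c.Get(FormatZipkin, TransportTChannel))
}
```

Dynamic dimensions such as the service name cannot be handled this way, because their values are not known ahead of time.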
hmmm. The thing I notice is: Also it seems both tchannel and http are invoking e.g
and just get it later in span_handler.go. Please advise.
I just took a look at this and feel that we should add metrics directly to the outer-level handlers. Specifically, adding metrics to I don't understand why Right now, @yurishkuro Are there any arguments for keeping these metrics in
@vprithvi I think the main argument is the explosion of different metric names instead of just partitioning them along the natural dimensions. There's really only one metric, number of spans received. We can partition it by span format, span submission protocol, etc. Having said that, spans received by service is also just a dimension of the above metric, but we pulled it out separately because when we only need to plot the total traffic, we need to aggregate across a lot of dimensions.
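The trade-off described here can be sketched as follows: with a single metric and several labels, any view (total traffic, per transport, per format) is an aggregation over the matching series. The `series` type, the `sumWhere` helper, and the sample values are all made up for illustration; in practice this aggregation would be done by the metrics backend's query language, not in Go.

```go
package main

import "fmt"

// series is a hypothetical in-memory stand-in for one labeled time series.
type series struct {
	labels map[string]string
	value  float64
}

// sumWhere adds up every series whose labels contain all pairs in match.
// An empty or nil match aggregates across all dimensions.
func sumWhere(all []series, match map[string]string) float64 {
	total := 0.0
	for _, s := range all {
		matched := true
		for k, v := range match {
			if s.labels[k] != v {
				matched = false
				break
			}
		}
		if matched {
			total += s.value
		}
	}
	return total
}

func main() {
	// Sample values are invented for the example.
	all := []series{
		{map[string]string{"format": "jaeger", "transport": "tchannel"}, 10},
		{map[string]string{"format": "jaeger", "transport": "http"}, 5},
		{map[string]string{"format": "zipkin", "transport": "http"}, 2},
	}
	fmt.Println("total:", sumWhere(all, nil))
	fmt.Println("http only:", sumWhere(all, map[string]string{"transport": "http"}))
}
```

The more dimensions a metric carries, the more series each total has to sum over, which is the cost the comment above refers to.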
Codecov Report
@@ Coverage Diff @@
## master #1446 +/- ##
==========================================
+ Coverage 99.82% 99.82% +<.01%
==========================================
Files 173 173
Lines 8162 8179 +17
==========================================
+ Hits 8148 8165 +17
Misses 7 7
Partials 7 7
@yurishkuro Could you take a look at travis-ci? There are two tests I can't seem to get passing.
NB: please update the PR title to follow guidelines for commit messages.
Signed-off-by: Jude Wang <[email protected]>
cmd/collector/app/metrics.go
@@ -104,20 +109,27 @@ func newMetricsBySvc(factory metrics.Factory, category string) metricsBySvc {
}

func newCountsBySvc(factory metrics.Factory, category string, maxServiceNames int) countsBySvc {
	// Add 3 to maxServiceNames threshold to compensate for extra slots taken by transport types
	maxServiceNames = maxServiceNames + 3
Not sure if this is a good idea. I'm combining otherServices with the transport types, which makes the user-defined counter space smaller, so I'm adding 3 here to compensate and make the test pass. The difference is that when a user has numOfServices > maxServiceNames, they might notice in a metrics dashboard such as Grafana that the total number of service names shows as maxServiceNames + 3.
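The concern can be illustrated with a simplified sketch of a capped counter map, assuming that reserved keys share the same slot budget as service names. This is not the collector's actual `countsBySvc` implementation: `boundedCounts` and its methods are hypothetical, and the third transport name (`grpc`) is an assumption, since the thread only names tchannel and http.

```go
package main

import (
	"fmt"
	"sync"
)

const otherKey = "other-services"

// boundedCounts is a hypothetical counter map capped at max distinct keys.
// Names that arrive after the cap is reached are folded into otherKey.
type boundedCounts struct {
	mu     sync.Mutex
	max    int
	counts map[string]int
	other  int
}

// newBoundedCounts pre-registers reserved keys, which consume slots
// from the same budget as service names.
func newBoundedCounts(max int, reserved ...string) *boundedCounts {
	b := &boundedCounts{max: max, counts: make(map[string]int)}
	for _, r := range reserved {
		b.counts[r] = 0
	}
	return b
}

// Inc counts one span for name, or for otherKey if no slot is left.
func (b *boundedCounts) Inc(name string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if _, ok := b.counts[name]; !ok && len(b.counts) >= b.max {
		b.other++
		return
	}
	b.counts[name]++
}

// Get returns the count for name; otherKey returns the overflow bucket.
func (b *boundedCounts) Get(name string) int {
	b.mu.Lock()
	defer b.mu.Unlock()
	if name == otherKey {
		return b.other
	}
	return b.counts[name]
}

func main() {
	userMax := 2
	// Without the +3, the three reserved keys would leave no room for services.
	b := newBoundedCounts(userMax+3, "grpc", "http", "tchannel")
	for _, svc := range []string{"svc-a", "svc-b", "svc-c"} {
		b.Inc(svc)
	}
	fmt.Println(b.Get("svc-a"), b.Get("svc-b"), b.Get(otherKey))
}
```

In this sketch the compensation keeps the user-visible limit at two services, at the price of the map holding five keys in total, which is the side effect the comment above worries about.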
@yurishkuro Is there any way for me to restart the crossdock test on Travis? It's running fine in my local environment.
crossdock is successful, the other tests are failing
For the first one I checked unit tests and code coverage under The second failure I ran Am I missing anything?
There seems to be a deadlock in an ingester test.
And something related to Travis itself.
I restarted both jobs. Let's see if it helps.
hmmm. Interesting.. the second round passed and I ran some
@jpkrohling @yurishkuro Can either of you pick up the review now that tests are passing?
Signed-off-by: Jude Wang <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Which problem is this PR solving?
Resolves #1444
Short description of the changes