-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add storage metrics to OTEL, metrics by span service name #2431
Conversation
Changing to WIP, I will do some cleanup on Monday. |
@joe-elliott could you please review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some questions/comments.
It looks like all exporters except for ES and Jaeger are wrapped in the SpanWriterExporter. I suppose we don't count exporting to Jaeger as "storage" so it should be excluded from this metric?
For my understanding, why does ES not use the SpanWriterExporter like the others?
numErrors := 0 | ||
for i, d := range blk.Items { | ||
bulkOp := operationToSpan[i] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how confident are we that len(operationToSpan) == len(blk.Items)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the lenght should match. The blk contains resposes for objects in operationToSpan.
if !bulkOp.isService { | ||
ctx, _ := tag.New(ctx, | ||
tag.Insert(storagemetrics.TagServiceName(), bulkOp.span.Process.ServiceName), w.nameTag) | ||
stats.Record(ctx, storagemetrics.StatSpansNotStoredCount().M(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would probably be faster to aggregate the number of failed spans in a local int and then Record
them all at once. I believe you'll need to change the view aggregation below to .Sum
if we did this we'd either have to keep a map of ServiceName => ErrorCount or drop the service name tag. Service name could be useful on a metric like this, but I don't think it's vital.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed ti to use two maps.
+1 the aggregation had to be changed to .Sum
if we did this we'd either have to keep a map of ServiceName => ErrorCount or drop the service name tag. Service name could be useful on a metric like this, but I don't think it's vital.
If we drop the service name we can drop this metic altogether as there is a generic metric per successful/failed exporter sends
otelcol_exporter_sent_spans{exporter="jaeger_memory",service_instance_id="23e9c69b-1383-4fbd-8614-638339329a8e"} 9
err := s.Writer.WriteSpan(span) | ||
if err != nil { | ||
errs = append(errs, err) | ||
dropped++ | ||
stats.Record(ctx, storagemetrics.StatSpansNotStoredCount().M(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar thoughts here on aggregating and then storing the metrics all at once. concerned mainly about allocations and locking per Record
call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it, we would have to calculate map[string]int
(serviceName->count) to do that. Now sure which one is better.
The ES use "native" OTEL implementation. The |
Jaeger is gRPC exporter not storage. The aim of this PR is to provide a metric that would show number of spans stored per service. |
a5927ec
to
576110f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me.
Signed-off-by: Pavol Loffay <[email protected]>
1b0d768
to
ba93e8e
Compare
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
|
||
var _ spanstore.Writer = (*storageWrapper)(nil) | ||
|
||
func (s storageWrapper) WriteSpan(ctx context.Context, span *model.Span) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the storage factory has a similar impl I have used that instead.
Signed-off-by: Pavol Loffay <[email protected]>
codecov seems to be flaky :/ |
Resolves #2165 (comment)
This PR adds storage metrics to OTEL distribution. The metrics show many spans were stored per service and exporter. There is a similar metric in the current jaeger distributions.
It is a similar metric to the core metric:
however it does not split the metric by reported service name.