Explicit sign of dropped spans #5557
Comments
Why logs? This feels like it should be a metric.
@dmathieu
@dmathieu Could you give a good example of how to do this?

P.S.:

    var meter = otel.Meter("go.opentelemetry.io/otel/sdk/trace")

    func NewBatchSpanProcessor(exporter SpanExporter, options ...BatchSpanProcessorOption) SpanProcessor {
        //...
        dropCounter, err := meter.Int64Counter(
            "drop.counter",
            metric.WithDescription("Number of dropped spans."),
        )
        if err != nil {
            panic(err) // what to do with error? And what error could be here?
        }
        bsp := &batchSpanProcessor{
            //...,
            dropCounter: dropCounter,
        }
        //...
    }

    func (bsp *batchSpanProcessor) enqueueDrop(ctx context.Context, sd ReadOnlySpan) bool {
        // Unsampled spans are never queued.
        if !sd.SpanContext().IsSampled() {
            return false
        }
        select {
        case bsp.queue <- sd:
            return true
        default:
            // Queue is full: count the drop internally and on the metric.
            atomic.AddUint32(&bsp.dropped, 1)
            bsp.dropCounter.Add(ctx, 1)
        }
        return false
    }
@OrangeFlag FYI I have asked this question in
I agree, the big thing there is to have a semantic convention to emit the metric on. Setting up a meter provider and emitting a metric is trivial, and done in multiple other places, for example in many contrib instrumentations.
That sounds like a long process. How could we approach this in the meantime? Perhaps it is worth adding a public method that returns the number of dropped spans? Users could then at least build their own metrics while the semantic convention is being negotiated.
I don't know if exposing the number of dropped spans is a good idea. That would expose an implementation detail on the struct.
So anything listening for the logs can emit a metric whenever this log is emitted.
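A minimal sketch of the "listen for the logs" idea, using only the standard library. It assumes the debug record carries a cumulative `total_dropped` key/value pair, so the listener has to turn totals into deltas before feeding a counter. `dropWatcher` and `onDrop` are hypothetical names; the `Info` method only mirrors the shape of `logr.LogSink`'s `Info` and is not wired into the SDK here.

```go
package main

import (
	"fmt"
	"sync"
)

// dropWatcher is a hypothetical listener that converts the cumulative
// "total_dropped" value seen in log records into per-record deltas.
type dropWatcher struct {
	mu     sync.Mutex
	last   uint64
	onDrop func(delta uint64) // e.g. a wrapper around a metric counter's Add
}

// Info has the same parameter shape as logr.LogSink's Info method.
func (w *dropWatcher) Info(level int, msg string, keysAndValues ...any) {
	for i := 0; i+1 < len(keysAndValues); i += 2 {
		key, ok := keysAndValues[i].(string)
		if !ok || key != "total_dropped" {
			continue
		}
		total, ok := toUint64(keysAndValues[i+1])
		if !ok {
			return
		}
		w.mu.Lock()
		defer w.mu.Unlock()
		if total > w.last {
			w.onDrop(total - w.last)
			w.last = total
		}
		return
	}
}

// toUint64 accepts the integer types the value is likely to arrive as.
func toUint64(v any) (uint64, bool) {
	switch n := v.(type) {
	case uint32:
		return uint64(n), true
	case uint64:
		return n, true
	case int:
		if n >= 0 {
			return uint64(n), true
		}
	}
	return 0, false
}

func main() {
	var counted uint64
	w := &dropWatcher{onDrop: func(d uint64) { counted += d }}
	w.Info(8, "exporting spans", "count", 512, "total_dropped", uint32(3))
	w.Info(8, "exporting spans", "count", 512, "total_dropped", uint32(3)) // no new drops
	w.Info(8, "exporting spans", "count", 100, "total_dropped", uint32(10))
	fmt.Println("drops counted:", counted) // prints "drops counted: 10"
}
```

Note that this only works while the logger is configured to emit the debug-level record in the first place, which is the limitation raised in the next comment.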
In production, we can't use debug logs for all our applications, making it nearly impossible to understand when and which applications are silently dropping spans, especially since these applications are developed by other teams.
Currently, the only way to tell that the processor dropped spans is through a Debug log:
https://github.com/open-telemetry/opentelemetry-go/blob/main/sdk/trace/batch_span_processor.go#L276
That choice is questionable in itself: for example, the count of dropped log records is written at the Warn level:
https://github.com/open-telemetry/opentelemetry-go/blob/main/sdk/log/batch.go#L154
Proposed Solution
In any case, I would like to be able to observe the loss of spans/logs not only in logs, but also, for example, in metrics.
We can export the global number of dropped spans/logs. This way users will be able to use this number for their triggers.
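A rough, standard-library-only sketch of what exposing that number could look like. `boundedQueue`, `span`, and `Dropped` are hypothetical names used for illustration and are not part of the SDK; the non-blocking `select` mirrors the `enqueueDrop` pattern discussed in the comments.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// span is a hypothetical stand-in for the SDK's ReadOnlySpan.
type span struct{ name string }

// boundedQueue is a simplified, hypothetical model of the processor's queue.
type boundedQueue struct {
	queue   chan span
	dropped atomic.Uint64
}

func newBoundedQueue(size int) *boundedQueue {
	return &boundedQueue{queue: make(chan span, size)}
}

// enqueueDrop never blocks: if the queue is full, the span is dropped
// and the drop counter is incremented.
func (q *boundedQueue) enqueueDrop(s span) bool {
	select {
	case q.queue <- s:
		return true
	default:
		q.dropped.Add(1)
		return false
	}
}

// Dropped returns the cumulative number of dropped spans, which a caller
// could poll from its own metric callback or alerting code.
func (q *boundedQueue) Dropped() uint64 { return q.dropped.Load() }

func main() {
	q := newBoundedQueue(2)
	for i := 0; i < 5; i++ {
		q.enqueueDrop(span{name: fmt.Sprintf("span-%d", i)})
	}
	// Nothing drains the queue here, so 3 of the 5 spans are dropped.
	fmt.Println("dropped:", q.Dropped()) // prints "dropped: 3"
}
```

A read-only accessor like this keeps the counter itself private, though it still commits the type to tracking drops, which is the concern about exposing implementation details raised in the comments.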
Alternatives
Should I use logs for such alerts?
At the very least, we could log the number of lost spans at the Warn level.