-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanups to OpenTelemetry tracing. #9
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -78,9 +78,7 @@ public void addEvent(String name, Instant timestamp) { | |
|
||
@Override | ||
public void end() { | ||
try (Scope scope = span.makeCurrent()) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So if the span is completed on a different thread than it was created, it doesn't need to made current again? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not just to call If in this method, if we had a call like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok that makes sense, thanks |
||
span.end(); | ||
} | ||
span.end(); | ||
} | ||
|
||
@Override | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,6 +18,7 @@ | |
|
||
import com.couchbase.client.core.cnc.RequestSpan; | ||
import com.couchbase.client.core.cnc.RequestTracer; | ||
import io.opentelemetry.api.OpenTelemetry; | ||
import io.opentelemetry.api.trace.Span; | ||
import io.opentelemetry.api.trace.SpanBuilder; | ||
import io.opentelemetry.api.trace.Tracer; | ||
|
@@ -33,23 +34,23 @@ | |
public class OpenTelemetryRequestTracer implements RequestTracer { | ||
|
||
/** | ||
* Holds the actual OT tracer. | ||
* Holds the actual OTel tracer. | ||
*/ | ||
private final Tracer tracer; | ||
|
||
/** | ||
* Wraps the OpenTelemetry tracer and returns a datatype that can be passed into the requestTracer method of the | ||
* Wraps OpenTelemetry and returns a datatype that can be passed into the requestTracer method of the | ||
* environment. | ||
* | ||
* @param tracer the tracer to wrap. | ||
* @return the wrapped tracer ready to be passed in. | ||
* @param openTelemetry the OpenTelemetry instance to wrap. | ||
* @return the wrapped OpenTelemetry ready to be passed in. | ||
*/ | ||
public static OpenTelemetryRequestTracer wrap(final Tracer tracer) { | ||
return new OpenTelemetryRequestTracer(tracer); | ||
public static OpenTelemetryRequestTracer wrap(final OpenTelemetry openTelemetry) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was always under the impression (maybe this is legacy knowledge from OpenTracing) that there is only one global tracer for each app, so that we would just whatever tracer the user has configured. Does this mean that with Otel you'll have one tracer instance for each library instrumented? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yup OTel is a bit different, the "Tracer" from OpenTracing is called "TracerProvider" in OTel. Each library indeed has its own Tracer with information about that library. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks, that makes sense then! |
||
return new OpenTelemetryRequestTracer(openTelemetry); | ||
} | ||
|
||
private OpenTelemetryRequestTracer(Tracer tracer) { | ||
this.tracer = tracer; | ||
private OpenTelemetryRequestTracer(OpenTelemetry openTelemetry) { | ||
this.tracer = openTelemetry.getTracer("com.couchbase.client.jvm"); | ||
} | ||
|
||
private Span castSpan(final RequestSpan requestSpan) { | ||
|
@@ -68,19 +69,17 @@ private Span castSpan(final RequestSpan requestSpan) { | |
* Returns the inner OpenTelemetry tracer. | ||
*/ | ||
public Tracer tracer() { | ||
return tracer; | ||
return tracer; | ||
} | ||
|
||
@Override | ||
public RequestSpan requestSpan(String operationName, RequestSpan parent) { | ||
SpanBuilder spanBuilder = tracer.spanBuilder(operationName); | ||
Context parentContext = Context.current(); | ||
if (parent != null) { | ||
spanBuilder.setParent(Context.current().with(castSpan(parent))); | ||
} else { | ||
spanBuilder.setNoParent(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. actually one more question here too: I assume you are refering to the implicit propagation through active and thread locals? What about where we are working under async / reactive environments where the one in the thread local might not actually be the "real" parent? Btw, not sure if you noticed, we actually have an explicit propagation mechanism. On every request the user makes, in the option block, you can provide the parent explicitly. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah for async environments, we still expect the context to be propagated through callbacks, for example wiring up using something like
Or registering the executor in a dagger graph, or as a reactive subscriber, or maybe use reactive hooks via some instrumentation like https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation/reactor-3.1/library, or using the java agent which also does automatic context propagation in many cases. The reality is very few libraries support explicitly passing in a context like this couchbase one does, so propagating the context into threadlocals across async bounds is basically a requirement in these environments, e.g. if you're using gRPC or really any other library you need to be propagating context. On the bright side, this means that a couchbase user doesn't have to worry about couchbase specifically, they can treat it as any other client in their app and expect it to use the implicit parent when there's no explicit one. Of course, it's up to you whether this is something you want to support - on the flip side, you could just have a couchbase sdk policy saying "always pass in the parent, we don't use implicit propagation at all". I wouldn't recommend it though since it'd be a jarring experience for users compared to other libraries. |
||
parentContext = parentContext.with(castSpan(parent)); | ||
} | ||
Span span = spanBuilder.startSpan(); | ||
span.makeCurrent().close(); | ||
Span span = spanBuilder.setParent(parentContext).startSpan(); | ||
return OpenTelemetryRequestSpan.wrap(span); | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we send SpanKind.CLIENT on all of the couchbase-related spans that are created in the SDK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(i.e. also on the dispatch_to_server)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we're not really allowed to have nested
CLIENT
spans because the definition in the spec says a singleCLIENT
span corresponds to a singleSERVER
span. In our instrumentation, we generally suppress transport spans, so for example in AWS SDK instrumentation, we generate aCLIENT
span for the SDK call, but no span for the Apache HTTP Client request itself. In this case, I think that would be similar to suppressing thedispatch_to_server
span. However, it's not entirely bad modeling to havedispatch_to_server
as anINTERNAL
span either, but we'd generally only expect the "API Method" to be aCLIENT
span - imagine it as the name on client and server should be very similar. Ideally some day Couchbase itself has OpenTelemetry server instrumentation too and would have aget
span with typeSERVER
:)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more question: how do you deal with "multiple" requests? So for example we have "get from replicas" which will internally send N requests to the active and all replicas. Would the outer parent span that covers all the individual requests also be a CLIENT? or really only ever the individual requests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We haven't really settled on a great way to model this, this spec issue is related.
open-telemetry/opentelemetry-specification#1360
If one of the replicas fails, does the entire request fail? I suppose not since we're expecting one replica to succeed - in that case I'd consider this to be a single CLIENT span, how many requests are made in the background is still an implementation detail. Tracing backends generally assume CLIENT span failures are indicative of a problem so it's good for those to be 1:1 if possible.
It can still be important for debugging though, so one idea we've had while discussing that issue is possibly modeling as events, modeling as INTERNAL spans, or maybe allowing nested CLIENT spans but we haven't come to a great conclusion yet unfortunately.