Removing full exception trace from 404 error #24195

Merged
@@ -230,16 +230,8 @@ private <T> Mono<T> traceEnabledPublisher(Mono<T> resultPublisher,
             }
         }).doOnError(throwable -> {
             if (isEnabled() && !isNestedCall) {
-                Throwable unwrappedException = reactor.core.Exceptions.unwrap(throwable);
-                if (unwrappedException instanceof CosmosException) {
-                    CosmosException dce = (CosmosException) unwrappedException;
-                    try {
-                        addDiagnosticsOnTracerEvent(dce.getDiagnostics(), parentContext.get());
-                    } catch (JsonProcessingException ex) {
-                        LOGGER.warn("Error while serializing diagnostics for tracer", ex.getMessage());
-                    }
-                }
-
+                // not adding diagnostics on trace event for exception as this information is already there as
+                // part of exception message
                 this.endSpan(parentContext.get(), Signal.error(throwable), ERROR_CODE);
             }
         });
@@ -289,10 +281,18 @@ private <T> Mono<T> publisherWithClientTelemetry(Mono<T> resultPublisher,

     private void end(int statusCode, Throwable throwable, Context context) {
         if (throwable != null) {
-            tracer.setAttribute(TracerProvider.ERROR_MSG, throwable.getMessage(), context);
-            tracer.setAttribute(TracerProvider.ERROR_TYPE, throwable.getClass().getName(), context);
+            if (statusCode == HttpConstants.StatusCodes.NOTFOUND) {
Contributor

shouldn't we do the same thing for 409s (ResourceAlreadyExists) as well?

Member

And 412

Member Author

We talked through all of these and decided that 404 is a more regular business scenario than the others. We will wait and handle other errors based on customer demand.

Member

Please only remove the callstack etc. for 404 with SubStatusCode 0. For any SubStatusCode != 0 (like ReadSessionNotAvailable 1002 etc.) we should leave the callstack.

Member Author

Yes, sure, I will add a substatus check as well.

Member Author

Done, added 404/0 check
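
For readers following the thread, the 404/0 rule agreed above can be sketched roughly as follows. This is a minimal illustration, not the SDK code: the class, method, and constant names are hypothetical, and only the values 404, 0, and 1002 come from the discussion. The idea is that a plain "not found" ends the span without a throwable, while any other status or substatus keeps it.

```java
// Sketch only: class, method, and constant names are hypothetical, not SDK code.
final class NotFoundTraceSketch {
    static final int NOT_FOUND = 404;                    // HTTP status discussed in this thread
    static final int SUB_STATUS_UNKNOWN = 0;             // "404/0": plain item or collection not found
    static final int READ_SESSION_NOT_AVAILABLE = 1002;  // substatus example named in the review

    /**
     * Returns the throwable that should be handed to the tracer when ending the span,
     * or null when the failure is a plain 404/0 and the call stack should be left out.
     */
    static Throwable throwableToRecord(int statusCode, int subStatusCode, Throwable throwable) {
        boolean plainNotFound = statusCode == NOT_FOUND && subStatusCode == SUB_STATUS_UNKNOWN;
        return plainNotFound ? null : throwable;
    }

    public static void main(String[] args) {
        Throwable error = new IllegalStateException("simulated failure");
        // 404/0: the throwable is dropped, so no exception trace is attached.
        System.out.println(throwableToRecord(NOT_FOUND, SUB_STATUS_UNKNOWN, error) == null);
        // 404/1002: the throwable is kept, so the call stack stays on the span.
        System.out.println(throwableToRecord(NOT_FOUND, READ_SESSION_NOT_AVAILABLE, error) == error);
    }
}
```

In the PR itself the equivalent decision would sit inside `end(int statusCode, Throwable throwable, Context context)`, with the result passed to `tracer.end(...)`; how the substatus code reaches that method is not shown in this diff.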

+                tracer.setAttribute(TracerProvider.ERROR_MSG, "Not found exception", context);
Member

I believe 404 statusCode is enough and the core tracer will populate all the info needed.

https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/core/azure-core-tracing-opentelemetry/src/main/java/com/azure/core/tracing/opentelemetry/implementation/HttpTraceUtil.java#L179

Can we avoid populating custom error codes?

Member Author

If we remove ERROR_TYPE/ERROR_MSG, then we won't see the attributes below, which were the initial requirement from @trask
[image]

Member

since this is no longer an exception, it shouldn't have exception attributes - otel.status_description gives user the info they need

Member

> If we remove ERROR_TYPE/ERROR_MSG, then we won't see the attributes below, which were the initial requirement from @trask

this was probably before I understood the otel semantic convention for exceptions 😅

@lmolkova is correct that recordException() is the right way to attach the exception data

Member Author

Removed error.msg and error.type

+                tracer.setAttribute(TracerProvider.ERROR_TYPE, throwable.getClass().getName(), context);
+                tracer.end(statusCode, null, context);
+            } else {
+                tracer.setAttribute(TracerProvider.ERROR_MSG, throwable.getMessage(), context);
Member @lmolkova, Sep 17, 2021

Member Author

If we remove ERROR_TYPE/ERROR_MSG, then we won't see the attributes below, which were the initial requirement from @trask
[image]

Member @lmolkova, Sep 20, 2021

Can you please add a screenshot for the non-404 case? Will the standard otel attributes be recorded for the exception (exception.type and exception.message)?

@trask I believe you wanted to have otel semantic exception attributes attached, they have changed and now we are populating new ones + old ones. Are you ok with cosmos removing error.msg and error.type? Do you believe we use them anywhere?

Member

yes, I think we should remove those, I had similar thought a few weeks ago but got distracted from opening PR 😓: trask@af44c69

Member Author

@trask Removed error.msg and error.type.
@lmolkova please see the screenshot for the case when it is not 404/0 (item or collection not found). We cannot differentiate between item not found and collection not found without parsing the Cosmos exception diagnostics, and that parsing would be error prone in the future.
However, if the customer logs the exception, they can still see the full diagnostics and tell the two apart.

[image]

+                tracer.setAttribute(TracerProvider.ERROR_TYPE, throwable.getClass().getName(), context);
+                tracer.end(statusCode, throwable, context);
+            }
+        } else {
+            tracer.end(statusCode, null, context);
         }
-        tracer.end(statusCode, throwable, context);
     }

     private void fillClientTelemetry(CosmosAsyncClient cosmosAsyncClient,
@@ -184,15 +184,8 @@ private Flux<FeedResponse<T>> byPage(CosmosPagedFluxOptions pagedFluxOptions, Co
                 Configs.isClientTelemetryEnabled(BridgeInternal.isClientTelemetryEnabled(pagedFluxOptions.getCosmosAsyncClient())) &&
                 throwable instanceof CosmosException) {
                 CosmosException cosmosException = (CosmosException) throwable;
-                if (isTracerEnabled(pagedFluxOptions) && this.cosmosDiagnosticsAccessor.isDiagnosticsCapturedInPagedFlux(cosmosException.getDiagnostics()).compareAndSet(false, true)) {
-                    try {
-                        addDiagnosticsOnTracerEvent(pagedFluxOptions.getTracerProvider(),
-                            cosmosException.getDiagnostics(), parentContext.get());
-                    } catch (JsonProcessingException ex) {
-                        LOGGER.warn("Error while serializing diagnostics for tracer", ex.getMessage());
-                    }
-                }
-
+                // not adding diagnostics on trace event for exception as this information is already there as
+                // part of exception message
                 if (this.cosmosDiagnosticsAccessor.isDiagnosticsCapturedInPagedFlux(cosmosException.getDiagnostics()).compareAndSet(false, true)) {
                     fillClientTelemetry(pagedFluxOptions.getCosmosAsyncClient(), 0, pagedFluxOptions.getContainerId(),
                         pagedFluxOptions.getDatabaseId(),
@@ -439,20 +439,20 @@ public void tracerExceptionSpan() throws Exception {
         traceApiCounter++;

         String errorType = null;
-        CosmosDiagnostics cosmosDiagnostics = null;
         try {
             PartitionKey partitionKey = new PartitionKey("wrongPk");
             cosmosAsyncContainer.readItem("testDoc", partitionKey, null, InternalObjectNode.class).block();
             fail("readItem should fail due to wrong pk");
         } catch (CosmosException ex) {
             assertThat(ex.getStatusCode()).isEqualTo(HttpConstants.StatusCodes.NOTFOUND);
             errorType = ex.getClass().getName();
-            cosmosDiagnostics = ex.getDiagnostics();
         }

         verifyTracerAttributes(tracerProvider, mockTracer, "readItem." + cosmosAsyncContainer.getId(), context,
             cosmosAsyncDatabase.getId(), traceApiCounter
-            , errorType, cosmosDiagnostics, attributesMap);
+            , errorType, null, attributesMap);
+        // sending null diagnostics as we don't want diagnostics in events for exception as this information is
+        // already there as part of exception message
     }

     @AfterClass(groups = {"emulator"}, timeOut = SETUP_TIMEOUT)