Removing full exception trace from 404 error #24195


simplynaveen20 (Member)

This PR contains the following changes:

  1. Removed the full error stack from the customer tracer on 404 errors, since a 404 is considered a regular business scenario. This reduces noise and cost for customers in App Insights and other reporters.
  2. Also removed the explicit diagnostics information from span events for all exceptions, which was added in Integrating cosmos diagnostics with open telemetry tracer #22202; the complete diagnostics are already captured in the exception event.
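A minimal, self-contained sketch of the 404 rule in change 1. It is not the SDK code: `NotFoundTracingSketch`, `SpanRecorder` and `RecordingSpan` are hypothetical names standing in for the Cosmos tracer, and only the top-level status code is checked (the substatus refinement is discussed later in the review). For a 404 the throwable is dropped before the span ends, so no exception event or stack trace is attached; every other failure still passes the throwable through.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the PR's 404 handling; not the azure-cosmos implementation.
public final class NotFoundTracingSketch {

    /** Hypothetical tracer facade: ends a span, optionally with an exception to record. */
    interface SpanRecorder {
        void end(int statusCode, Throwable error);
    }

    /** Demo recorder that stores each end() call as "status:exception-or-none". */
    static final class RecordingSpan implements SpanRecorder {
        final List<String> calls = new ArrayList<>();

        @Override
        public void end(int statusCode, Throwable error) {
            calls.add(statusCode + ":" + (error == null ? "no-exception" : error.getClass().getSimpleName()));
        }
    }

    static final int NOT_FOUND = 404;

    /**
     * For a 404 the throwable is not forwarded, so no exception event (and no
     * stack trace) is attached; the status code alone describes the outcome.
     * Any other status forwards the throwable unchanged.
     */
    static void end(SpanRecorder span, int statusCode, Throwable throwable) {
        if (statusCode == NOT_FOUND) {
            span.end(statusCode, null);
        } else {
            span.end(statusCode, throwable);
        }
    }

    public static void main(String[] args) {
        RecordingSpan span = new RecordingSpan();
        end(span, 404, new RuntimeException("item not found"));
        end(span, 500, new IllegalStateException("server error"));
        end(span, 200, null);
        System.out.println(span.calls);
    }
}
```

Running `main` prints `[404:no-exception, 500:IllegalStateException, 200:no-exception]`.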

Before the change
[screenshot]
[screenshot]

After the change
[screenshot]

moderakh (Contributor) left a comment


Shouldn't we do the same thing for 409s (ResourceAlreadyExists) as well?

Other than that, LGTM. Thanks @simplynaveen20.

@@ -289,10 +281,18 @@ public void addEvent(String name, Map<String, Object> attributes, OffsetDateTime

private void end(int statusCode, Throwable throwable, Context context) {
    if (throwable != null) {
        tracer.setAttribute(TracerProvider.ERROR_MSG, throwable.getMessage(), context);
        tracer.setAttribute(TracerProvider.ERROR_TYPE, throwable.getClass().getName(), context);
        if (statusCode == HttpConstants.StatusCodes.NOTFOUND) {

Member


And 412

Member Author


We talked through all of these and decided that 404 is more of a regular business scenario than the others. We will wait and watch, and handle the other errors based on customer demand.
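One way to keep the door open for 409 and 412 later, sketched under the assumption that the "expected outcome" status codes live in a single set. `ExpectedStatusCodesSketch` and `isExpected` are hypothetical names, not from the PR: today the set holds only 404, and adding another code on customer demand would be a one-line change.

```java
import java.util.Set;

// Hypothetical sketch: status codes treated as regular business outcomes
// (no full exception recorded on the span).
public final class ExpectedStatusCodesSketch {

    // Per the discussion, only 404 is treated as expected for now.
    static final Set<Integer> EXPECTED_STATUS_CODES = Set.of(404);

    static boolean isExpected(int statusCode, Set<Integer> expectedCodes) {
        return expectedCodes.contains(statusCode);
    }

    public static void main(String[] args) {
        System.out.println(isExpected(404, EXPECTED_STATUS_CODES)); // true
        System.out.println(isExpected(409, EXPECTED_STATUS_CODES)); // false
        // A hypothetical future configuration that also treats 409 and 412 as expected.
        Set<Integer> extended = Set.of(404, 409, 412);
        System.out.println(isExpected(412, extended)); // true
    }
}
```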

Member


Please only remove the call stack etc. for 404 with SubStatusCode 0. For any SubStatusCode != 0 (like ReadSessionNotAvailable, 1002, etc.) we should keep the call stack.
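A sketch of the refined check, assuming the decision depends only on the status code and the Cosmos substatus code. `NotFoundSubStatusSketch`, `shouldRecordException` and the constant names are hypothetical; 1002 (ReadSessionNotAvailable) is the example given in the comment above.

```java
// Hypothetical sketch: only a plain 404 (substatus 0) is a regular business
// outcome; any other 404 substatus keeps the full exception and call stack.
public final class NotFoundSubStatusSketch {

    static final int NOT_FOUND = 404;
    static final int SUB_STATUS_NONE = 0;
    // ReadSessionNotAvailable, as cited in the review comment.
    static final int READ_SESSION_NOT_AVAILABLE = 1002;

    /** Returns true when the exception (with its call stack) should be recorded on the span. */
    static boolean shouldRecordException(int statusCode, int subStatusCode) {
        boolean plainNotFound = statusCode == NOT_FOUND && subStatusCode == SUB_STATUS_NONE;
        return !plainNotFound;
    }

    public static void main(String[] args) {
        System.out.println(shouldRecordException(404, SUB_STATUS_NONE));            // false
        System.out.println(shouldRecordException(404, READ_SESSION_NOT_AVAILABLE)); // true
        System.out.println(shouldRecordException(409, SUB_STATUS_NONE));            // true
    }
}
```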

Member Author


Sure, I will add the substatus check as well.

Member Author


Done, added the 404/0 check.

FabianMeiswinkel (Member) left a comment


LGTM except for the 404 with SubStatusCode != 0

            tracer.setAttribute(TracerProvider.ERROR_TYPE, throwable.getClass().getName(), context);
            tracer.end(statusCode, null, context);
        } else {
            tracer.setAttribute(TracerProvider.ERROR_MSG, throwable.getMessage(), context);
lmolkova (Member), Sep 17, 2021


Member Author


If we remove ERROR_TYPE/ERROR_MSG, we won't see the attributes below, which were the initial requirement from @trask:
[screenshot]

lmolkova (Member), Sep 20, 2021


Can you please add a screenshot for the non-404 case? Will the standard OTel attributes be recorded for the exception (exception.type and exception.message)?

@trask I believe you wanted OTel semantic exception attributes attached. Those conventions have changed, and we now populate both the new and the old attributes. Are you OK with Cosmos removing error.msg and error.type? Do you think we use them anywhere?
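For reference, a JDK-only sketch of the exception attributes defined by the OpenTelemetry semantic conventions (`exception.type`, `exception.message`, `exception.stacktrace`). In the OTel Java SDK, `Span.recordException(Throwable)` attaches these as an event named `exception`. The class and helper below are hypothetical and only show how the values map from a `Throwable`; this is not what azure-core-tracing-opentelemetry does internally.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: derives OTel semantic-convention exception attributes from a Throwable.
public final class ExceptionAttributesSketch {

    static Map<String, String> exceptionAttributes(Throwable t) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("exception.type", t.getClass().getName());
        if (t.getMessage() != null) {
            attrs.put("exception.message", t.getMessage());
        }
        // Capture the full stack trace as text, the way exporters usually display it.
        StringWriter stack = new StringWriter();
        t.printStackTrace(new PrintWriter(stack));
        attrs.put("exception.stacktrace", stack.toString());
        return attrs;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = exceptionAttributes(new IllegalStateException("boom"));
        System.out.println(attrs.get("exception.type"));    // java.lang.IllegalStateException
        System.out.println(attrs.get("exception.message")); // boom
    }
}
```

Contrast this with the custom `error.msg`/`error.type` span attributes discussed in this thread, which sit on the span itself rather than on an exception event.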

Member


Yes, I think we should remove those. I had a similar thought a few weeks ago but got distracted before opening a PR 😓: trask@af44c69

Member Author


@trask Removed error.msg and error.type.
@lmolkova, please see the screenshot below for the non-404/0 case (404/0 covers both item not found and collection not found). We cannot differentiate between item not found and collection not found without parsing the Cosmos exception diagnostics, which would be error-prone going forward.
However, customers who log the exception can still see the full diagnostics and tell the two apart.

[screenshot]

        tracer.setAttribute(TracerProvider.ERROR_MSG, throwable.getMessage(), context);
        tracer.setAttribute(TracerProvider.ERROR_TYPE, throwable.getClass().getName(), context);
        if (statusCode == HttpConstants.StatusCodes.NOTFOUND) {
            tracer.setAttribute(TracerProvider.ERROR_MSG, "Not found exception", context);
Member


I believe the 404 status code is enough, and the core tracer will populate all the info needed.

https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/core/azure-core-tracing-opentelemetry/src/main/java/com/azure/core/tracing/opentelemetry/implementation/HttpTraceUtil.java#L179

Can we avoid populating custom error codes?


Member


Since this is no longer an exception, it shouldn't have exception attributes; otel.status_description gives users the info they need.

Member


"If we remove ERROR_TYPE/ERROR_MSG then we won't be seeing below attributes which was the initial requirement from @trask"

This was probably before I understood the OTel semantic conventions for exceptions 😅

@lmolkova is correct that recordException() is the right way to attach the exception data.

Member Author


Removed error.msg and error.type

simplynaveen20 merged commit 240d0f3 into Azure:main on Sep 22, 2021
lmolkova mentioned this pull request on Nov 12, 2021
5 participants