Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce request error type attribute #205

Merged
merged 13 commits into from
Sep 11, 2023

Conversation

lmolkova
Copy link
Contributor

@lmolkova lmolkova commented Jul 24, 2023

Fixes #204

Proposal:

  • Reserve an attribute for the error code when response code was not received (or reading response body has failed) - error.type (TBD)
  • Start with a small, non-controversial set of error types common across HTTP clients in different languages. Don't try to unify common errors initially. Even things like timeout (vs cancellation) can be controversial.
  • Define 'other' as a dimension cap
  • Allow instrumentations to report custom (known, documented, low-cardinality) error types
  • If we see similarities in error types across clients/languages, we can consider adding common valuesit into the enum even after HTTP semconv goes stable

(naming and common error suggestions are welcome!)

Error rate queries

  • rate(http_request_duration_count{error_type!=""}[1m]) by error_type

@lmolkova
Copy link
Contributor Author

docs/http/http-metrics.md Outdated Show resolved Hide resolved
docs/http/http-metrics.md Outdated Show resolved Hide resolved
@noahfalk
Copy link
Contributor

This sounded fine to me, perhaps "http.request.error.reason" -> http.client.error.reason or http.error.reason?

What makes sense to me is that http.request.error.reason would encode errors that occurred locally and status_code covers errors that occurred remotely received via http response. This means all combinations of (error.reason is set or unset) x (status code is unset, status code is 200, status code is non-200) are possible. For span errors, I would guess you could have a span error if error.reason is set OR status_code == (whatever set of statuses are defined as errors).

model/http-common.yaml Outdated Show resolved Hide resolved
@mateuszrzeszutek
Copy link
Member

  • Span status is OK or UNSET -> no http.request.error.reason
  • Span status is ERROR -> http.request.error.reason is _OTHER by default, it might also be possible to extract it from span status description

This might be a thing for a separate issue, but in the Java world I know of several HTTP instrumentations (Reactor Netty, Spring Webflux) that have knowledge if the request is cancelled, but they don't set the span status to ERROR cause there is no exception thrown. Would we want to set the span status in that case? Or an attribute?

Copy link
Contributor

@jsuereth jsuereth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to cover a major gap we have.

One thing I want to call out -

Some metric backends treat absence of an attribute as an "empty" attribute. As such, we should probably REQUIRE that the reason is never empty, but also allow the "empty" means there was no reason.

docs/http/http-metrics.md Outdated Show resolved Hide resolved
@lmolkova
Copy link
Contributor Author

lmolkova commented Jul 27, 2023

This might be a thing for a separate issue, but in the Java world I know of several HTTP instrumentations (Reactor Netty, Spring Webflux) that have knowledge if the request is cancelled, but they don't set the span status to ERROR cause there is no exception thrown. Would we want to set the span status in that case? Or an attribute?

@mateuszrzeszutek great point!
I updated the condition so it accounts for no-exception cancellation.

I think span status UNSET and http.error.reason = canceled would do the trick?
Cancellation is not necessarily an error for the end user (if I intentionally cancel an operation that's not necessary anymore).
So perhaps no need to translate span status to the metric attribute since the attribute is defined for both?

@lmolkova lmolkova force-pushed the http-request-error-reason branch 3 times, most recently from e34d714 to 4fd9d38 Compare July 27, 2023 19:12
@lmolkova
Copy link
Contributor Author

lmolkova commented Jul 27, 2023

Some metric backends treat absence of an attribute as an "empty" attribute. As such, we should probably REQUIRE that the reason is never empty, but also allow the "empty" means there was no reason.

@jsuereth
I added empty value to the enum, but don't see why we need to REQUIRE it to be present. If it's the case we also need it for the HTTP response status code that could be missing.

model/http-common.yaml Outdated Show resolved Hide resolved
@Oberon00
Copy link
Member

Oberon00 commented Aug 1, 2023

The PR title still contains only "metrics" which might make some people skip over it that would want to look at tracing-related PRs

@mateuszrzeszutek
Copy link
Member

I think span status UNSET and http.error.reason = canceled would do the trick?
Cancellation is not necessarily an error for the end user (if I intentionally cancel an operation that's not necessary anymore).

Yes, this makes sense to me. Thanks!

So perhaps no need to translate span status to the metric attribute since the attribute is defined for both?

👍

@Oberon00
Copy link
Member

Oberon00 commented Aug 1, 2023

I think a common list (at least as recommendation for the most common ones) of error categories is achievable and is actually required if you want to have low cardinality in large systems with multiple techs.

However, we should probably not come up with our own list but use an existing one, like https://en.cppreference.com/w/cpp/error/errc or https://pkg.go.dev/google.golang.org/grpc/codes

docs/http/http-metrics.md Outdated Show resolved Hide resolved
@lmolkova lmolkova changed the title Http metrics: introduce request error reason Introduce HTTP request error reason attribute Aug 4, 2023
…p of status code, use numerical status code representation
Copy link

@antonfirsov antonfirsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Client part LGTM.

@arminru arminru merged commit 2bad9af into open-telemetry:main Sep 11, 2023
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HTTP metrics: reporting additional error reason when response was not received