There are two ways to convert `errors.*.extensions.code` into a metric attribute:

1. On non-20x responses, use `telemetry.metrics.common.attributes.{supergraph,subgraph}.errors`.
2. On 20x responses, use `telemetry.metrics.common.attributes.{supergraph,subgraph}.response.body`.

The latter isn't specific to errors, and the JSON path feature does not handle arrays, so you have to pick a single error per attribute:
```yaml
telemetry:
  metrics:
    common:
      attributes:
        supergraph: # Attribute configuration for requests to/responses from the router
          response:
            body:
              # Apply the value of the provided path of the router's response body as an attribute
              - path: .errors[0].extensions.status
                name: error_from_body
```
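For comparison, the former option can be sketched roughly as follows. This is an assumption about the shape of the `errors` attribute configuration (the `include_messages` and `extensions` keys in particular); the exact keys may differ by router version:

```yaml
telemetry:
  metrics:
    common:
      attributes:
        supergraph:
          # Emit error details as metric attributes, but only on non-20x responses
          errors:
            include_messages: true # assumed key: include the error message as an attribute
            extensions:
              - name: error_code # attribute name on the emitted metric
                path: .extensions.code # JSON path evaluated against each error's extensions
```

Unlike `response.body`, this applies per error rather than requiring a fixed array index.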
The combination of these conditions makes it difficult to create a single metric that counts the occurrence of a type of error that I can use to build alerts.
Describe the solution you'd like
I think my ideal solution is that `telemetry.metrics.common.attributes.{supergraph,subgraph}.errors` works with 20x responses. Maybe there's an optional configuration property to opt in to this behavior.
Describe alternatives you've considered
Alternatively, we could add an `errors` property next to `body` for generating attributes from the response.
This will result in one metric for 20x responses and one for non-20x responses, but at least we'd be able to capture all the errors in a 20x response.
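A hypothetical opt-in might look like the following; the `include_on_success` key is invented here purely to illustrate the shape of the proposal and does not exist in the router today:

```yaml
telemetry:
  metrics:
    common:
      attributes:
        supergraph:
          errors:
            include_on_success: true # hypothetical flag: also apply on 20x responses
            extensions:
              - name: error_code
                path: .extensions.code
```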
Additional context
Subgraphs that conform to the draft "GraphQL over HTTP" spec should use 200 OK for responses that include top-level errors: https://graphql.github.io/graphql-over-http/draft/#sec-application-json. We want the ability to capture top-level error rates in these situations.
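For reference, a subgraph following that spec section may return top-level errors in a `200 OK` `application/json` response shaped like this (the message and code values are illustrative):

```
HTTP/1.1 200 OK
Content-Type: application/json

{
  "data": null,
  "errors": [
    {
      "message": "Unauthorized",
      "extensions": { "code": "UNAUTHENTICATED" }
    }
  ]
}
```

With the current configuration, none of these errors are reflected in the `errors`-based metric attributes, because the HTTP status is 200.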