Distinguish which subgraph threw error in prometheus metrics #1493

KennethWussmann · 2022-08-10T14:38:22Z

Is your feature request related to a problem? Please describe.
We're coming from #1198 with a similar problem. When we apply the custom attributes of the returned errors they get added to the Prometheus metrics, but it's not possible to tell which subgraph threw the error.

Given configuration:

telemetry:
  metrics:
    prometheus:
      enabled: true
    common:
      attributes:
        router:
          response:
            body:
              - path: .errors[0].extensions.code
                name: error_code

Will create the following Prometheus Metric:

http_request_duration_seconds_bucket{error_code="BAD_USER_INPUT", le="+Inf", namespace="graphql", service_name="apollo-router", status="200"}

The metric now contains the error code as configured, but the attributes don't show which subgraph or operation caused the error. That information is essential for working with this metric.

As an additional point, which was also mentioned in the other issue: .errors[0].extensions.code only retrieves the first error. Supporting the JSON path syntax .errors[*].extensions.code to retrieve all of them would be nice to have as well.
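To make the difference between the two paths concrete, here is a minimal Python sketch. The response body and both helper functions are hypothetical; they only mimic what each path expression would select and say nothing about how the router evaluates paths internally.

```python
# Hypothetical GraphQL response body carrying two errors.
response_body = {
    "data": None,
    "errors": [
        {"message": "Invalid id", "extensions": {"code": "BAD_USER_INPUT"}},
        {"message": "Not allowed", "extensions": {"code": "FORBIDDEN"}},
    ],
}


def first_error_code(body):
    """Mimic `.errors[0].extensions.code`: the code of the first error only."""
    errors = body.get("errors") or []
    if not errors:
        return None
    return errors[0].get("extensions", {}).get("code")


def all_error_codes(body):
    """Mimic the requested `.errors[*].extensions.code`: the code of every error."""
    errors = body.get("errors") or []
    return [
        error["extensions"]["code"]
        for error in errors
        if "code" in error.get("extensions", {})
    ]
```

With the sample body, the first helper yields only BAD_USER_INPUT, while the second also picks up FORBIDDEN, which is the information that currently gets lost.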

Describe the solution you'd like
It would be great to have the possibility to include the subgraph and operation name into this metric.

Describe alternatives you've considered
We also tried the following config:

telemetry:
  metrics:
    prometheus:
      enabled: true
    common:
      attributes:
        subgraph: 
          all:
            static:
              - name: kind
                value: subgraph_request
            errors:
              include_messages: true
              extensions:
                - name: subgraph_error_code
                  path: .code

But we later found out that this config only applies when the subgraph responds with an HTTP status other than 200 or can't be reached. Client errors like BAD_USER_INPUT don't fall into either case, so this config does not capture them.


bnjjj · 2022-08-10T15:24:15Z

Did you try this configuration?

telemetry:
  metrics:
    prometheus:
      enabled: true
    common:
      attributes:
        subgraph: 
          all:
            static:
              - name: kind
                value: subgraph_request
            response:
              body:
                - path: .errors[0].extensions.code
                  name: subgraph_error_code
            errors:
              extensions:
                - name: subgraph_error_code
                  path: .code

KennethWussmann · 2022-08-10T15:49:12Z

Hey @bnjjj, thanks for the quick answer.
Yes, indeed, that does what we were looking for. We didn't know about this configuration option [from looking at the docs](https://www.apollographql.com/docs/router/configuration/metrics#adding-custom-attributeslabels).

bnjjj · 2022-08-11T09:48:28Z

TL;DR: when the HTTP status code from the subgraph is not 200, you need to specify what you want in the errors section. If it's a GraphQL error with HTTP status code 200, you need to specify the configuration in the response section. They aren't mutually exclusive at all; you can configure both, as in the example above.
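The rule of thumb above can be sketched in a few lines of Python. This is only an illustration of the decision described in this thread: the function name `section_for` and its parameters are hypothetical, and it is not the router's actual implementation.

```python
def section_for(http_status, body):
    """Return which config section ("errors" or "response") would supply
    attributes for a subgraph call, following the rule of thumb above.

    http_status is None when the subgraph could not be reached (assumption:
    that case is treated like a non-200 response, as reported in this issue).
    """
    if http_status != 200:
        # Non-200 or unreachable subgraph: handled by the `errors` section.
        return "errors"
    if body.get("errors"):
        # HTTP 200 carrying GraphQL errors: handled by the `response` section.
        return "response"
    # Successful response without errors: no error attribute to extract.
    return None
```

For example, a BAD_USER_INPUT error returned with HTTP 200 maps to "response", which is why the errors-only configuration did not capture it.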

KennethWussmann added raised by user triage labels Aug 10, 2022

KennethWussmann closed this as completed Aug 10, 2022
