Metrics from error body #1198
I recently added a feature for adding custom attributes to metrics, and I think we could extend it with a `from_response` entry:

```yaml
telemetry:
  metrics:
    common:
      attributes:
        static:
          - name: "version"
            value: "v1.0.0"
        from_headers:
          - named: "content-type"
            rename: "payload_type"
            default: "application/json"
          - named: "x-custom-header-to-add"
        from_response:
          - path: errors.extensions.status
            rename: extended_status
            default: optional_default_value
```
Regarding your example, I think I also forgot to add another configuration option specific to subgraphs:

```yaml
telemetry:
  metrics:
    common:
      attributes:
        static:
          - name: "version"
            value: "v1.0.0"
        from_headers:
          - named: "content-type"
            rename: "payload_type"
            default: "application/json"
          - named: "x-custom-header-to-add"
        from_response:
          - path: errors.extensions.status
            rename: extended_status
            default: optional_default_value
    subgraphs:
      my_subgraph:
        attributes:
          from_response:
            - path: errors.extensions.status
              rename: extended_status
              default: optional_default_value
```
So the metrics would look like this?
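Something roughly like this, for example (hypothetical counter name and label values):

```
# hypothetical sample output
apollo_router_http_requests_total{status="200",version="v1.0.0",payload_type="application/json",extended_status="UNAUTHENTICATED"} 3
```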
I think that would work!
I haven't fully investigated what options we have here yet. Further, headers are classically part of both a request and a response, so it might make sense to refer to the GraphQL response explicitly. I'll just bikeshed some ideas around structure here, but do we think something like this adds clarity?

```yaml
from_upstream_subgraph:
  request:
    header:
      - named: "content-type"
        rename: "payload_type"
        default: "application/json"
      - named: "x-custom-header-to-add"
  graphql:
    - path: errors.extensions.status
      rename: extended_status
      default: optional_default_value
from_downstream_client:
  # Do we need a section like this?
```
Could the metrics configuration align with the four services (Router, Execution, QueryPlanner, Subgraph)? Router metrics would be "downstream" client request/response metrics, while Subgraph metrics would be "upstream" subgraph request/response metrics.
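A rough sketch of that layout, for example (the keys here are just illustrative):

```yaml
telemetry:
  metrics:
    common:
      attributes:
        router:     # "downstream" client request/response attributes
          # ...
        subgraph:   # "upstream" subgraph request/response attributes
          # ...
```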
We're going to have to decide how much we want the telemetry plugin to do in terms of collecting data. It's worth noting that at the point where we update the metrics, the content of the subgraph responses is not available to us by default right now. Here is a suggestion that hopefully covers most cases, but it could also be used with Rhai in case users want to get something from the subgraphs; they would be required to populate the context.

```yaml
telemetry:
  metrics:
    common:
      attributes:
        static:
          - name: "version"
            value: "v1.0.0"
        request:
          - header:
              named: "content-type"
              rename: "payload_type"
              default: "application/json"
          - header:
              named: "x-custom-header-to-add"
        response:
          - body:
              path: errors.extensions.status
              name: extended_status
              default: optional_default_value
        context:
          - named: "foo"
            default: "application/json"
```
We've settled on the following config:

```yaml
telemetry:
  metrics:
    common:
      attributes:
        router:
          static:
            - name: "version"
              value: "v1.0.0"
          request:
            - header:
                named: "content-type"
                rename: "payload_type"
                default: "application/json"
            - header:
                named: "x-custom-header-to-add"
          response:
            - body:
                path: errors.extensions.status
                name: extended_status
                default: optional_default_value
          context:
            - named: "foo"
        subgraph:
          all:
            static:
              - name: "version"
                value: "v1.0.0"
            request:
              - header:
                  named: "content-type"
                  rename: "payload_type"
                  default: "application/json"
              - header:
                  named: "x-custom-header-to-add"
            response:
              - body:
                  path: errors.extensions.status
                  name: extended_status
                  default: optional_default_value
            context:
              - named: "foo"
          subgraphs:
            products:
              static:
                - name: "version"
                  value: "v1.0.0"
              request:
                - header:
                    named: "content-type"
                    rename: "payload_type"
                    default: "application/json"
                - header:
                    named: "x-custom-header-to-add"
              response:
                - body:
                    path: errors.extensions.status
                    name: extended_status
                    default: optional_default_value
              context:
                - named: "foo"
```
Thanks a lot for this feature. I'm trying to differentiate between these two errors (one subgraph returning an error vs. two subgraphs returning errors). My Prometheus configuration:

`router.response.body` reads the first error properly if there is one, something like this:
@ramapalani Ok, I think there are two different things we're missing here. The first one could be to add support for a JSON query syntax like `.errors[0].extensions.type`. What is also missing in our implementation, and could be useful for you, is an `error` section. So instead of:

```yaml
telemetry:
  metrics:
    prometheus:
      enabled: true
    common:
      attributes:
        subgraph:
          all:
            response:
              body:
                - path: .errors[0].extensions.type
                  name: subgraph_error_extended_type
```

you could have:

```yaml
telemetry:
  metrics:
    prometheus:
      enabled: true
    common:
      attributes:
        subgraph:
          all:
            error: # Only works if it's a valid GraphQL error
              include_message: true # Will include the error message in a message attribute
              include_locations: true
              include_path: true
              extensions: # Include extension data
                name: subgraph_error_extended_type # Name of the attribute
                path: .type # JSON query path to fetch data from extensions
```

And by doing this, using your second example file, you would have metrics like this:
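For example, labels roughly along these lines (hypothetical metric name, subgraph names, and values):

```
# hypothetical sample output
apollo_router_http_requests_total{subgraph="accounts",subgraph_error_extended_type="AuthenticationError",message="Invalid token"} 1
apollo_router_http_requests_total{subgraph="products",subgraph_error_extended_type="ValidationError",message="Unknown field"} 1
```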
GraphQL HTTP responses are typically 200 OK regardless of whether an error occurred. Generic APMs and monitoring tools don't have a way to distinguish between `{ data }` and `{ data: null, errors: [...] }` for error rate detection. It's also common to add metadata to errors to replicate HTTP status code semantics, like this:
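For example, an error carrying an HTTP-like code in its extensions (illustrative values):

```json
{
  "data": null,
  "errors": [
    {
      "message": "Not authorized",
      "extensions": { "code": "UNAUTHENTICATED", "status": 401 }
    }
  ]
}
```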
The current Prometheus metrics provide counters for high-level HTTP status:
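For example (illustrative values):

```
apollo_router_http_requests_total{status="200"} 42
apollo_router_http_requests_total{status="500"} 3
```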
Describe the solution you'd like
A customer has asked for counters based on data from within the error extensions.
With config similar to:
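For example, something along these lines (hypothetical keys):

```yaml
telemetry:
  metrics:
    common:
      attributes:
        from_response:
          - path: errors.extensions.code
            rename: error_code
```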
It would be great to get metrics like:
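For example (hypothetical metric names and label values):

```
# hypothetical sample output
apollo_router_http_requests_total{status="200",error_code="UNAUTHENTICATED"} 3
apollo_router_http_requests_total{status="200",error_code="FORBIDDEN"} 1
apollo_router_http_requests_total{subgraph="accounts",status="200",error_code="UNAUTHENTICATED"} 3
```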
(Note that the last metric in that example is the response to the fetch to the subgraph.)
Describe alternatives you've considered
Open to ideas!
Additional context
This is an enterprise customer request.