Migrate monotonic counter metrics to u64_counter! #6350

Merged
goto-bus-stop merged 28 commits into dev from renee/ROUTER-297-monotonic-counters on Dec 3, 2024

goto-bus-stop
Member

This is part 1 out of... 4 or 5? of a series of work to move over to our new telemetry macros.

This PR handles the easiest part :), converting the tracing::info!(monotonic_counter.) calls to u64_counter!(). I used a lot of commits so I could take notes while doing the work; I'll copy them to PR comments so there's no need to look at each commit individually.

I avoided breaking changes to the metrics for now, so this is targeted at 1.x.

Two uses of tracing::info!(monotonic_counter.) remain; these are already being addressed in #6338.
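
For readers unfamiliar with the two styles, here is a minimal, self-contained sketch of the call shape being migrated to. It is not the router's implementation: the `u64_counter!` macro below is a hypothetical stand-in backed by an in-memory map, and the argument order after the description (value, then optional `"key" = value` attributes) is an assumption for illustration. The `read`, `add`, and `normalize` helpers, the `COUNTERS` registry, and the `example.requests` metric are likewise invented for the sketch.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

// Hypothetical in-memory registry standing in for a real metrics backend.
// Key: metric name plus sorted attribute pairs. Value: running total.
type Key = (String, Vec<(String, String)>);
static COUNTERS: Mutex<BTreeMap<Key, u64>> = Mutex::new(BTreeMap::new());

// Sorts attributes so that their order at the call site does not matter.
fn normalize(attrs: &[(&str, &str)]) -> Vec<(String, String)> {
    let mut owned: Vec<(String, String)> = attrs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect();
    owned.sort();
    owned
}

// Adds `value` to the counter identified by `name` and `attrs`.
// The description is accepted but unused in this sketch.
fn add(name: &str, _description: &str, value: u64, attrs: &[(&str, &str)]) {
    let key = (name.to_string(), normalize(attrs));
    let mut counters = COUNTERS.lock().unwrap();
    *counters.entry(key).or_insert(0) += value;
}

// Reads the current total, or 0 if nothing was recorded for this key.
fn read(name: &str, attrs: &[(&str, &str)]) -> u64 {
    let key = (name.to_string(), normalize(attrs));
    let counters = COUNTERS.lock().unwrap();
    counters.get(&key).copied().unwrap_or(0)
}

// Stand-in macro (NOT the router's macro): name, description, increment,
// then zero or more `"key" = value` attribute pairs.
macro_rules! u64_counter {
    ($name:literal, $description:literal, $value:expr $(, $key:literal = $val:expr)* $(,)?) => {
        add($name, $description, $value, &[$(($key, $val)),*])
    };
}

// Before (tracing field convention), roughly:
//   tracing::info!(monotonic_counter.some_metric_total = 1u64);
// After: an explicit metric name, description, and increment.
fn record_jwt_request() {
    u64_counter!(
        "apollo.router.operations.jwt",
        "Number of requests with JWT authentication",
        1
    );
}
```

Calling `record_jwt_request()` twice and then `read("apollo.router.operations.jwt", &[])` returns the accumulated total for that name. The point of the explicit form is that the metric name and description live at the call site instead of being encoded in a log field name.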


Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • Changes are compatible [1]
  • Documentation [2] completed
  • Performance impact assessed and acceptable
  • Tests added and passing [3]
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

Notes:
- Fixed a typo in the not found attribute:
  `persisted_quieries.not_found` -> `persisted_queries.not_found`.
- Added a description; it would be useful for someone to check it. It
  reads to me like *every* request is measured?
Notes:
- Removes the `apollo_router_deduplicated_subscriptions_total` metric.
  This is already captured by `apollo.router.operations.subscriptions`
  in the `subscriptions.deduplicated` attribute.
- The `apollo.router.operations.batching` metric appears to use an older
  style of attribute naming?
Notes:
- The description for `apollo_router_skipped_event_count` may not
  be entirely correct?
Notes:
- This combined a log message and a metric: now they are separate.
@goto-bus-stop goto-bus-stop requested review from a team as code owners November 27, 2024 14:05
@svc-apollo-docs
Collaborator

svc-apollo-docs commented Nov 27, 2024

✅ Docs Preview Ready

No new or changed pages found.

@router-perf

router-perf bot commented Nov 27, 2024

CI performance tests

  • connectors-const - Connectors stress test that runs with a constant number of users
  • const - Basic stress test that runs with a constant number of users
  • demand-control-instrumented - A copy of the step test, but with demand control monitoring and metrics enabled
  • demand-control-uninstrumented - A copy of the step test, but with demand control monitoring enabled
  • enhanced-signature - Enhanced signature enabled
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
  • events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • extended-reference-mode - Extended reference mode enabled
  • large-request - Stress test with a 1 MB request payload
  • no-tracing - Basic stress test, no tracing
  • reload - Reload test over a long period of time at a constant rate of users
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • step-local-metrics - Field stats that are generated from the router rather than FTV1
  • step-with-prometheus - A copy of the step test with the Prometheus metrics exporter enabled
  • step - Basic stress test that steps up the number of users over time
  • xlarge-request - Stress test with 10 MB request payload
  • xxlarge-request - Stress test with 100 MB request payload

@@ -210,7 +210,7 @@ where
     Response: Send + 'static + Debug,
     TransformedResponse: Send + 'static + Debug,
 {
-    let query = query_name::<Query>();
+    let query_name = query_name::<Query>();
Member Author

Just to clarify that it isn't the full query text.

Contributor

@bnjjj bnjjj left a comment

For the metrics you marked as deprecated in their descriptions, it would be worth also documenting them as deprecated in the docs. We already have a deprecated section. I know this is also part of another ticket, but since we are both working on deprecating different metrics, I think it's worth documenting them directly so that none of them are forgotten.

);
u64_counter!(
"apollo.router.operations.jwt",
"Number of requests with JWT authentication",
Contributor

I wonder whether we should add authentication.jwt.failed = false in the future (for 2.0), to be consistent with what sigv4 does, for example.

@goto-bus-stop goto-bus-stop requested a review from a team as a code owner November 28, 2024 14:36
- `apollo_authentication_failure_count` - **Deprecated**: use the `apollo.router.operations.authentication.jwt` metric's `authentication.jwt.failed` attribute.
- `apollo_authentication_success_count` - **Deprecated**: use the `apollo.router.operations.authentication.jwt` metric instead. If the `authentication.jwt.failed` attribute is *absent* or `false`, the authentication succeeded.
- `apollo_require_authentication_failure_count` - **Deprecated**: TODO @goto-bus-stop: no replacement?
- `apollo_router_timeout` - **Deprecated**: this metric conflates timed-out requests from client to the router, and requests from the router to subgraphs. Timed-out requests have HTTP status code 504. Use the `http.response.status_code` attribute on the `http.server.request.duration` metric to identify timed-out router requests, and on the `http.client.request.duration` metric to identify timed-out subgraph requests.
Member Author

Please verify this. @BrynCooke shared a config for default_requirement_level: required, but that is the default, so I think this should work?
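
The two interpretation rules in the deprecation notes above (an absent or `false` `authentication.jwt.failed` attribute means success; a 504 status code means a timeout) can be sketched as plain functions. This is only an illustration of the logic: the function names and the idea of passing attribute values as `Option<bool>` or a slice of status codes are assumptions, not router or OpenTelemetry APIs.

```rust
// Mirrors the note on `apollo_authentication_success_count`: authentication
// succeeded when the `authentication.jwt.failed` attribute is absent (None)
// or present and false.
fn jwt_auth_succeeded(jwt_failed: Option<bool>) -> bool {
    !jwt_failed.unwrap_or(false)
}

// Mirrors the note on `apollo_router_timeout`: timed-out requests are the
// ones recorded with `http.response.status_code` equal to 504.
fn is_timeout(status_code: u16) -> bool {
    status_code == 504
}

// Counts timed-out requests among a set of recorded status codes.
// Apply it to `http.server.request.duration` data for router requests and
// to `http.client.request.duration` data for subgraph requests.
fn count_timeouts(status_codes: &[u16]) -> usize {
    status_codes.iter().filter(|&&code| is_timeout(code)).count()
}
```

Keeping the router-side and subgraph-side status codes in separate inputs is what removes the conflation that the deprecated `apollo_router_timeout` metric had.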

@goto-bus-stop
Member Author

Un-deprecated apollo.router.graphql_error as the alternative is an enterprise feature currently.

The rest should be in order. @BrynCooke @bnjjj PTAL at apollo_require_authentication_failure_count and apollo_router_timeout.

@goto-bus-stop
Member Author

@BrynCooke I see some comments in my email that I can't find on Github. I think the metrics I marked as deprecated here all have a free-to-use migration path in 1.x and have appropriate steps in the docs?

@goto-bus-stop goto-bus-stop enabled auto-merge (squash) December 3, 2024 14:50
@goto-bus-stop goto-bus-stop merged commit 89795a7 into dev Dec 3, 2024
13 checks passed
@goto-bus-stop goto-bus-stop deleted the renee/ROUTER-297-monotonic-counters branch December 3, 2024 15:12
@BrynCooke BrynCooke mentioned this pull request Dec 16, 2024