Update to otel 0.20.0 (#3649)
Updates Otel to 0.20

Otel 0.20 made a major change to the way metrics are implemented,
which forced a complete overhaul of our integration.

Problems include:
* Some metrics traits were converted to structs, preventing the use of
our wrappers.
* It is no longer possible to migrate metrics from one prometheus
registry to another. This resulted in a significant amount of
restructuring and special handling to allow prometheus meter provider to
remain across reloads.
* Related to the above, many tests broke because they relied on
prometheus and our integration was hard to test.

Unfortunately the above meant a significant amount of rework just to
retain the equivalent tests that we had before.

Some dev docs have been created to try and give a higher level overview
of how things fit together:

https://github.com/apollographql/router/blob/bryn/otel-update/dev-docs/metrics.md

Perf looks like it hasn't regressed.


<!-- start metadata -->

**Checklist**

Complete the checklist (and note appropriate exceptions) before a final
PR is raised.

- [ ] Changes are compatible[^1]
- [ ] Documentation[^2] completed
- [ ] Performance impact assessed and acceptable
- Tests added and passing[^3]
    - [ ] Unit Tests
    - [ ] Integration Tests
    - [ ] Manual Tests

**Exceptions**

*Note any exceptions here*

**Notes**

[^1]. It may be appropriate to bring upcoming changes to the attention
of other (impacted) groups. Please endeavour to do this before seeking
PR approval. The mechanism for doing this will vary considerably, so use
your judgement as to how and when to do this.
[^2]. Configuration is an important part of many changes. Where
applicable please try to document configuration examples.
[^3]. Tick whichever testing boxes are applicable. If you are adding
Manual Tests:
- please document the manual testing (extensively) in the Exceptions.
- please raise a separate issue to automate the test and label it (or
ask for it to be labeled) as `manual test`

---------

Signed-off-by: Benjamin Coenen <[email protected]>
Co-authored-by: bryn <[email protected]>
Co-authored-by: Coenen Benjamin <[email protected]>
3 people authored Sep 27, 2023
1 parent f5a1145 commit bca9d86
Showing 35 changed files with 3,550 additions and 1,824 deletions.
20 changes: 20 additions & 0 deletions .changesets/maint_bryn_otel_update.md
@@ -0,0 +1,20 @@
### Update to OpenTelemetry 0.20.0 ([PR #3649](https://github.com/apollographql/router/pull/3649))

The router now uses OpenTelemetry 0.20.0. This includes a number of fixes and improvements from upstream.

In particular, metrics have changed significantly:
* Prometheus metrics are now aligned with the [OpenTelemetry spec](https://opentelemetry.io/docs/specs/otel/compatibility/prometheus_and_openmetrics/), and will not report `service_name` on each individual metric. Resource attributes are now moved to a single `target_info` metric.

Users should check that their dashboards and alerts are properly configured when upgrading.

* The default service name for metrics is now `unknown_service` as per the [OpenTelemetry spec](https://opentelemetry.io/docs/concepts/sdk-configuration/general-sdk-configuration/#otel_service_name).

Users should configure the service name in router.yaml or via the `OTEL_SERVICE_NAME` environment variable.

* The order of priority for setting service name has been brought into line with the rest of the router configuration. The order of priority is now:
1. `OTEL_RESOURCE_ATTRIBUTES` environment variable
2. `OTEL_SERVICE_NAME` environment variable
3. `resource_attributes` in router.yaml
4. `service_name` in router.yaml
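
The priority order above can be illustrated with a minimal sketch. This is not the router's actual implementation: `resolve_service_name` and `service_name_from_attributes` are hypothetical helpers, the list is assumed to run from highest to lowest priority, and `OTEL_RESOURCE_ATTRIBUTES` is assumed to hold comma-separated `key=value` pairs with the name under `service.name`.

```rust
use std::collections::HashMap;

/// Hypothetical helper: pull `service.name` out of an
/// `OTEL_RESOURCE_ATTRIBUTES`-style string of comma-separated `key=value` pairs.
fn service_name_from_attributes(raw: &str) -> Option<String> {
    raw.split(',')
        .filter_map(|pair| pair.split_once('='))
        .find(|(key, _)| key.trim() == "service.name")
        .map(|(_, value)| value.trim().to_string())
}

/// Hypothetical helper: pick the service name from the four sources,
/// assuming the first source that provides one wins.
fn resolve_service_name(
    env_resource_attributes: Option<&str>, // OTEL_RESOURCE_ATTRIBUTES
    env_service_name: Option<&str>,        // OTEL_SERVICE_NAME
    yaml_resource_attributes: &HashMap<String, String>, // resource_attributes in router.yaml
    yaml_service_name: Option<&str>,       // service_name in router.yaml
) -> String {
    env_resource_attributes
        .and_then(service_name_from_attributes)
        .or_else(|| env_service_name.map(str::to_string))
        .or_else(|| yaml_resource_attributes.get("service.name").cloned())
        .or_else(|| yaml_service_name.map(str::to_string))
        // Default named in this changeset when nothing is configured.
        .unwrap_or_else(|| "unknown_service".to_string())
}
```

In a real program the two environment values could be read with `std::env::var("OTEL_SERVICE_NAME").ok()` and passed in via `as_deref()`; keeping them as parameters makes the ordering easy to test in isolation.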

By [@BrynCooke](https://github.com/BrynCooke) in https://github.com/apollographql/router/pull/3649