Skip to content

Commit

Permalink
otel 0.19.0 second try (#3421)
Browse files Browse the repository at this point in the history
The update requires a change to the implementation and test update as
follows:

- In otel 0.18.0, processor factories had a `with_memory(bool)` method
which we were using when building our prometheus exporter. AFAICT, this
used to be a mechanism for controlling how metrics handled stale gauges.
In 0.19.0, [this method was
removed](open-telemetry/opentelemetry-rust#946)
and now gauges are all assumed to be as though they were created with
`false`. We had been providing `true` on our call. I'm not 100% certain
of the impact of this change, but it appears that we can ignore it. We
may need to consider it more carefully if problems arise.
- There are now two standard OTEL attributes:
```otel_scope_name="apollo/router",otel_scope_version=""``` added to
output and a number of tests had to be updated to accommodate that
change.
- One of our tests appeared to be searching for
`apollo_router_cache_hit_count` (and this was working) when it should
have been searching for `apollo_router_cache_hit_count_total` (likewise
for miss). I've updated the test and think this is the correct thing to
do. It looks like a bug was fixed in otel and this change matches the
fix.
 
Regarding that last point. The prometheus spec mandates naming format
and the change was part of the compliance with that spec. This PR made
the change:
open-telemetry/opentelemetry-rust#952

The two affected counters in the router were:

apollo_router_cache_hit_count -> apollo_router_cache_hit_count_total
apollo_router_cache_miss_count -> apollo_router_cache_miss_count_total

It's good that our prometheus metrics are now spec compliant, but we
should note this in the release notes and (if possible) somewhere in our
documentation. I'll add it to the changeset at least.

The upgrade fixes many of the outstanding issues related to
opentelemetry and various APM vendors:

Fixes: #2878
Fixes: #2066 
Fixes: #2959 
Fixes: #2225 
Fixes: #1520 

<!-- start metadata -->

**Checklist**

Complete the checklist (and note appropriate exceptions) before a final
PR is raised.

- [x] Changes are compatible[^1]
- [x] Documentation[^2] completed
- [x] Performance impact assessed and acceptable
- Tests added and passing[^3]
    - [x] Unit Tests
    - [x] Integration Tests
    - [ ] Manual Tests

**Exceptions**

*Note any exceptions here*

**Notes**

[^1]. It may be appropriate to bring upcoming changes to the attention
of other (impacted) groups. Please endeavour to do this before seeking
PR approval. The mechanism for doing this will vary considerably, so use
your judgement as to how and when to do this.
[^2]. Configuration is an important part of many changes. Where
applicable please try to document configuration examples.
[^3]. Tick whichever testing boxes are applicable. If you are adding
Manual Tests:
- please document the manual testing (extensively) in the Exceptions.
- please raise a separate issue to automate the test and label it (or
ask for it to be labeled) as `manual test`
  • Loading branch information
garypen authored Jul 12, 2023
1 parent f494bfe commit d3f37cd
Show file tree
Hide file tree
Showing 8 changed files with 110 additions and 75 deletions.
31 changes: 31 additions & 0 deletions .changesets/maint_garypen_2878_otel_0_19_0.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
### update opentelemetry to 0.19.0 ([Issue #2878](https://github.com/apollographql/router/issues/2878))


We've updated the following opentelemetry related crates:

```
opentelemetry 0.18.0 -> 0.19.0
opentelemetry-datadog 0.6.0 -> 0.7.0
opentelemetry-http 0.7.0 -> 0.8.0
opentelemetry-jaeger 0.17.0 -> 0.18.0
opentelemetry-otlp 0.11.0 -> 0.12.0
opentelemetry-semantic-conventions 0.10.0 -> 0.11.0
opentelemetry-zipkin 0.16.0 -> 0.17.0
opentelemetry-prometheus 0.11.0 -> 0.12.0
tracing-opentelemetry 0.18.0 -> 0.19.0
```

This allows us to close a number of opentelemetry related issues.

Note:

The prometheus specification mandates naming format and, unfortunately, the router had two metrics which weren't compliant. The otel upgrade enforces the specification, so the affected metrics are now renamed (see below).

The two affected metrics in the router were:

apollo_router_cache_hit_count -> apollo_router_cache_hit_count_total
apollo_router_cache_miss_count -> apollo_router_cache_miss_count_total

If you are monitoring these metrics via prometheus, please update your dashboards with this name change.

By [@garypen](https://github.com/garypen) in https://github.com/apollographql/router/pull/3421
65 changes: 32 additions & 33 deletions Cargo.lock
Original file line number Diff line number Diff line change
Expand Up @@ -3795,19 +3795,19 @@ dependencies = [

[[package]]
name = "opentelemetry"
version = "0.18.0"
version = "0.19.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "69d6c3d7288a106c0a363e4b0e8d308058d56902adefb16f4936f417ffef086e"
checksum = "5f4b8347cc26099d3aeee044065ecc3ae11469796b4d65d065a23a584ed92a6f"
dependencies = [
"opentelemetry_api",
"opentelemetry_sdk",
]

[[package]]
name = "opentelemetry-datadog"
version = "0.6.0"
version = "0.7.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "171770efa142d2a19455b7e985037f560b2e75461f822dd1688bfd83c14856f6"
checksum = "daf08569fbddd2149b268e2bde2bca0bab84bc19ee2efcc234f855f49a911536"
dependencies = [
"async-trait",
"futures-core",
Expand All @@ -3826,9 +3826,9 @@ dependencies = [

[[package]]
name = "opentelemetry-http"
version = "0.7.0"
version = "0.8.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1edc79add46364183ece1a4542592ca593e6421c60807232f5b8f7a31703825d"
checksum = "a819b71d6530c4297b49b3cae2939ab3a8cc1b9f382826a1bc29dd0ca3864906"
dependencies = [
"async-trait",
"bytes",
Expand All @@ -3839,9 +3839,9 @@ dependencies = [

[[package]]
name = "opentelemetry-jaeger"
version = "0.17.0"
version = "0.18.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1e785d273968748578931e4dc3b4f5ec86b26e09d9e0d66b55adda7fce742f7a"
checksum = "08e028dc9f4f304e9320ce38c80e7cf74067415b1ad5a8750a38bae54a4d450d"
dependencies = [
"async-trait",
"futures",
Expand All @@ -3860,9 +3860,9 @@ dependencies = [

[[package]]
name = "opentelemetry-otlp"
version = "0.11.0"
version = "0.12.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d1c928609d087790fc936a1067bdc310ae702bdf3b090c3f281b713622c8bbde"
checksum = "8af72d59a4484654ea8eb183fea5ae4eb6a41d7ac3e3bae5f4d2a282a3a7d3ca"
dependencies = [
"async-trait",
"futures",
Expand All @@ -3880,9 +3880,9 @@ dependencies = [

[[package]]
name = "opentelemetry-prometheus"
version = "0.11.0"
version = "0.12.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "06c3d833835a53cf91331d2cfb27e9121f5a95261f31f08a1f79ab31688b8da8"
checksum = "9a9f186f6293ebb693caddd0595e66b74a6068fa51048e26e0bf9c95478c639c"
dependencies = [
"opentelemetry",
"prometheus",
Expand All @@ -3891,32 +3891,31 @@ dependencies = [

[[package]]
name = "opentelemetry-proto"
version = "0.1.0"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d61a2f56df5574508dd86aaca016c917489e589ece4141df1b5e349af8d66c28"
checksum = "045f8eea8c0fa19f7d48e7bc3128a39c2e5c533d5c61298c548dfefc1064474c"
dependencies = [
"futures",
"futures-util",
"opentelemetry",
"prost",
"tonic 0.8.3",
"tonic-build",
]

[[package]]
name = "opentelemetry-semantic-conventions"
version = "0.10.0"
version = "0.11.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9b02e0230abb0ab6636d18e2ba8fa02903ea63772281340ccac18e0af3ec9eeb"
checksum = "24e33428e6bf08c6f7fcea4ddb8e358fab0fe48ab877a87c70c6ebe20f673ce5"
dependencies = [
"opentelemetry",
]

[[package]]
name = "opentelemetry-zipkin"
version = "0.16.0"
version = "0.17.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9bd6a5d672fe50f682d801f6737a54a633834cf8c91be419c0c9cae8ac85bf4d"
checksum = "e1fd48caee5e1db71454c95be32d1daeb6fae265321ff8f51b1efc8a50b0be80"
dependencies = [
"async-trait",
"futures-core",
Expand All @@ -3934,25 +3933,25 @@ dependencies = [

[[package]]
name = "opentelemetry_api"
version = "0.18.0"
version = "0.19.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c24f96e21e7acc813c7a8394ee94978929db2bcc46cf6b5014fc612bf7760c22"
checksum = "ed41783a5bf567688eb38372f2b7a8530f5a607a4b49d38dd7573236c23ca7e2"
dependencies = [
"fnv",
"futures-channel",
"futures-util",
"indexmap 1.9.3",
"js-sys",
"once_cell",
"pin-project-lite",
"thiserror",
"urlencoding",
]

[[package]]
name = "opentelemetry_sdk"
version = "0.18.0"
version = "0.19.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1ca41c4933371b61c2a2f214bf16931499af4ec90543604ec828f7a625c09113"
checksum = "8b3a2a91fdbfdd4d212c0dcc2ab540de2c2bcbbd90be17de7a7daf8822d010c1"
dependencies = [
"async-trait",
"crossbeam-channel",
Expand All @@ -3978,9 +3977,9 @@ checksum = "04744f49eae99ab78e0d5c0b603ab218f515ea8cfe5a456d7629ad883a3b6e7d"

[[package]]
name = "ordered-float"
version = "1.1.1"
version = "2.10.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3305af35278dd29f46fcdd139e0b1fbfae2153f0e5928b39b035542dd31e37b7"
checksum = "7940cf2ca942593318d07fcf2596cdca60a85c9e7fab408a5e21a4f9dcd40d87"
dependencies = [
"num-traits",
]
Expand Down Expand Up @@ -5928,14 +5927,14 @@ dependencies = [

[[package]]
name = "thrift"
version = "0.16.0"
version = "0.17.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "09678c4cdbb4eed72e18b7c2af1329c69825ed16fcbac62d083fc3e2b0590ff0"
checksum = "7e54bc85fc7faa8bc175c4bab5b92ba8d9a3ce893d0e9f42cc455c8ab16a9e09"
dependencies = [
"byteorder",
"integer-encoding",
"log",
"ordered-float 1.1.1",
"ordered-float 2.10.0",
"threadpool",
]

Expand Down Expand Up @@ -6400,9 +6399,9 @@ dependencies = [

[[package]]
name = "tracing-opentelemetry"
version = "0.18.0"
version = "0.19.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "21ebb87a95ea13271332df069020513ab70bdb5637ca42d6e492dc3bbbad48de"
checksum = "00a39dcf9bfc1742fa4d6215253b33a6e474be78275884c216fc2a06267b3600"
dependencies = [
"once_cell",
"opentelemetry",
Expand Down Expand Up @@ -6536,9 +6535,9 @@ dependencies = [

[[package]]
name = "typed-builder"
version = "0.9.1"
version = "0.12.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a46ee5bd706ff79131be9c94e7edcb82b703c487766a114434e5790361cf08c5"
checksum = "6179333b981641242a768f30f371c9baccbfcc03749627000c500ab88bf4528b"
dependencies = [
"proc-macro2",
"quote",
Expand Down
18 changes: 9 additions & 9 deletions apollo-router/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -130,31 +130,31 @@ once_cell = "1.18.0"
# groups `^tracing` and `^opentelemetry*` dependencies together as of
# https://github.com/apollographql/router/pull/1509. A comment which exists
# there (and on `tracing` packages below) should be updated should this change.
opentelemetry = { version = "0.18.0", features = [
opentelemetry = { version = "0.19.0", features = [
"rt-tokio",
"metrics",
] }
opentelemetry-datadog = { version = "0.6.0", features = ["reqwest-client"] }
opentelemetry-http = "0.7.0"
opentelemetry-jaeger = { version = "0.17.0", features = [
opentelemetry-datadog = { version = "0.7.0", features = ["reqwest-client"] }
opentelemetry-http = "0.8.0"
opentelemetry-jaeger = { version = "0.18.0", features = [
"collector_client",
"reqwest_collector_client",
"rt-tokio",
] }
opentelemetry-otlp = { version = "0.11.0", default-features = false, features = [
opentelemetry-otlp = { version = "0.12.0", default-features = false, features = [
"grpc-tonic",
"tonic",
"tls",
"http-proto",
"metrics",
"reqwest-client",
] }
opentelemetry-semantic-conventions = "0.10.0"
opentelemetry-zipkin = { version = "0.16.0", default-features = false, features = [
opentelemetry-semantic-conventions = "0.11.0"
opentelemetry-zipkin = { version = "0.17.0", default-features = false, features = [
"reqwest-client",
"reqwest-rustls",
] }
opentelemetry-prometheus = "0.11.0"
opentelemetry-prometheus = "0.12.0"
paste = "1.0.13"
pin-project-lite = "0.2.10"
prometheus = "0.13"
Expand Down Expand Up @@ -208,7 +208,7 @@ tower-service = "0.3.2"
tracing = "0.1.37"
tracing-core = "0.1.30"
tracing-futures = { version = "0.2.5", features = ["futures-03"] }
tracing-opentelemetry = "0.18.0"
tracing-opentelemetry = "0.19.0"
tracing-subscriber = { version = "0.3.11", features = ["env-filter", "json"] }
url = { version = "2.4.0", features = ["serde"] }
urlencoding = "2.1.2"
Expand Down
11 changes: 4 additions & 7 deletions apollo-router/src/plugins/telemetry/metrics/prometheus.rs
Original file line number Diff line number Diff line change
Expand Up @@ -83,13 +83,10 @@ impl MetricsConfigurator for Config {
metrics_config: &MetricsCommon,
) -> Result<MetricsBuilder, BoxError> {
if self.enabled {
let mut controller = controllers::basic(
processors::factory(
selectors::simple::histogram(metrics_config.buckets.clone()),
aggregation::stateless_temporality_selector(),
)
.with_memory(true),
)
let mut controller = controllers::basic(processors::factory(
selectors::simple::histogram(metrics_config.buckets.clone()),
aggregation::stateless_temporality_selector(),
))
.with_resource(Resource::new(
metrics_config
.resources
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
source: apollo-router/src/plugins/telemetry/mod.rs
expression: prom_metrics
---
apollo_router_http_request_duration_seconds_count{another_test="my_default_value",error="400 Bad Request",myname="label_value",renamed_value="my_value_set",service_name="apollo-router",status="400"} 1
apollo_router_http_request_duration_seconds_count{another_test="my_default_value",my_value="2",myname="label_value",renamed_value="my_value_set",service_name="apollo-router",status="200",x_custom="coming_from_header"} 1
apollo_router_http_request_duration_seconds_count{error="INTERNAL_SERVER_ERROR",my_key="my_custom_attribute_from_context",query_from_request="query { test }",service_name="apollo-router",status="200",subgraph="my_subgraph_name",unknown_data="default_value"} 1
apollo_router_http_request_duration_seconds_count{message="cannot contact the subgraph",service_name="apollo-router",status="500",subgraph="my_subgraph_name_error",subgraph_error_extended_code="SUBREQUEST_HTTP_ERROR"} 1
apollo_router_http_request_duration_seconds_count{another_test="my_default_value",error="400 Bad Request",myname="label_value",renamed_value="my_value_set",service_name="apollo-router",status="400",otel_scope_name="apollo/router",otel_scope_version=""} 1
apollo_router_http_request_duration_seconds_count{another_test="my_default_value",my_value="2",myname="label_value",renamed_value="my_value_set",service_name="apollo-router",status="200",x_custom="coming_from_header",otel_scope_name="apollo/router",otel_scope_version=""} 1
apollo_router_http_request_duration_seconds_count{error="INTERNAL_SERVER_ERROR",my_key="my_custom_attribute_from_context",query_from_request="query { test }",service_name="apollo-router",status="200",subgraph="my_subgraph_name",unknown_data="default_value",otel_scope_name="apollo/router",otel_scope_version=""} 1
apollo_router_http_request_duration_seconds_count{message="cannot contact the subgraph",service_name="apollo-router",status="500",subgraph="my_subgraph_name_error",subgraph_error_extended_code="SUBREQUEST_HTTP_ERROR",otel_scope_name="apollo/router",otel_scope_version=""} 1
Loading

0 comments on commit d3f37cd

Please sign in to comment.