datadog exporter ignores attributes #2066
Comments
related to #1162 |
This is fixed in the otel upgrade branch but we need a new otel release. |
The fix for the DD exporter is in an unreleased version of opentelemetry-datadog, but if you're building a custom binary you can apply a patch in your Cargo.toml (see lines 43 to 49 in 3bb8a67).
I confirmed that the patches work. An alternative is to configure your DD agent to listen for OTLP traces.

# datadog agent config
otlp_config:
  receiver:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

# router config
telemetry:
  tracing:
    trace_config:
      attributes:
        version: "2"
        env: "dev"
        test: "42"
    otlp:
      endpoint: 0.0.0.0:4317
      protocol: grpc
|
I'll put the same comment here as I put on #1162 (comment):
(We are led to believe it's coming very soon. There's just maybe a patch in there somewhere that might be very relevant for Datadog, specifically.) |
The fix for this is purportedly now in https://github.com/open-telemetry/opentelemetry-rust/releases/tag/v0.19.0. I'll see about getting the integration of that upgrade prioritized on our side. |
@lennyburdette do you mean to uncomment the patch.crates-io and build? Thanks |
Yeah, though this might be pretty out-of-date now so I can't guarantee that it will still work. |
We are unfortunately still blocked on the OpenTelemetry update because of upstream dependencies. It is perhaps getting close though. |
Ok, in theory, this is fixed by https://github.com/tokio-rs/tracing-opentelemetry/releases/tag/v0.19.0, and we have an umbrella ticket #2878 that tracks that need (since it now unblocks many constraints, like this one, at least in theory). We're prioritizing making that update in the next week or two. |
I have verified that this is fixed in my opentelemetry 0.19.0 PR: #3196 |
The update requires a change to the implementation and test update as follows:

- In otel 0.18.0, processor factories had a `with_memory(bool)` method which we were using when building our prometheus exporter. AFAICT, this used to be a mechanism for controlling how metrics handled stale gauges. In 0.19.0, [this method was removed](open-telemetry/opentelemetry-rust#946) and now gauges are all assumed to be as though they were created with `false`. We had been providing `true` on our call. I'm not 100% certain of the impact of this change, but it appears that we can ignore it. We may need to consider it more carefully if problems arise.
- There are now two standard OTEL attributes: `otel_scope_name="apollo/router",otel_scope_version=""` added to output and a number of tests had to be updated to accommodate that change.
- One of our tests appeared to be searching for `apollo_router_cache_hit_count` (and this was working) when it should have been searching for `apollo_router_cache_hit_count_total` (likewise for miss). I've updated the test and think this is the correct thing to do. It looks like a bug was fixed in otel and this change matches the fix.

The upgrade fixes many of the outstanding issues related to opentelemetry and various APM vendors:

Fixes: #2878
Fixes: #2066
Fixes: #2959
Fixes: #2225
Fixes: #1520

<!-- start metadata -->

**Checklist**

Complete the checklist (and note appropriate exceptions) before a final PR is raised.

- [x] Changes are compatible[^1]
- [x] Documentation[^2] completed
- [x] Performance impact assessed and acceptable
- Tests added and passing[^3]
    - [x] Unit Tests
    - [x] Integration Tests
    - [ ] Manual Tests

**Exceptions**

*Note any exceptions here*

**Notes**

[^1]. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.
[^2]. Configuration is an important part of many changes. Where applicable please try to document configuration examples.
[^3]. Tick whichever testing boxes are applicable. If you are adding Manual Tests:
    - please document the manual testing (extensively) in the Exceptions.
    - please raise a separate issue to automate the test and label it (or ask for it to be labeled) as `manual test`
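The second bullet above says that scraped metrics now carry `otel_scope_name` and `otel_scope_version` labels. For anyone comparing Prometheus output before and after the upgrade, a minimal sketch of one way to normalise sample lines is shown here. It is not the approach taken in the PR (the PR updated the expected test output); `strip_scope_labels` is a hypothetical helper, and the `kind="query"` label in the usage note is made up for illustration.

```python
import re

# The two labels that the description above says are now added to the output.
SCOPE_LABELS = {"otel_scope_name", "otel_scope_version"}

# One `name="value"` pair inside a label set (value may contain escaped characters).
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')

# A sample line of the form `metric_name{labels} value`.
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)\{(.*)\}\s+(.*)$')


def strip_scope_labels(line: str) -> str:
    """Return the sample line with the otel_scope_* labels removed.

    Hypothetical helper for comparing output across the upgrade;
    it is not part of the router or of opentelemetry-rust.
    """
    match = SAMPLE_RE.match(line)
    if match is None:
        return line  # no label set on this line, nothing to strip
    name, labels, value = match.groups()
    kept = [
        key + '="' + val + '"'
        for key, val in LABEL_RE.findall(labels)
        if key not in SCOPE_LABELS
    ]
    if not kept:
        return name + " " + value
    return name + "{" + ",".join(kept) + "} " + value
```

For example, `strip_scope_labels('apollo_router_cache_hit_count_total{kind="query",otel_scope_name="apollo/router",otel_scope_version=""} 1')` yields the same line with only the `kind` label left.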
Reopening the issue since we have to revert the upgrade until they release a patch. See #3242 |
Regarding the last point of the update description above (the test that searched for `apollo_router_cache_hit_count`): the Prometheus spec mandates a naming format, and the change was part of the compliance with that spec. This PR made the change: open-telemetry/opentelemetry-rust#952

The two affected counters in the router were:

- apollo_router_cache_hit_count -> apollo_router_cache_hit_count_total
- apollo_router_cache_miss_count -> apollo_router_cache_miss_count_total

It's good that our prometheus metrics are now spec compliant, but we should note this in the release notes and (if possible) somewhere in our documentation. I'll add it to the changeset at least.
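Because the two cache counters gain a `_total` suffix, any dashboards or alerts that query the old names would need updating. A minimal sketch of a rename helper, assuming queries are held as plain strings; `migrate_query` is a hypothetical name and is not part of the router. The mapping itself comes from the comment above.

```python
import re

# Old -> new counter names, as listed in the comment above.
RENAMED_COUNTERS = {
    "apollo_router_cache_hit_count": "apollo_router_cache_hit_count_total",
    "apollo_router_cache_miss_count": "apollo_router_cache_miss_count_total",
}


def migrate_query(query: str) -> str:
    """Rewrite old counter names in a query string to their new names.

    Hypothetical helper. The old name is matched only as a complete
    metric name, so a query that already uses the `_total` form is
    returned unchanged.
    """
    for old, new in RENAMED_COUNTERS.items():
        pattern = re.compile(
            r"(?<![A-Za-z0-9_:])" + re.escape(old) + r"(?![A-Za-z0-9_:])"
        )
        query = pattern.sub(new, query)
    return query
```

Applied to `rate(apollo_router_cache_hit_count[5m])`, this returns `rate(apollo_router_cache_hit_count_total[5m])`; running it a second time leaves the query as it is.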
Describe the bug
The Datadog exporter does not apply attributes configured in Router.yaml to spans. Our team needs to set the `version` tag at the root of all traces exported to Datadog for Deployment tracking.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
We expect all traces to include a tag named `version` at the root of each trace.
Output
We do not see the version tag.
Additional context
As a workaround we added a supergraph plugin that adds a span explicitly for this purpose, but it seems wasteful. It also does not help when the Router receives invalid GraphQL requests that cause 400s, as those short-circuit any plugins we write, so our version spans are not applied.