Traces/metrics not reported after a while using OTLP exporter #3392
Comments
@Meemaw We recently (in 1.24.0) added support for configuring temporality in our OTLP metrics. This is designed for use with metrics ingesters such as Datadog, which prefer delta aggregation to cumulative (which is the OTEL default). It may resolve the issues you are seeing. Sample configuration fragment:
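The fragment itself is not preserved in this copy of the thread. A minimal sketch of what such a setting could look like, assuming the same `telemetry.metrics.otlp` layout as the configuration posted further down in this thread (the endpoint value is a placeholder, not taken from the original comment):

```yaml
telemetry:
  metrics:
    otlp:
      # Placeholder endpoint (assumption): point this at your own OTLP agent or collector.
      endpoint: "http://localhost:4317"
      # Report delta aggregations instead of the cumulative OTEL default.
      temporality: delta
```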
If you could try this and let us know if things improve, that would be helpful. As you note, it is difficult to debug/track, but we have found this improves reporting in the testing we have managed to perform.
@garypen will report how it goes. There is actually one thing I noticed after the upgrade.
I think that's a different problem, because I've noticed that the
see: #3485
Any updates on this, @Meemaw ? 😄
@abernix we still see metrics/traces disappearing after a while on v1.26.0.
@Meemaw That's disappointing. We have been using 1.26.0 with delta temporality successfully with Datadog over the last couple of weeks.
@garypen That's only relevant for metrics, right? Also not seeing traces which shouldn't be affected by that change. This is our config (in case you see anything wrong):

    telemetry:
      metrics:
        common:
          service_name: "${env.DD_SERVICE:-graphql-federation}"
        otlp:
          endpoint: "http://${env.DD_AGENT_HOST:-datadog}:4317"
          temporality: delta
      tracing:
        trace_config:
          service_name: "${env.DD_SERVICE:-graphql-federation}"
          service_namespace: "${env.DD_ENV:-development}"
          sampler: "${env.DD_TRACE_SAMPLE_RATE:-1}"
          parent_based_sampler: true
          attributes:
            version: "${env.DD_VERSION:-development}"
        otlp:
          endpoint: "http://${env.DD_AGENT_HOST:-datadog}:4317"

We have some other services which are using the otlp grpc endpoint and they work without issues.
@Meemaw It is only relevant for metrics, but you wrote: "we still see metrics/traces disappearing after a while on v1.26.0." so I was commenting on the metrics part of that. I should probably have made that clear. I can't see anything wrong with your config. Just out of interest, are any of your other functional services written in
No, others are in Go.
@garypen another observation: metrics emitted by us (in a custom Rust plugin) do not disappear.
This is blocked until #3601 is done, so track that one first if you're curious about progress. ;)
One other observation. I have noted that if there is no activity, for whatever reason, around a particular metric for a "while", then our Datadog widget just stops reporting data. It's as though it is waiting for more data to arrive before it resumes graphing. Could this be part of the problem you are seeing, @Meemaw ? That is, rather than previously seen metrics disappearing, what you are seeing is that metrics suddenly stop being updated and then, maybe, are updated again later.
By no activity, do you mean the router having no traffic and no metrics being emitted? We have constant high-RPS traffic, so this would not be the case.
Describe the bug
After a while, traces and some metrics stop being reported by the router when using the OTLP exporter with Datadog. The timing varies, but it is usually a few hours. Traces are always missing when this happens; some metrics are still reported while others are not.
Example of metrics that are still reported:
Example of metrics that disappear:
This happens on the latest version, but has been happening for a long time (half a year at least). I suspect this is a bug in the router, because restarting the deployment always fixes the issue.
It's hard to reproduce this locally, obviously, so this issue is more for tracking and for finding out whether anyone else is experiencing similar problems.