OTel Collector potential memory leak #802
Comments
Can you also attach the Collector logs? Do you have both the Jaeger and Zipkin backends up and running and accepting data from the Collector?
Hi Tigran, I have both the Jaeger and Zipkin backends running in separate containers (see attached picture), and I am seeing the following errors in the logs. I guess the root cause could be that when the backend (the Zipkin server) fails to handle the POST request, the Collector fails to export the data, keeps buffering it on the Collector side, and eventually runs out of memory?
I did a bit of investigation on this and as far as I can tell this seems to be working as intended (no memory leak), although there are definitely things that could be improved. If I configured the Zipkin and Jaeger back ends correctly, I was able to push through thousands of traces per second with minimal memory usage, but if I didn't configure a Zipkin or Jaeger back end (or misconfigured the exporter), then the unsent traces would back up and consume a lot of memory (but not unbounded). The minimal setup needed to replicate this is to start the backend containers (as per examples/demo), configure the emitter to push a large number of traces, and configure the Collector pipeline with:
```yaml
receivers: [jaeger]
exporters: [jaeger] # or zipkin
processors: [queued_retry]
```

The `queued_retry` processor's default queue size is quite large, so unsent batches can pile up and consume a lot of memory before anything is dropped. Note we use Jaeger's bounded queue internally. They recently added a way to resize that queue dynamically; that's something we might want to consider for the future. For now, it might be a good idea to reduce the default queue size somewhat (maybe 2000 is more reasonable, although I'm not sure how large batches we would reasonably expect from various receivers?).

Not completely relevant, but it's also worth noting this regarding how items are dropped if the bounded queue reaches its limit: jaegertracing/jaeger#1947
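For context, a fuller Collector configuration along the lines described above might look like the sketch below. The receiver and exporter endpoints and the `queued_retry` field names (`num_workers`, `queue_size`) are assumptions for illustration, not values taken from this issue.

```yaml
# Illustrative sketch only: endpoints, ports, and queued_retry settings are assumptions.
receivers:
  jaeger:
    protocols:
      thrift_http:
        endpoint: "0.0.0.0:14268"

processors:
  queued_retry:
    num_workers: 10
    queue_size: 2000   # smaller than the default, per the suggestion above

exporters:
  jaeger:
    endpoint: "jaeger-all-in-one:14250"   # a reachable backend keeps the queue from backing up

service:
  pipelines:
    traces:
      receivers: [jaeger]
      processors: [queued_retry]
      exporters: [jaeger]   # or zipkin
```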
Closing based on comment from @james-bebbington
I have the Collector set up with the following pipeline config, and I am sending about 500 span requests per second to the Jaeger receiver in the Collector. The Collector heap size starts growing after running for 10-20 minutes, and the Collector instance eventually crashes. From the profiling, I can see the span data buffered in `jaeger.jSpansToOCProtoSpans(jspans []*jaeger.Span)`. I also tried the Zipkin receiver and hit the same problem, with the data buffered in `zipkinreceiver.zipkinTagsToTraceAttributes`.
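To capture a heap profile like the one described above, one option is the Collector's `pprof` extension, which exposes the standard Go profiling endpoints. The sketch below is illustrative: the extension settings, default port, and pipeline components are assumptions rather than the reporter's actual setup.

```yaml
# Illustrative sketch only: extension settings and pipeline components are assumptions.
extensions:
  pprof:
    endpoint: "localhost:1777"

service:
  extensions: [pprof]
  pipelines:
    traces:
      receivers: [jaeger]
      processors: [queued_retry]
      exporters: [jaeger]
```

With this enabled, a heap profile can be pulled with standard Go tooling, for example `go tool pprof http://localhost:1777/debug/pprof/heap`.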