Memory grows fast, suspected a leak #21484
Can you please use the pprof extension and capture memory usage? Since you use the prometheus exporter, I assume that you're using the contrib distribution or a distribution you created yourself. I will move this report to the contrib repository.
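For anyone following along, enabling the pprof extension takes only a few lines of collector config; a minimal sketch (the endpoint value below is an assumption, not taken from the reported config):

extensions:
  pprof:
    endpoint: 0.0.0.0:1777

service:
  extensions: [pprof]

With that in place, a heap snapshot can be captured with something like: go tool pprof -inuse_space http://localhost:1777/debug/pprof/heap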
@bitomaxsp, thanks for reporting. Given that the jump from 0.50.0 to v0.76.1 is pretty big, it's hard to pinpoint an issue. Would you mind helping us identify the specific version that contributed the most to the memory consumption? It'd be great if you could try a kind of binary search starting from 0.63.0 and narrow down the range of versions.
Pinging code owners for exporter/prometheus: @Aneurysm9. See Adding Labels via Comments if you do not have permissions to add labels yourself.
I am facing a related issue, so I thought of using this thread. When I manually curl … Attaching the heap dump & otel-config for reference: otel-influx_prom_pprof_heap.heap.zip. Is this expected behavior?
This clearly seems like a bug in the Prometheus exporter. Any help would be appreciated. @tj---, can you help figure out in which version this bug was introduced? @Aneurysm9, do you have a chance to take a look at it as a code owner?
Sure, I'll do that. Will get back in a day or two. |
I went back up to 0.43.0 (the oldest available |
@dmitryax it looks like a design choice. A colleague investigated that the expiry possibly happens only during the |
In our case we were scraping /metrics all the time, and the bug is still reproducible.
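To make the design choice mentioned above concrete, here is a minimal, hypothetical Go sketch (not the exporter's actual code) of expiry that only runs at collection time; if the endpoint is never scraped, nothing is ever deleted:

package main

import (
	"fmt"
	"time"
)

type point struct {
	value    float64
	lastSeen time.Time
}

type accumulator struct {
	ttl    time.Duration
	series map[string]point
}

// Add records a data point; nothing is ever evicted on this path.
func (a *accumulator) Add(key string, v float64) {
	a.series[key] = point{value: v, lastSeen: time.Now()}
}

// Collect is what a scrape of /metrics would drive; expired series are removed only here.
func (a *accumulator) Collect() []float64 {
	now := time.Now()
	out := make([]float64, 0, len(a.series))
	for k, p := range a.series {
		if now.Sub(p.lastSeen) > a.ttl {
			delete(a.series, k) // expiry happens only during collection
			continue
		}
		out = append(out, p.value)
	}
	return out
}

func main() {
	a := &accumulator{ttl: 5 * time.Minute, series: map[string]point{}}
	a.Add(`http_requests_total{path="/a"}`, 1)
	fmt.Println(len(a.Collect()), "series exported")
}

As noted above, though, the leak was still reproducible while /metrics was being scraped continuously, so this design choice alone does not explain the original report.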
I tried 0.79. The issue is still there, with the same growth rate.
(pprof) top10
Showing nodes accounting for 919441, 94.07% of 977405 total
Dropped 101 nodes (cum <= 4887)
Showing top 10 nodes out of 90
flat flat% sum% cum cum%
420527 43.02% 43.02% 420527 43.02% go.opentelemetry.io/collector/pdata/internal/data/protogen/common/v1.(*AnyValue).Unmarshal
196610 20.12% 63.14% 617137 63.14% go.opentelemetry.io/collector/pdata/internal/data/protogen/common/v1.(*KeyValue).Unmarshal
85289 8.73% 71.87% 146502 14.99% go.opentelemetry.io/collector/pdata/pmetric.MetricSlice.CopyTo
60076 6.15% 78.01% 546140 55.88% go.opentelemetry.io/collector/pdata/internal/data/protogen/metrics/v1.(*Metric).Unmarshal
52577 5.38% 83.39% 52577 5.38% go.opentelemetry.io/collector/pdata/pcommon.Map.PutEmpty
32768 3.35% 86.74% 32768 3.35% golang.org/x/net/http2/hpack.AppendHuffmanString
21851 2.24% 88.98% 21851 2.24% go.opentelemetry.io/collector/pdata/pcommon.copyFloat64Slice (inline)
21850 2.24% 91.22% 21850 2.24% go.opentelemetry.io/collector/pdata/pcommon.copyUInt64Slice (inline)
16970 1.74% 92.95% 16970 1.74% go.opentelemetry.io/collector/pdata/pmetric.NumberDataPointSlice.CopyTo
10923 1.12% 94.07% 10923 1.12% context.WithValue
(pprof) top10 -cum
Showing nodes accounting for 0, 0% of 977405 total
Dropped 101 nodes (cum <= 4887)
Showing top 10 nodes out of 90
flat flat% sum% cum cum%
0 0% 0% 677213 69.29% github.com/golang/protobuf/proto.Unmarshal
0 0% 0% 677213 69.29% github.com/golang/protobuf/proto.UnmarshalMerge
0 0% 0% 677213 69.29% go.opentelemetry.io/collector/pdata/internal/data/protogen/collector/metrics/v1.(*ExportMetricsServiceRequest).Unmarshal
0 0% 0% 677213 69.29% go.opentelemetry.io/collector/pdata/internal/data/protogen/collector/metrics/v1._MetricsService_Export_Handler
0 0% 0% 677213 69.29% go.opentelemetry.io/collector/pdata/internal/data/protogen/metrics/v1.(*ResourceMetrics).Unmarshal
0 0% 0% 677213 69.29% google.golang.org/grpc.(*Server).handleStream
0 0% 0% 677213 69.29% google.golang.org/grpc.(*Server).processUnaryRPC
0 0% 0% 677213 69.29% google.golang.org/grpc.(*Server).processUnaryRPC.func2
0 0% 0% 677213 69.29% google.golang.org/grpc.(*Server).serveStreams.func1.1
0 0% 0% 677213 69.29% google.golang.org/grpc/encoding/proto.codec.Unmarshal
(pprof) top10 -cum
Showing nodes accounting for 4.01MB, 6.31% of 63.50MB total
Showing top 10 nodes out of 191
flat flat% sum% cum cum%
0 0% 0% 36.52MB 57.51% github.com/open-telemetry/opentelemetry-collector-contrib/pkg/resourcetotelemetry.(*wrapperMetricsExporter).ConsumeMetrics
0 0% 0% 36.52MB 57.51% go.opentelemetry.io/collector/processor/batchprocessor.(*batchMetrics).export
0 0% 0% 36.52MB 57.51% go.opentelemetry.io/collector/processor/batchprocessor.(*shard).sendItems
0 0% 0% 36.52MB 57.51% go.opentelemetry.io/collector/processor/batchprocessor.(*shard).start
0 0% 0% 32.52MB 51.21% github.com/open-telemetry/opentelemetry-collector-contrib/pkg/resourcetotelemetry.convertToMetricsAttributes
0 0% 0% 19.01MB 29.94% go.opentelemetry.io/collector/pdata/pmetric.Metrics.CopyTo
0.51MB 0.8% 0.8% 19.01MB 29.94% go.opentelemetry.io/collector/pdata/pmetric.ResourceMetricsSlice.CopyTo
0 0% 0.8% 18.51MB 29.14% go.opentelemetry.io/collector/pdata/pmetric.ResourceMetrics.CopyTo
0.50MB 0.79% 1.59% 18.51MB 29.14% go.opentelemetry.io/collector/pdata/pmetric.ScopeMetricsSlice.CopyTo
3MB 4.73% 6.31% 18.01MB 28.36% go.opentelemetry.io/collector/pdata/pmetric.MetricSlice.CopyTo
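The resourcetotelemetry frames at the top of this profile correspond to the exporter's resource_to_telemetry_conversion option, which copies every resource attribute onto every data point before export. A minimal sketch of how it is typically enabled (the exporter name and endpoint below are assumptions, not the reported config):

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889
    resource_to_telemetry_conversion:
      enabled: true

That per-data-point copying is consistent with convertToMetricsAttributes and the various CopyTo frames dominating the cumulative view here.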
Do you run into an OOM eventually? In absolute terms this represents a very small amount of memory, ~30 MiB. It would be great to have a snapshot taken 8 hours in.
I didn't run it that long, but I can try.
@bitomaxsp @atoulme The OT collectors have been running for many days in our systems, and I observe a slow leak (Influx is the input and the logger is the output). I am attaching heap dumps taken 4 days apart.
This was done with |
This is because this line may be called multiple times:
This leak is specific to the carbonreceiver handling of obsreport. It should not create multiple obsreports when reading each line. |
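To illustrate the shape of the problem (and of the fix) with hypothetical identifiers rather than the actual obsreport API: the reporting helper should be created once when the receiver is constructed, not once per parsed line.

package main

import "fmt"

// reportHelper stands in for a telemetry helper; in the real code its creation
// registers instruments that are never unregistered, so creating one per line leaks.
type reportHelper struct {
	state [1 << 10]byte
}

func newReportHelper() *reportHelper { return &reportHelper{} }

type receiver struct {
	rep *reportHelper
}

// Leaky shape: a new helper allocated for every line that is read.
func (r *receiver) handleLineLeaky(line string) {
	rep := newReportHelper()
	_ = rep
	_ = line
}

// Fixed shape: the helper is created once, together with the receiver,
// and reused for every line.
func newReceiver() *receiver { return &receiver{rep: newReportHelper()} }

func (r *receiver) handleLine(line string) {
	_ = r.rep
	_ = line
}

func main() {
	rcv := newReceiver()
	rcv.handleLine("example.metric 42 1690000000")
	fmt.Println("helper created once, not per line")
}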
This might be a completely different issue from the one first reported, FWIW.
@atoulme, did you use the config reported in the issue? The carbon receiver is not used there.
@bitomaxsp the profiles don't show anything suspicious. The one with inuse_space was taken at 63.5 MB; can you please take another one when the memory goes higher? Also, did you have a chance to figure out which version introduced the issue between 0.50.0 and v0.76.1?
I confirm that 0.79 and 0.81 are leak-free. I have been running it for ~3 weeks in production and it's good.
I consider the issue solved unless there are concerns. Feel free to reopen it if needed. |
After updating to tag v0.76.1 and deploying to production, we noticed that memory grows up to the set limits.
Growth rate is ~18 MB/min.
Steps to reproduce
I assume deploying the collector and applying a metric point rate of ~350-400 should be enough.
What did you expect to see?
The expectation is that memory grows at the same rate as on the previously deployed version, v0.50.0.
What did you see instead?
Memory grows at ~18 MB/min until it reaches the configured limit (see the graphs below).
What version did you use?
Version: v0.76.1
What config did you use?
Config:
Environment
OS: (e.g., "Ubuntu 20.04")
Compiler (if manually compiled): (e.g., "go 1.20.3"), compiled for the amd64 arch.
K8s memory request: 250 MB
K8s memory limit: 800 MB
Memory consumption on v0.76.1 (time range 14 hours)
Memory consumption on v0.50.0 (time range ~2 hours)