memory leak on system with 128 x86_64 cores #36574

jcpunk · 2024-11-26T20:07:24Z

Describe the bug
I've got an x86_64 system with 128 cores. The otel collector adds about 5Mib to its working memory every time it scrapes a metrics endpoint. Eventually it hits up against the memorylimiter but the garbage collection never seems to really make headway and eventually fails to reclaim enough memory.

My identically configured systems with 8 or 16 x86_64 cores do not appear to leak in this manner.
My aarch64 system with a similar config and with 64 cores does also appear to leak in this manner.

Steps to reproduce
Run the otel-collector on a system with a lot of processing cores

What did you expect to see?
Memory usage eventually stabilize

What did you see instead?
Memory usage grows to fill space allotted - tested up to 4Gib (take 6 days)

What version did you use?
otelcol-contrib version 0.114.0 (memory code is probably in the base collector)

What config did you use?

---
processors:
  batch: {}
  transform/hostname:
    metric_statements:
    - context: datapoint
      statements:
      - set(attributes["nodename"], "host.fnal.gov")
      - set(resource.attributes["nodename"], "host.fnal.gov")
  memory_limiter:
    check_interval: 30s
    limit_mib: 384
exporters:
  prometheus:
    endpoint: "[::]:9299"
    enable_open_metrics: true
    metric_expiration: 2m
service:
  telemetry:
    metrics:
      level: none
  pipelines:
    metrics:
      receivers:
      - prometheus
      processors:
      - memory_limiter
      - transform/hostname
      - batch
      exporters:
      - prometheus
receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: node-exporter
        scrape_interval: 45s
        static_configs:
        - targets:
          - localhost:9100
          labels:
            instance: host.fnal.gov:9100
      - job_name: systemd-exporter
        scrape_interval: 45s
        static_configs:
        - targets:
          - localhost:9558
          labels:
            instance: host.fnal.gov:9558

Environment
OS: Almalinux 9
Platform: podman
Podman Quadlet file: /etc/containers/systemd/otel-collector.container

# THIS FILE IS MANAGED BY PUPPET
[Service]
TimeoutStartSec=900
TimeoutStopSec=30
TasksMax=4096
CPUWeight=30
MemoryMax=512M
IOSchedulingClass=best-effort
IOSchedulingPriority=7
IOWeight=30
Restart=always

[Container]
AutoUpdate=registry
DropCapability=ALL
User=5219
Group=8247
HostName=%H
LogDriver=journald
NoNewPrivileges=true
Pull=missing
ReadOnly=true
PodmanArgs=--stop-signal=SIGKILL
Volume=/etc/otel-collector:/etc/otel-collector:ro,rslave,z
Environment=GOMAXPROCS=4
Environment=GOMEMLIMIT=384MiB
Exec=--config /etc/otel-collector/otel-config.yaml

Image=ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:latest
Network=host
PublishPort=[::]:9299:9299

[Install]
WantedBy=default.target

Additional context
endpoints:

[root@host ~]#  curl -s localhost:9558/metrics |wc -l
5380
[root@host ~]#  curl -s localhost:9100/metrics |wc -l
6317

logs

Nov 26 09:21:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:21:44.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 327}
Nov 26 09:21:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:21:44.643Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 289}
Nov 26 09:22:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:22:44.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 327}
Nov 26 09:22:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:22:44.635Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 289}
Nov 26 09:23:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:23:14.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 328}
Nov 26 09:23:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:23:14.638Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 290}
Nov 26 09:24:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:24:14.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 327}
Nov 26 09:24:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:24:14.618Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 289}
Nov 26 09:24:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:24:44.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 327}
Nov 26 09:24:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:24:44.625Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 289}
Nov 26 09:25:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:25:44.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 328}
Nov 26 09:25:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:25:44.624Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 290}
Nov 26 09:26:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:26:14.557Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 329}
Nov 26 09:26:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:26:14.628Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 290}
Nov 26 09:27:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:27:14.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 328}
Nov 26 09:27:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:27:14.634Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 287}
Nov 26 09:27:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:27:44.557Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 326}
Nov 26 09:27:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:27:44.634Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 287}
Nov 26 09:28:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:28:44.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 326}
Nov 26 09:28:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:28:44.636Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 290}
Nov 26 09:29:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:29:14.557Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 328}
Nov 26 09:29:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:29:14.637Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 289}
Nov 26 09:30:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:30:14.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 328}
Nov 26 09:30:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:30:14.639Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 290}
Nov 26 09:30:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:30:44.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 329}
Nov 26 09:30:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:30:44.630Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 290}
Nov 26 09:31:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:31:44.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 329}
Nov 26 09:31:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:31:44.640Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 289}
Nov 26 09:32:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:32:14.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 329}
Nov 26 09:32:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:32:14.640Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 292}
Nov 26 09:33:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:33:14.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 332}
Nov 26 09:33:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:33:14.641Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 294}
Nov 26 09:33:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:33:44.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 334}
Nov 26 09:33:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:33:44.642Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 296}
Nov 26 09:34:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:34:44.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 336}
Nov 26 09:34:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:34:44.642Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 299}
Nov 26 09:35:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:35:14.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 338}
Nov 26 09:35:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:35:14.637Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 301}
Nov 26 09:36:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:36:14.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 340}
Nov 26 09:36:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:36:14.636Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 306}
Nov 26 09:36:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:36:44.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 345}
Nov 26 09:36:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:36:44.645Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 308}
Nov 26 09:36:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:36:44.645Z        warn        [email protected]/memorylimiter.go:210        Memory usage is above soft limit. Refusing data.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 308}
Nov 26 09:37:18 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:37:18.677Z        error        scrape/scrape.go:1298        Scrape commit failed        {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "node-exporter", "target": "http://localhost:9100/metrics", "error": "data refused due to high>
Nov 26 09:37:18 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport.func1
Nov 26 09:37:18 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1298
Nov 26 09:37:18 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport
Nov 26 09:37:18 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1376
Nov 26 09:37:18 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).run
Nov 26 09:37:18 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1253
Nov 26 09:37:23 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:37:23.687Z        error        scrape/scrape.go:1298        Scrape commit failed        {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "systemd-exporter", "target": "http://localhost:9558/metrics", "error": "data refused due to h>
Nov 26 09:37:23 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport.func1
Nov 26 09:37:23 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1298
Nov 26 09:37:23 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport
Nov 26 09:37:23 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1376
Nov 26 09:37:23 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).run
Nov 26 09:37:23 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1253
Nov 26 09:38:03 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:38:03.673Z        error        scrape/scrape.go:1298        Scrape commit failed        {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "node-exporter", "target": "http://localhost:9100/metrics", "error": "data refused due to high>
Nov 26 09:38:03 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport.func1
Nov 26 09:38:03 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1298
Nov 26 09:38:03 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport
Nov 26 09:38:03 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1376
Nov 26 09:38:03 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).run
Nov 26 09:38:03 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1253
Nov 26 09:38:08 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:38:08.670Z        error        scrape/scrape.go:1298        Scrape commit failed        {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "systemd-exporter", "target": "http://localhost:9558/metrics", "error": "data refused due to h>
Nov 26 09:38:08 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport.func1
Nov 26 09:38:08 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1298
Nov 26 09:38:08 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport
Nov 26 09:38:08 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1376
Nov 26 09:38:08 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).run
Nov 26 09:38:08 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1253

The text was updated successfully, but these errors were encountered:

jcpunk · 2024-11-27T14:12:30Z

When transforms are removed (#36351) the leak is drastically reduced - but not resolved.

github-actions · 2024-11-27T17:12:25Z

Pinging code owners for processor/transform: @TylerHelmuth @kentquirk @bogdandrutu @evan-bradley. See Adding Labels via Comments if you do not have permissions to add labels yourself. For example, comment '/label priority:p2 -needs-triaged' to set the priority and remove the needs-triaged label.

VihasMakwana · 2024-11-27T18:34:15Z

@jcpunk if possible, can you enable pprofextension and attach the memory profile?

jcpunk · 2024-11-27T18:56:52Z

Is there a particular profile you'd like me to extract? I'm not super familiar with go and pprof and there seem to be a lot of possible urls....

VihasMakwana · 2024-11-27T19:12:14Z

You can refer to README and enable the extension.

Then follow these steps:

Do curl http://localhost:1777/debug/pprof/heap > heap.0.pprof
Wait for some time (5-10 seconds)
Do curl http://localhost:1777/debug/pprof/heap > heap.1.pprof
Repeat steps 1-3 (change the file name as well)

Collect a few heap profiles and attach them.

jcpunk · 2024-11-29T17:14:06Z

pprof files attached here
issue-36574.tar.gz

jcpunk · 2024-12-03T14:09:32Z

Interestingly, I've confirmed the leak persists without the transform in the pipeline.

Config:

---
extensions:
  pprof: {}
processors:
  batch: {}
  memory_limiter:
    check_interval: 30s
    limit_mib: 800
exporters:
  prometheus:
    endpoint: "[::]:9299"
    enable_open_metrics: true
    metric_expiration: 2m
    const_labels:
      nodename: host.fnal.gov
service:
  extensions:
    - pprof
  telemetry:
    metrics:
      level: none
  pipelines:
    metrics:
      receivers:
      - prometheus
      processors:
      - memory_limiter
      - batch
      exporters:
      - prometheus
receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: node-exporter
        scrape_interval: 45s
        static_configs:
        - targets:
          - localhost:9100
          labels:
            instance: host.fnal.gov:9100
        metric_relabel_configs:
        - action: labeldrop
          regex: nodename
      - job_name: systemd-exporter
        scrape_interval: 45s
        static_configs:
        - targets:
          - localhost:9558
          labels:
            instance: host.fnal.gov:9558
        metric_relabel_configs:
        - action: labeldrop
          regex: nodename

And some pprof's : pprof-notransform.tar.gz

TylerHelmuth · 2024-12-03T16:02:58Z

This is the second issue that mentions OTTL having a memory leak. @jcpunk like in that issue, can you post screenshots of the profile indicating where in the transformprocessor/ottl the issue is happening? It would be really helpful if someone can provide a reproducible test case locally.

github-actions · 2024-12-03T16:05:27Z

Pinging code owners for pkg/ottl: @TylerHelmuth @kentquirk @bogdandrutu @evan-bradley. See Adding Labels via Comments if you do not have permissions to add labels yourself. For example, comment '/label priority:p2 -needs-triaged' to set the priority and remove the needs-triaged label.

TylerHelmuth · 2024-12-03T16:06:12Z

Interestingly, I've confirmed the leak persists without the transform in the pipeline.

I misread this comment, which seems to exonerate OTTL

TylerHelmuth · 2024-12-03T16:07:02Z

My guess is that there is something with the prometheus components to blame.

jcpunk · 2024-12-03T16:09:51Z

can you post screenshots of the profile indicating where in the transformprocessor/ottl the issue is happening?

I'll confess I'm not sure how to parse out the attached pprof files. I was able to load them into https://pprof.me/ but I'd think an expert would be better served by the raw files rather than my cropped screenshots.

github-actions · 2024-12-03T16:09:59Z

Pinging code owners for receiver/prometheus: @Aneurysm9 @dashpole. See Adding Labels via Comments if you do not have permissions to add labels yourself. For example, comment '/label priority:p2 -needs-triaged' to set the priority and remove the needs-triaged label.

github-actions · 2024-12-03T16:10:01Z

Pinging code owners for exporter/prometheus: @Aneurysm9 @dashpole @ArthurSens. See Adding Labels via Comments if you do not have permissions to add labels yourself. For example, comment '/label priority:p2 -needs-triaged' to set the priority and remove the needs-triaged label.

jcpunk · 2024-12-03T16:51:37Z

I've also captured some pprof files of the process as the memory is growing with just the otel-core distribution.

pprof-core.tar.gz

A large number of cores seems to be critical to replication.

jcpunk added the bug Something isn't working label Nov 26, 2024

mx-psi transferred this issue from open-telemetry/opentelemetry-collector Nov 27, 2024

mx-psi added the processor/transform Transform processor label Nov 27, 2024

github-actions bot mentioned this issue Dec 3, 2024

Weekly Report: 2024-11-26 - 2024-12-03 #36628

Closed

TylerHelmuth added the pkg/ottl label Dec 3, 2024

TylerHelmuth removed processor/transform Transform processor pkg/ottl labels Dec 3, 2024

TylerHelmuth added comp:prometheus Prometheus related issues receiver/prometheus Prometheus receiver exporter/prometheus labels Dec 3, 2024

TylerHelmuth mentioned this issue Dec 3, 2024

possible memory leak: config with hostmetrics, kubeletstats, prometheus recievers + transform/k8sattributes processors #36351

Open

TylerHelmuth added the help wanted Extra attention is needed label Dec 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

memory leak on system with 128 x86_64 cores #36574

memory leak on system with 128 x86_64 cores #36574

jcpunk commented Nov 26, 2024 •

edited

Loading

jcpunk commented Nov 27, 2024 •

edited

Loading

github-actions bot commented Nov 27, 2024

VihasMakwana commented Nov 27, 2024

jcpunk commented Nov 27, 2024

VihasMakwana commented Nov 27, 2024

jcpunk commented Nov 29, 2024

jcpunk commented Dec 3, 2024

TylerHelmuth commented Dec 3, 2024

github-actions bot commented Dec 3, 2024

TylerHelmuth commented Dec 3, 2024

TylerHelmuth commented Dec 3, 2024

jcpunk commented Dec 3, 2024

github-actions bot commented Dec 3, 2024

github-actions bot commented Dec 3, 2024

jcpunk commented Dec 3, 2024 •

edited

Loading

memory leak on system with 128 x86_64 cores #36574

memory leak on system with 128 x86_64 cores #36574

Comments

jcpunk commented Nov 26, 2024 • edited Loading

jcpunk commented Nov 27, 2024 • edited Loading

github-actions bot commented Nov 27, 2024

VihasMakwana commented Nov 27, 2024

jcpunk commented Nov 27, 2024

VihasMakwana commented Nov 27, 2024

jcpunk commented Nov 29, 2024

jcpunk commented Dec 3, 2024

TylerHelmuth commented Dec 3, 2024

github-actions bot commented Dec 3, 2024

TylerHelmuth commented Dec 3, 2024

TylerHelmuth commented Dec 3, 2024

jcpunk commented Dec 3, 2024

github-actions bot commented Dec 3, 2024

github-actions bot commented Dec 3, 2024

jcpunk commented Dec 3, 2024 • edited Loading

jcpunk commented Nov 26, 2024 •

edited

Loading

jcpunk commented Nov 27, 2024 •

edited

Loading

jcpunk commented Dec 3, 2024 •

edited

Loading