
memory leak on system with 128 x86_64 cores #36574

Open
jcpunk opened this issue Nov 26, 2024 · 15 comments
Labels
bug, comp:prometheus, exporter/prometheus, help wanted, receiver/prometheus

Comments


jcpunk commented Nov 26, 2024

Describe the bug
I've got an x86_64 system with 128 cores. The otel collector adds about 5 MiB to its working memory every time it scrapes a metrics endpoint. Eventually it runs up against the memory_limiter, but garbage collection never seems to make real headway and ultimately fails to reclaim enough memory.

My identically configured systems with 8 or 16 x86_64 cores do not appear to leak in this manner.
My aarch64 system with 64 cores and a similar config also leaks in this manner.

Steps to reproduce
Run the otel-collector on a system with a lot of processing cores

What did you expect to see?
Memory usage eventually stabilize

What did you see instead?
Memory usage grows to fill the space allotted - tested up to 4 GiB (which takes about 6 days)

What version did you use?
otelcol-contrib version 0.114.0 (the memory handling code is probably in the base collector)

What config did you use?

---
processors:
  batch: {}
  transform/hostname:
    metric_statements:
    - context: datapoint
      statements:
      - set(attributes["nodename"], "host.fnal.gov")
      - set(resource.attributes["nodename"], "host.fnal.gov")
  memory_limiter:
    check_interval: 30s
    limit_mib: 384
exporters:
  prometheus:
    endpoint: "[::]:9299"
    enable_open_metrics: true
    metric_expiration: 2m
service:
  telemetry:
    metrics:
      level: none
  pipelines:
    metrics:
      receivers:
      - prometheus
      processors:
      - memory_limiter
      - transform/hostname
      - batch
      exporters:
      - prometheus
receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: node-exporter
        scrape_interval: 45s
        static_configs:
        - targets:
          - localhost:9100
          labels:
            instance: host.fnal.gov:9100
      - job_name: systemd-exporter
        scrape_interval: 45s
        static_configs:
        - targets:
          - localhost:9558
          labels:
            instance: host.fnal.gov:9558

Environment
OS: Almalinux 9
Platform: podman
Podman Quadlet file: /etc/containers/systemd/otel-collector.container

# THIS FILE IS MANAGED BY PUPPET
[Service]
TimeoutStartSec=900
TimeoutStopSec=30
TasksMax=4096
CPUWeight=30
MemoryMax=512M
IOSchedulingClass=best-effort
IOSchedulingPriority=7
IOWeight=30
Restart=always

[Container]
AutoUpdate=registry
DropCapability=ALL
User=5219
Group=8247
HostName=%H
LogDriver=journald
NoNewPrivileges=true
Pull=missing
ReadOnly=true
PodmanArgs=--stop-signal=SIGKILL
Volume=/etc/otel-collector:/etc/otel-collector:ro,rslave,z
Environment=GOMAXPROCS=4
Environment=GOMEMLIMIT=384MiB
Exec=--config /etc/otel-collector/otel-config.yaml

Image=ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:latest
Network=host
PublishPort=[::]:9299:9299

[Install]
WantedBy=default.target

Additional context
endpoints:

[root@host ~]#  curl -s localhost:9558/metrics |wc -l
5380
[root@host ~]#  curl -s localhost:9100/metrics |wc -l
6317

logs

Nov 26 09:21:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:21:44.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 327}
Nov 26 09:21:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:21:44.643Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 289}
Nov 26 09:22:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:22:44.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 327}
Nov 26 09:22:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:22:44.635Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 289}
Nov 26 09:23:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:23:14.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 328}
Nov 26 09:23:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:23:14.638Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 290}
Nov 26 09:24:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:24:14.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 327}
Nov 26 09:24:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:24:14.618Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 289}
Nov 26 09:24:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:24:44.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 327}
Nov 26 09:24:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:24:44.625Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 289}
Nov 26 09:25:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:25:44.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 328}
Nov 26 09:25:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:25:44.624Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 290}
Nov 26 09:26:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:26:14.557Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 329}
Nov 26 09:26:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:26:14.628Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 290}
Nov 26 09:27:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:27:14.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 328}
Nov 26 09:27:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:27:14.634Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 287}
Nov 26 09:27:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:27:44.557Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 326}
Nov 26 09:27:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:27:44.634Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 287}
Nov 26 09:28:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:28:44.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 326}
Nov 26 09:28:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:28:44.636Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 290}
Nov 26 09:29:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:29:14.557Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 328}
Nov 26 09:29:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:29:14.637Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 289}
Nov 26 09:30:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:30:14.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 328}
Nov 26 09:30:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:30:14.639Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 290}
Nov 26 09:30:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:30:44.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 329}
Nov 26 09:30:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:30:44.630Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 290}
Nov 26 09:31:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:31:44.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 329}
Nov 26 09:31:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:31:44.640Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 289}
Nov 26 09:32:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:32:14.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 329}
Nov 26 09:32:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:32:14.640Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 292}
Nov 26 09:33:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:33:14.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 332}
Nov 26 09:33:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:33:14.641Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 294}
Nov 26 09:33:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:33:44.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 334}
Nov 26 09:33:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:33:44.642Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 296}
Nov 26 09:34:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:34:44.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 336}
Nov 26 09:34:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:34:44.642Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 299}
Nov 26 09:35:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:35:14.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 338}
Nov 26 09:35:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:35:14.637Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 301}
Nov 26 09:36:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:36:14.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 340}
Nov 26 09:36:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:36:14.636Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 306}
Nov 26 09:36:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:36:44.556Z        info        [email protected]/memorylimiter.go:203        Memory usage is above soft limit. Forcing a GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 345}
Nov 26 09:36:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:36:44.645Z        info        [email protected]/memorylimiter.go:173        Memory usage after GC.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 308}
Nov 26 09:36:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:36:44.645Z        warn        [email protected]/memorylimiter.go:210        Memory usage is above soft limit. Refusing data.        {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 308}
Nov 26 09:37:18 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:37:18.677Z        error        scrape/scrape.go:1298        Scrape commit failed        {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "node-exporter", "target": "http://localhost:9100/metrics", "error": "data refused due to high>
Nov 26 09:37:18 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport.func1
Nov 26 09:37:18 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1298
Nov 26 09:37:18 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport
Nov 26 09:37:18 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1376
Nov 26 09:37:18 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).run
Nov 26 09:37:18 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1253
Nov 26 09:37:23 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:37:23.687Z        error        scrape/scrape.go:1298        Scrape commit failed        {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "systemd-exporter", "target": "http://localhost:9558/metrics", "error": "data refused due to h>
Nov 26 09:37:23 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport.func1
Nov 26 09:37:23 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1298
Nov 26 09:37:23 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport
Nov 26 09:37:23 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1376
Nov 26 09:37:23 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).run
Nov 26 09:37:23 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1253
Nov 26 09:38:03 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:38:03.673Z        error        scrape/scrape.go:1298        Scrape commit failed        {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "node-exporter", "target": "http://localhost:9100/metrics", "error": "data refused due to high>
Nov 26 09:38:03 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport.func1
Nov 26 09:38:03 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1298
Nov 26 09:38:03 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport
Nov 26 09:38:03 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1376
Nov 26 09:38:03 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).run
Nov 26 09:38:03 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1253
Nov 26 09:38:08 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:38:08.670Z        error        scrape/scrape.go:1298        Scrape commit failed        {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "systemd-exporter", "target": "http://localhost:9558/metrics", "error": "data refused due to h>
Nov 26 09:38:08 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport.func1
Nov 26 09:38:08 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1298
Nov 26 09:38:08 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport
Nov 26 09:38:08 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1376
Nov 26 09:38:08 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).run
Nov 26 09:38:08 host.fnal.gov systemd-otel-collector[1202537]:         github.com/prometheus/[email protected]/scrape/scrape.go:1253
jcpunk added the bug label Nov 26, 2024

jcpunk commented Nov 27, 2024

When the transforms are removed (#36351), the leak is drastically reduced, but not resolved.

mx-psi transferred this issue from open-telemetry/opentelemetry-collector Nov 27, 2024
mx-psi added the processor/transform label Nov 27, 2024
github-actions bot commented Nov 27, 2024

Pinging code owners for processor/transform: @TylerHelmuth @kentquirk @bogdandrutu @evan-bradley. See Adding Labels via Comments if you do not have permissions to add labels yourself. For example, comment '/label priority:p2 -needs-triaged' to set the priority and remove the needs-triaged label.

@VihasMakwana
Contributor

@jcpunk if possible, can you enable the pprof extension and attach the memory profiles?


jcpunk commented Nov 27, 2024

Is there a particular profile you'd like me to extract? I'm not super familiar with Go and pprof, and there seem to be a lot of possible URLs...

@VihasMakwana
Contributor

You can refer to the README and enable the extension.

Then follow these steps:

  1. Run curl http://localhost:1777/debug/pprof/heap > heap.0.pprof
  2. Wait for some time (5-10 seconds)
  3. Run curl http://localhost:1777/debug/pprof/heap > heap.1.pprof
  4. Repeat steps 1-3, changing the file name each time

Collect a few heap profiles and attach them (for example with a loop like the sketch below).
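
A minimal sketch for automating that collection, assuming the pprof extension is reachable at its default localhost:1777 endpoint and that a 45-second gap between snapshots (roughly one scrape interval in this config) is a reasonable spacing:

#!/bin/sh
# Capture a series of heap profiles from the collector's pprof extension.
# Assumes the extension is enabled and listening on localhost:1777 (its default).
for i in 0 1 2 3 4 5; do
    curl -s http://localhost:1777/debug/pprof/heap > "heap.${i}.pprof"
    sleep 45   # wait roughly one scrape interval between snapshots
done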


jcpunk commented Nov 29, 2024

pprof files attached here
issue-36574.tar.gz


jcpunk commented Dec 3, 2024

Interestingly, I've confirmed the leak persists without the transform in the pipeline.

Config:

---
extensions:
  pprof: {}
processors:
  batch: {}
  memory_limiter:
    check_interval: 30s
    limit_mib: 800
exporters:
  prometheus:
    endpoint: "[::]:9299"
    enable_open_metrics: true
    metric_expiration: 2m
    const_labels:
      nodename: host.fnal.gov
service:
  extensions:
    - pprof
  telemetry:
    metrics:
      level: none
  pipelines:
    metrics:
      receivers:
      - prometheus
      processors:
      - memory_limiter
      - batch
      exporters:
      - prometheus
receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: node-exporter
        scrape_interval: 45s
        static_configs:
        - targets:
          - localhost:9100
          labels:
            instance: host.fnal.gov:9100
        metric_relabel_configs:
        - action: labeldrop
          regex: nodename
      - job_name: systemd-exporter
        scrape_interval: 45s
        static_configs:
        - targets:
          - localhost:9558
          labels:
            instance: host.fnal.gov:9558
        metric_relabel_configs:
        - action: labeldrop
          regex: nodename

And some pprofs: pprof-notransform.tar.gz

@TylerHelmuth
Member

This is the second issue that mentions OTTL having a memory leak. @jcpunk, as in that issue, can you post screenshots of the profile indicating where in the transformprocessor/ottl the issue is happening? It would be really helpful if someone could provide a reproducible test case locally.
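
For anyone attempting a local reproduction, a rough sketch derived from the quadlet unit above (the 0.114.0 image tag, mount path, and config file name are assumptions taken from this report, and node_exporter/systemd_exporter are assumed to already be listening on localhost:9100 and localhost:9558), not a verified repro:

# Run the contrib collector the same way the quadlet does, on a many-core host
podman run --rm --network host \
  -e GOMAXPROCS=4 -e GOMEMLIMIT=384MiB \
  -v ./otel-config.yaml:/etc/otel-collector/otel-config.yaml:ro,z \
  ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.114.0 \
  --config /etc/otel-collector/otel-config.yaml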


github-actions bot commented Dec 3, 2024

Pinging code owners for pkg/ottl: @TylerHelmuth @kentquirk @bogdandrutu @evan-bradley. See Adding Labels via Comments if you do not have permissions to add labels yourself. For example, comment '/label priority:p2 -needs-triaged' to set the priority and remove the needs-triaged label.

@TylerHelmuth
Member

Interestingly, I've confirmed the leak persists without the transform in the pipeline.

I misread this comment, which seems to exonerate OTTL.

TylerHelmuth removed the processor/transform and pkg/ottl labels Dec 3, 2024
@TylerHelmuth
Member

My guess is that something in the prometheus components is to blame.

TylerHelmuth added the comp:prometheus, receiver/prometheus, and exporter/prometheus labels Dec 3, 2024

jcpunk commented Dec 3, 2024

can you post screenshots of the profile indicating where in the transformprocessor/ottl the issue is happening?

I'll confess I'm not sure how to make sense of the attached pprof files. I was able to load them into https://pprof.me/, but I'd think an expert would be better served by the raw files than by my cropped screenshots.
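
For reading the raw files locally (assuming a Go toolchain is installed, and using the heap.N.pprof names from the collection steps above), go tool pprof can summarize a snapshot and, with -base, show only the growth between two snapshots, which is usually where a leak stands out:

# Top allocations in a single heap snapshot
go tool pprof -top heap.0.pprof

# Only the growth between the first snapshot and a later one
go tool pprof -top -base heap.0.pprof heap.5.pprof

# Or browse interactively in a web UI
go tool pprof -http=:8080 heap.5.pprof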


github-actions bot commented Dec 3, 2024

Pinging code owners for receiver/prometheus: @Aneurysm9 @dashpole. See Adding Labels via Comments if you do not have permissions to add labels yourself. For example, comment '/label priority:p2 -needs-triaged' to set the priority and remove the needs-triaged label.


github-actions bot commented Dec 3, 2024

Pinging code owners for exporter/prometheus: @Aneurysm9 @dashpole @ArthurSens. See Adding Labels via Comments if you do not have permissions to add labels yourself. For example, comment '/label priority:p2 -needs-triaged' to set the priority and remove the needs-triaged label.


jcpunk commented Dec 3, 2024

I've also captured some pprof files of the process while the memory is growing, using just the otel core distribution.

pprof-core.tar.gz

A large number of cores seems to be critical to replication.
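
One hedged way to probe that (this is an assumption about the mechanism, not a confirmed diagnosis): since GOMAXPROCS is already pinned to 4, the dependence on core count suggests something is sized from the CPUs visible to the process rather than from GOMAXPROCS. Restricting the CPUs the container can see and checking whether the growth flattens out could help confirm or rule that out:

# Same invocation as the repro sketch above, but limited to 8 CPUs
podman run --rm --network host --cpuset-cpus 0-7 \
  -e GOMAXPROCS=4 -e GOMEMLIMIT=384MiB \
  -v ./otel-config.yaml:/etc/otel-collector/otel-config.yaml:ro,z \
  ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.114.0 \
  --config /etc/otel-collector/otel-config.yaml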
