memory_limiter does not work when mounting host filesystem with cgroups v1 #6826

Closed · povilasv opened this issue on Dec 19, 2022 · 2 comments
Labels: bug, processor/memorylimiter

povilasv (Contributor) commented on Dec 19, 2022

Describe the bug

The Memory Limiter processor can work incorrectly when using cgroups v1 and mounting the host's /sys filesystem into the collector container.
Mounting the host filesystem is required for the hostmetrics receiver to work; see for example
https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/templates/_pod.tpl#L109-L113

In that setup, /proc/self/mountinfo contains memory cgroup entries for many different containers running on the host, so the limiter's cgroup lookup becomes ambiguous.
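
A minimal standalone sketch of that ambiguity (simplified parsing for illustration; not the collector's actual lookup code, and real implementations such as containerd/cgroups are more careful): it prints every cgroup v1 memory mount point found in /proc/self/mountinfo, and with the host's /sys mounted it typically prints several candidates instead of just the container's own.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("/proc/self/mountinfo")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		// mountinfo line: "... mountPoint ... [optional fields] - fstype source superOptions"
		parts := strings.SplitN(sc.Text(), " - ", 2)
		if len(parts) != 2 {
			continue
		}
		pre, post := strings.Fields(parts[0]), strings.Fields(parts[1])
		if len(pre) < 5 || len(post) < 3 {
			continue
		}
		// pre[4] is the mount point; post[0] is the fstype and
		// post[2] the super options, which name the cgroup controllers.
		if post[0] == "cgroup" && strings.Contains(post[2], "memory") {
			fmt.Println("memory cgroup mounted at:", pre[4])
		}
	}
	if err := sc.Err(); err != nil {
		panic(err)
	}
}
```

A lookup that takes the wrong entry from that list ends up reading another container's memory limit, which is the failure described above.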

This was originally reported in the helm charts repository: open-telemetry/opentelemetry-helm-charts#543

Steps to reproduce

Run the OpenTelemetry Collector on Kubernetes with Docker as the container runtime (a setup that typically uses cgroups v1) and have the Pod mount the host filesystem, as sketched below.
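
A hedged illustration of such a mount (paraphrased from the linked helm template; the volume name and exact fields may differ from the chart):

```yaml
# Pod spec fragment: mounting the node's root filesystem at /hostfs.
# This also exposes the host's /sys/fs/cgroup hierarchy inside the
# container, which is what makes the cgroup v1 lookup ambiguous.
containers:
  - name: opentelemetry-collector
    volumeMounts:
      - name: hostfs
        mountPath: /hostfs
        readOnly: true
        mountPropagation: HostToContainer
volumes:
  - name: hostfs
    hostPath:
      path: /
```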

What did you expect to see?

The memory limiter reading the container's own cgroup file (on cgroups v1, its memory.limit_in_bytes) and deriving the total available memory from it.

What did you see instead?
The memory limiter trying to read an incorrect file, i.e. one of the memory cgroups exposed through the host mount rather than the container's own.
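
For reference, a minimal sketch of reading that cgroups v1 limit file directly (assuming the standard /sys/fs/cgroup/memory mount; purely illustrative, not collector code):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	// Read the current container's cgroup v1 memory limit.
	// Note: an "unlimited" cgroup reports a very large sentinel value here.
	raw, err := os.ReadFile("/sys/fs/cgroup/memory/memory.limit_in_bytes")
	if err != nil {
		panic(err)
	}
	limit, err := strconv.ParseUint(strings.TrimSpace(string(raw)), 10, 64)
	if err != nil {
		panic(err)
	}
	fmt.Printf("cgroup v1 memory limit: %d bytes\n", limit)
}
```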

What version did you use?
Version: v0.66.0

What config did you use?
Config:

```yaml
apiVersion: v1
data:
  relay: |
    exporters:
      coralogix:
        application_name: 'default'
        application_name_attributes:
        - k8s.namespace.name
        - service.namespace
        logs:
          endpoint: 'otel-logs.coralogix.com:443'
        metrics:
          endpoint: 'otel-metrics.coralogix.com:443'
        private_key: ${CORALOGIX_PRIVATE_KEY}
        subsystem_name: 'nodes'
        subsystem_name_attributes:
        - k8s.deployment.name
        - k8s.statefulset.name
        - k8s.daemonset.name
        - k8s.cronjob.name
        - k8s.job.name
        - k8s.container.name
        - k8s.node.name
        - service.name
        timeout: 30s
        traces:
          endpoint: 'otel-traces.coralogix.com:443'
      logging: {}
    extensions:
      file_storage:
        directory: /var/lib/otelcol
      health_check: {}
      memory_ballast:
        size_in_percentage: 40
      zpages:
        endpoint: localhost:55679
    processors:
      batch: {}
      k8sattributes:
        extract:
          metadata:
          - k8s.namespace.name
          - k8s.deployment.name
          - k8s.statefulset.name
          - k8s.daemonset.name
          - k8s.cronjob.name
          - k8s.job.name
        passthrough: false
        pod_association:
        - sources:
          - from: resource_attribute
            name: k8s.pod.ip
        - sources:
          - from: resource_attribute
            name: k8s.pod.uid
        - sources:
          - from: connection
      memory_limiter:
        check_interval: 5s
        limit_percentage: 80
        spike_limit_percentage: 25
      resourcedetection/env:
        detectors:
        - system
        - env
        override: false
        timeout: 2s
      spanmetrics:
        dimensions:
        - name: k8s.deployment.name
        - name: k8s.statefulset.name
        - name: k8s.daemonset.name
        - name: k8s.cronjob.name
        - name: k8s.job.name
        - name: k8s.container.name
        - name: k8s.node.name
        - name: k8s.namespace.name
        metrics_exporter: coralogix
    receivers:
      filelog:
        exclude:
        - /var/log/pods/sys-mon_otel-coralogix*_*/opentelemetry-collector/*.log
        include:
        - /var/log/pods/*/*/*.log
        include_file_name: false
        include_file_path: true
        operators:
        - id: get-format
          routes:
          - expr: body matches "^\\{"
            output: parser-docker
          - expr: body matches "^[^ Z]+ "
            output: parser-crio
          - expr: body matches "^[^ Z]+Z"
            output: parser-containerd
          type: router
        - id: parser-crio
          output: extract_metadata_from_filepath
          regex: ^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
          timestamp:
            layout: "2006-01-02T15:04:05.000000000-07:00"
            layout_type: gotime
            parse_from: attributes.time
          type: regex_parser
        - id: parser-containerd
          output: extract_metadata_from_filepath
          regex: ^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
          timestamp:
            layout: '%Y-%m-%dT%H:%M:%S.%LZ'
            parse_from: attributes.time
          type: regex_parser
        - id: parser-docker
          output: extract_metadata_from_filepath
          timestamp:
            layout: '%Y-%m-%dT%H:%M:%S.%LZ'
            parse_from: attributes.time
          type: json_parser
        - id: extract_metadata_from_filepath
          parse_from: attributes["log.file.path"]
          regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$
          type: regex_parser
        - from: attributes.stream
          to: attributes["log.iostream"]
          type: move
        - from: attributes.container_name
          to: resource["k8s.container.name"]
          type: move
        - from: attributes.namespace
          to: resource["k8s.namespace.name"]
          type: move
        - from: attributes.pod_name
          to: resource["k8s.pod.name"]
          type: move
        - from: attributes.restart_count
          to: resource["k8s.container.restart_count"]
          type: move
        - from: attributes.uid
          to: resource["k8s.pod.uid"]
          type: move
        - from: attributes.log
          to: body
          type: move
        start_at: beginning
        storage: file_storage
      hostmetrics:
        collection_interval: 10s
        root_path: /hostfs
        scrapers:
          cpu: null
          disk: null
          filesystem:
            exclude_fs_types:
              fs_types:
              - autofs
              - binfmt_misc
              - bpf
              - cgroup2
              - configfs
              - debugfs
              - devpts
              - devtmpfs
              - fusectl
              - hugetlbfs
              - iso9660
              - mqueue
              - nsfs
              - overlay
              - proc
              - procfs
              - pstore
              - rpc_pipefs
              - securityfs
              - selinuxfs
              - squashfs
              - sysfs
              - tracefs
              match_type: strict
            exclude_mount_points:
              match_type: regexp
              mount_points:
              - /dev/*
              - /proc/*
              - /sys/*
              - /run/k3s/containerd/*
              - /var/lib/docker/*
              - /var/lib/kubelet/*
              - /snap/*
          load: null
          memory: null
          network: null
      jaeger:
        protocols:
          grpc:
            endpoint: ${MY_POD_IP}:14250
          thrift_binary:
            endpoint: ${MY_POD_IP}:6832
          thrift_compact:
            endpoint: ${MY_POD_IP}:6831
          thrift_http:
            endpoint: ${MY_POD_IP}:14268
      kubeletstats:
        auth_type: serviceAccount
        collection_interval: 20s
        endpoint: ${K8S_NODE_NAME}:10250
      otlp:
        protocols:
          grpc:
            endpoint: ${MY_POD_IP}:4317
          http:
            endpoint: ${MY_POD_IP}:4318
      prometheus:
        config:
          scrape_configs:
          - job_name: opentelemetry-collector
            scrape_interval: 30s
            static_configs:
            - targets:
              - ${MY_POD_IP}:8888
      zipkin:
        endpoint: ${MY_POD_IP}:9411
    service:
      extensions:
      - zpages
      - health_check
      - memory_ballast
      - file_storage
      pipelines:
        logs:
          exporters:
          - coralogix
          processors:
          - k8sattributes
          - batch
          receivers:
          - otlp
          - filelog
        metrics:
          exporters:
          - coralogix
          processors:
          - k8sattributes
          - memory_limiter
          - resourcedetection/env
          - batch
          receivers:
          - prometheus
          - otlp
          - hostmetrics
          - kubeletstats
        traces:
          exporters:
          - coralogix
          processors:
          - k8sattributes
          - memory_limiter
          - spanmetrics
          - batch
          receivers:
          - otlp
          - zipkin
          - jaeger
      telemetry:
        metrics:
          address: ${MY_POD_IP}:8888
```
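
For context on the limit_percentage and spike_limit_percentage values in this config: per the processor's documentation, they resolve against the total memory the limiter detects, i.e. exactly the value the broken cgroup lookup feeds in. A simplified sketch of that arithmetic (illustrative, not the processor's source; the 2 GiB total is an assumed example):

```go
package main

import "fmt"

func main() {
	const totalMemory uint64 = 2 << 30 // assume a 2 GiB cgroup limit
	const limitPct, spikePct = 80, 25  // values from the config above

	// Documented semantics: hard limit is a percentage of total memory,
	// and the soft limit is the hard limit minus the spike allowance.
	hardLimit := totalMemory * limitPct / 100
	softLimit := hardLimit - totalMemory*spikePct/100
	fmt.Printf("hard limit: %d MiB, soft limit: %d MiB\n",
		hardLimit>>20, softLimit>>20)
}
```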


atoulme (Contributor) commented on Dec 14, 2023

Is there more work required here?

povilasv (Contributor, Author) commented:

Forgot to close the issue; this is done.
