Improve Kubernetes Logs Collection Experience #25251

Closed
TylerHelmuth opened this issue Aug 14, 2023 · 21 comments
Labels
discussion needed Community discussion needed

Comments

@TylerHelmuth
Member

Component(s)

receiver/filelog

Describe the issue you're reporting

Problem Statement

The Collector's solution for collecting logs from Kubernetes is the Filelog receiver, and it can handle collection of Kubernetes logs for most scenarios. But the Filelog receiver was created to be a generic solution and therefore does not take advantage of useful Kubernetes assumptions out of the box.

At the moment, to collect logs with the Filelog receiver, the recommended configuration is:

receivers:
  filelog:
    exclude: []
    include:
    - /var/log/pods/*/*/*.log
    include_file_name: false
    include_file_path: true
    operators:
    - id: get-format
      routes:
      - expr: body matches "^\\{"
        output: parser-docker
      - expr: body matches "^[^ Z]+ "
        output: parser-crio
      - expr: body matches "^[^ Z]+Z"
        output: parser-containerd
      type: router
    - id: parser-crio
      regex: ^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
      timestamp:
        layout: 2006-01-02T15:04:05.999999999Z07:00
        layout_type: gotime
        parse_from: attributes.time
      type: regex_parser
    - combine_field: attributes.log
      combine_with: ""
      id: crio-recombine
      is_last_entry: attributes.logtag == 'F'
      max_log_size: 102400
      output: extract_metadata_from_filepath
      source_identifier: attributes["log.file.path"]
      type: recombine
    - id: parser-containerd
      regex: ^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
      timestamp:
        layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        parse_from: attributes.time
      type: regex_parser
    - combine_field: attributes.log
      combine_with: ""
      id: containerd-recombine
      is_last_entry: attributes.logtag == 'F'
      max_log_size: 102400
      output: extract_metadata_from_filepath
      source_identifier: attributes["log.file.path"]
      type: recombine
    - id: parser-docker
      output: extract_metadata_from_filepath
      timestamp:
        layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        parse_from: attributes.time
      type: json_parser
    - id: extract_metadata_from_filepath
      parse_from: attributes["log.file.path"]
      regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$
      type: regex_parser
    - from: attributes.stream
      to: attributes["log.iostream"]
      type: move
    - from: attributes.container_name
      to: resource["k8s.container.name"]
      type: move
    - from: attributes.namespace
      to: resource["k8s.namespace.name"]
      type: move
    - from: attributes.pod_name
      to: resource["k8s.pod.name"]
      type: move
    - from: attributes.restart_count
      to: resource["k8s.container.restart_count"]
      type: move
    - from: attributes.uid
      to: resource["k8s.pod.uid"]
      type: move
    - from: attributes.log
      to: body
      type: move
    start_at: beginning
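
To make the operator chain above less opaque, here is a rough Go equivalent of its first stages: the `get-format` router's three route expressions and the CRI `P`/`F` partial-line recombination. It reuses the regexes from the configuration verbatim, but it is only an illustration under simplifying assumptions (all lines come from a single file, `max_log_size` is ignored, timestamps are not parsed). `detectFormat` and `recombine` are hypothetical helpers, not stanza code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Route expressions from the get-format router, evaluated in the same order.
	dockerRoute     = regexp.MustCompile(`^\{`)
	crioRoute       = regexp.MustCompile(`^[^ Z]+ `)
	containerdRoute = regexp.MustCompile(`^[^ Z]+Z`)

	// Line regexes copied from parser-crio and parser-containerd.
	crioLine       = regexp.MustCompile(`^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$`)
	containerdLine = regexp.MustCompile(`^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$`)
)

// detectFormat returns the first route that matches, as the router operator does.
func detectFormat(line string) string {
	switch {
	case dockerRoute.MatchString(line):
		return "docker"
	case crioRoute.MatchString(line):
		return "crio"
	case containerdRoute.MatchString(line):
		return "containerd"
	}
	return ""
}

// recombine joins CRI partial records (logtag "P") until a full record
// (logtag "F") arrives, like the recombine operators with combine_with: "".
// Assumption: every line comes from one file, so there is a single buffer.
func recombine(lines []string) []string {
	var out []string
	var buf strings.Builder
	for _, line := range lines {
		var re *regexp.Regexp
		switch detectFormat(line) {
		case "crio":
			re = crioLine
		case "containerd":
			re = containerdLine
		default:
			continue // Docker JSON lines go to parser-docker instead.
		}
		m := re.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		buf.WriteString(m[re.SubexpIndex("log")])
		if m[re.SubexpIndex("logtag")] == "F" {
			out = append(out, buf.String())
			buf.Reset()
		}
	}
	return out
}

func main() {
	lines := []string{
		"2023-08-14T10:00:00.000000001Z stdout P hello, ",
		"2023-08-14T10:00:00.000000002Z stdout F world",
		`{"log":"from docker\n","stream":"stdout","time":"2023-08-14T10:00:02Z"}`,
	}
	for _, l := range lines {
		fmt.Println(detectFormat(l)) // containerd, containerd, docker
	}
	fmt.Printf("%q\n", recombine(lines)) // ["hello, world"]
}
```

The order of the routes matters: a containerd timestamp ends in `Z` directly before the space, so `^[^ Z]+ ` fails on it and only the third route matches, while CRI-O's offset-style timestamp has no `Z` and is caught by the second route.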

To a new user, that is a lot of scary configuration that will take time to comprehend. They probably don't want to comprehend it either, yet it has to live in their configuration. In the Collector Helm chart we hide this complexity behind a preset, but a preset can't handle every situation.

Here are a couple experiences I'd like to improve:

  • "Multi-tenancy" support. The Filelog receiver is good at gathering all the logs at once and sending them down a pipeline, but it is not set up to collect logs for a specific namespace or pod and send them down a specific pipeline. To meet this requirement you must configure multiple instances of the filelogreceiver and duplicate all of the configuration, changing only the include section as needed.
    • I believe multiple instances of the receiver are needed, but it would be nice to reduce the amount of duplicated configuration, so that a k8s-specific filelogreceiver can be configured quickly and added to the appropriate pipeline.
  • No support for label selectors. Although you can specify specific namespaces/pods/containers to collect by taking advantage of the log path, using label selectors to identify an object is a common practice in k8s.
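
For context on the log path point: the `extract_metadata_from_filepath` regex in the configuration above assumes the `/var/log/pods/<namespace>_<pod>_<uid>/<container>/<restart>.log` layout. A Go sketch applying that same regex shows what the path does and does not carry: namespace, pod name, UID, container, and restart count are all recoverable, but Pod labels are not, so label selection needs metadata from somewhere else. `extractMetadata` and `resourceKeys` are illustrative names; the attribute keys mirror the `move` operators above.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// Same expression as the extract_metadata_from_filepath operator.
var podLogPath = regexp.MustCompile(`^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$`)

// resourceKeys mirrors the move operators that copy captures into resource attributes.
var resourceKeys = map[string]string{
	"namespace":      "k8s.namespace.name",
	"pod_name":       "k8s.pod.name",
	"uid":            "k8s.pod.uid",
	"container_name": "k8s.container.name",
	"restart_count":  "k8s.container.restart_count",
}

// extractMetadata returns resource attributes derived from a pod log path,
// or nil if the path does not follow the expected layout.
func extractMetadata(path string) map[string]string {
	m := podLogPath.FindStringSubmatch(path)
	if m == nil {
		return nil
	}
	attrs := make(map[string]string)
	for i, name := range podLogPath.SubexpNames() {
		if key, ok := resourceKeys[name]; ok {
			attrs[key] = m[i]
		}
	}
	return attrs
}

func main() {
	attrs := extractMetadata("/var/log/pods/my-namespace_api-7d9f_0f1e2d3c-1111-2222-3333-444455556666/api/0.log")
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, attrs[k])
	}
}
```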

Some of the solution might be in the Helm chart and some might be in the Filelog receiver itself. It is also possible that this spawns a new k8s-specific receiver that uses stanza behind the scenes.

Ultimately, I am looking to improve the "easy path" for most users in Kubernetes. I want to make it easier for users to collect logs for a specific subset of all the logs in the cluster, and easier to configure multiple instances of the receiver to support different destinations. Packaging up all the Kubernetes assumptions into something like:

receivers:
  filelog:
    forKubernetes: true
    include: 
      - /var/log/pods/my-namespace/*/*.log

or

receivers:
  filelog/api-server:
    forKubernetes: true
    labelSelectors:
      - component=kube-apiserver,tier=control-plane

would be great.
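
For reference, the `labelSelectors` value in the second proposal uses Kubernetes equality-based selector syntax, where comma-separated requirements are ANDed. A minimal Go sketch of that matching logic follows. It deliberately handles only `=`, `==`, and `!=`; real Kubernetes selectors also support set-based forms (`in`, `notin`, existence), and production code would normally rely on `k8s.io/apimachinery/pkg/labels` instead. `matchesSelector` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesSelector reports whether labels satisfy an equality-based selector
// such as "component=kube-apiserver,tier=control-plane". Every requirement
// must hold. Set-based syntax (in, notin, exists) is intentionally unsupported.
func matchesSelector(selector string, labels map[string]string) (bool, error) {
	for _, req := range strings.Split(selector, ",") {
		req = strings.TrimSpace(req)
		if req == "" {
			continue
		}
		switch {
		case strings.Contains(req, "!="): // checked first because "!=" contains "="
			kv := strings.SplitN(req, "!=", 2)
			// As in Kubernetes, a missing key also satisfies "!=".
			if v, ok := labels[strings.TrimSpace(kv[0])]; ok && v == strings.TrimSpace(kv[1]) {
				return false, nil
			}
		case strings.Contains(req, "="):
			kv := strings.SplitN(strings.Replace(req, "==", "=", 1), "=", 2)
			if v, ok := labels[strings.TrimSpace(kv[0])]; !ok || v != strings.TrimSpace(kv[1]) {
				return false, nil
			}
		default:
			return false, fmt.Errorf("unsupported requirement %q", req)
		}
	}
	return true, nil
}

func main() {
	apiServer := map[string]string{"component": "kube-apiserver", "tier": "control-plane"}
	scheduler := map[string]string{"component": "kube-scheduler", "tier": "control-plane"}
	sel := "component=kube-apiserver,tier=control-plane"
	a, _ := matchesSelector(sel, apiServer)
	b, _ := matchesSelector(sel, scheduler)
	fmt.Println(a, b) // true false
}
```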

@TylerHelmuth TylerHelmuth added needs triage New item requiring triage discussion needed Community discussion needed receiver/filelog and removed needs triage New item requiring triage labels Aug 14, 2023
@TylerHelmuth
Member Author

/cc @dmitryax @djaglowski

@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions
Contributor

Pinging code owners for receiver/filelog: @djaglowski. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@dmitryax
Member

There is no way to apply label selectors with the filelog receiver because that information is not exposed in the file paths. The only way to do that is by fetching logs from the k8s API, which adds significant load on the API and will likely degrade performance significantly. This should not be part of the filelog receiver. We have a proposal for another component for this purpose: #24439. Feel free to take a look.

@TylerHelmuth
Member Author

@dmitryax I agree, which is why I believe a possible outcome of this issue is a k8s-specific logs collection receiver. It looks like the component you linked (#23339) is exactly that and would help with the two experiences I want to improve.

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.


@github-actions github-actions bot added the Stale label Dec 18, 2023
@makas45

makas45 commented Dec 21, 2023

We have also ended up in the same situation and would like to export logs at the namespace level.

@github-actions github-actions bot removed the Stale label Dec 22, 2023
@ChrsMark
Member

@dmitryax @TylerHelmuth +1 for this!

It would be nice if we could skip/exclude Pods' log files based on the Pods' labels/annotations.

This is super useful when a user wants to use https://opentelemetry.io/docs/concepts/sdk-configuration/general-sdk-configuration/#otel_logs_exporter to collect logs for specific instrumented apps, while the filelog receiver handles collection for the rest of the apps/Pods.

Without the option to skip/exclude specific Pods' log files based on metadata, we end up with duplicate records.

@ChrsMark
Member

ChrsMark commented Jan 11, 2024

@TylerHelmuth I wonder if for this we could instead use the receiver_creator in order to populate filelog receiver configs dynamically.

Something like this:

receivers:
  receiver_creator:
    watch_observers: [ k8s_observer ]
    receivers:
      filelog:
        rule: type == "pod" && labels["otel.logs.exporter"] == "otlp"
        config:
          ...

Note: I haven't managed to make it work yet, but I'm posting the question now to verify whether that approach would be a no-go for any reason.

@TylerHelmuth
Member Author

@ChrsMark I think that is an option for handling the multi-tenancy use case, but I don't think it addresses these issues:

  1. Still needing to configure the filelogreceiver multiple times
  2. The ability to select what pod logs to scrape based on selectors

@ChrsMark
Member

ChrsMark commented Jan 12, 2024

@ChrsMark I think that is an option for handling the multi-tenancy use case, but I don't think it addresses these issues:

  1. Still needing to configure the filelogreceiver multiple times

Maybe I'm missing something here, but the k8slog receiver proposed in #23339 would need to be configured multiple times as well to cover all the different filter combinations, right? Or will it have more explicit support for routing to specific pipelines somehow?

  2. The ability to select what pod logs to scrape based on selectors

Won't a rule in the receiver_creator like rule: type == "pod" && labels["component"] == "kube-apiserver" be equivalent to

labelSelectors:
     - component=kube-apiserver

?

I'm just trying to understand what extra value the k8slog receiver will bring. However, it's true that defining filelog receivers as part of the receiver_creator makes the configuration more complex and maybe harder to troubleshoot, so a more native approach may be preferable here.

Let me know what you think @TylerHelmuth. I'm also cc-ing @h0cheung who works on the k8slog proposal.

I'm also sharing my working example for reference:

daemonset.yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: daemonset
spec:
  mode: daemonset
  serviceAccount:
  hostNetwork: true
  volumeMounts:
    - name: varlogpods
      mountPath: /var/log/pods
      readOnly: true
  volumes:
    - name: varlogpods
      hostPath:
        path: /var/log/pods
  config: |
    exporters:
      debug: {}
      logging: {}
      otlp/elastic:
        compression: none
        endpoint: http://fleet-server:8200
        tls:
          insecure: true
          insecure_skip_verify: true
    extensions:
      k8s_observer:
        auth_type: serviceAccount
        node: ${env:K8S_NODE_NAME}
        observe_pods: true
      health_check: {}
      memory_ballast:
        size_in_percentage: 40
    processors:
      batch: {}
      resource/k8s:
        attributes:
          - key: service.name
            from_attribute: app.label.component
            action: insert
      k8sattributes:
        extract:
          metadata:
          - k8s.namespace.name
          - k8s.deployment.name
          - k8s.statefulset.name
          - k8s.daemonset.name
          - k8s.cronjob.name
          - k8s.job.name
          - k8s.node.name
          - k8s.pod.name
          - k8s.pod.uid
          - k8s.pod.start_time
          - container.id
          labels:
          - tag_name: app.label.component
            key: app.kubernetes.io/component
            from: pod
          - tag_name: logs.exporter
            key: otel.logs.exporter
            from: pod
        filter:
          node_from_env_var: K8S_NODE_NAME
        passthrough: false
        pod_association:
        - sources:
          - from: resource_attribute
            name: k8s.pod.ip
        - sources:
          - from: resource_attribute
            name: k8s.pod.uid
        - sources:
          - from: connection
      memory_limiter:
        check_interval: 5s
        limit_percentage: 80
        spike_limit_percentage: 25
    receivers:
      receiver_creator:
        watch_observers: [ k8s_observer ]
        receivers:
          filelog:
            rule: type == "pod" && labels["otel.logs.exporter"] == "otlp"
            config: 
              exclude:
              - /var/log/pods/default_daemonset-opentelemetry-collector*_*/opentelemetry-collector/*.log
              include:
              - /var/log/pods/`namespace`_`name`*/*/*.log
              include_file_name: false
              include_file_path: true
              operators:
              - id: get-format
                routes:
                - expr: body matches "^{"
                  output: parser-docker
                - expr: body matches '^[^ Z]+ '
                  output: parser-crio
                - expr: body matches '^[^ Z]+Z'
                  output: parser-containerd
                type: router
              - id: parser-crio
                regex: ^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
                timestamp:
                  layout: 2006-01-02T15:04:05.999999999Z07:00
                  layout_type: gotime
                  parse_from: attributes.time
                type: regex_parser
              - combine_field: attributes.log
                combine_with: ""
                id: crio-recombine
                is_last_entry: attributes.logtag == 'F'
                max_log_size: 102400
                output: json_parser
                source_identifier: attributes['log.file.path']
                type: recombine
              - id: parser-containerd
                regex: ^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
                timestamp:
                  layout: '%Y-%m-%dT%H:%M:%S.%LZ'
                  parse_from: attributes.time
                type: regex_parser
              - combine_field: attributes.log
                combine_with: ""
                id: containerd-recombine
                is_last_entry: attributes.logtag == 'F'
                max_log_size: 102400
                output: json_parser
                source_identifier: attributes["log.file.path"]
                type: recombine
              - id: parser-docker
                output: json_parser
                timestamp:
                  layout: '%Y-%m-%dT%H:%M:%S.%LZ'
                  parse_from: attributes.time
                type: json_parser
              - type: json_parser
                if: 'body matches "^{.*}$"'
                severity:
                  parse_from: attributes.level
              start_at: end
    service:
      extensions:
      - health_check
      - k8s_observer
      pipelines:
        logs:
          exporters:
          - otlp/elastic
          processors:
          - k8sattributes
          - batch
          - resource/k8s
          receivers:
          - receiver_creator
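
The last operator in this example parses JSON bodies and takes the severity from `attributes.level`. A simplified Go rendering of that step follows, assuming a small hypothetical level mapping (stanza's severity parser ships its own, broader default mapping). `parseBody` and `severityText` are illustrative names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// looksLikeJSON mirrors the operator's `if: 'body matches "^{.*}$"'` guard.
var looksLikeJSON = regexp.MustCompile(`^\{.*\}$`)

// severityText is a simplified, hypothetical level mapping.
var severityText = map[string]string{
	"debug": "DEBUG", "info": "INFO", "warn": "WARN", "warning": "WARN", "error": "ERROR",
}

// parseBody returns parsed attributes and a severity for JSON bodies;
// other bodies are passed over with an empty severity.
func parseBody(body string) (map[string]any, string) {
	if !looksLikeJSON.MatchString(body) {
		return nil, ""
	}
	var attrs map[string]any
	if err := json.Unmarshal([]byte(body), &attrs); err != nil {
		return nil, "" // the guard is only a heuristic; this sketch ignores invalid JSON
	}
	level, _ := attrs["level"].(string)
	return attrs, severityText[strings.ToLower(level)]
}

func main() {
	attrs, sev := parseBody(`{"level":"error","msg":"payment failed"}`)
	fmt.Println(sev, attrs["msg"]) // ERROR payment failed
	_, sev = parseBody("plain text line")
	fmt.Printf("%q\n", sev) // ""
}
```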

@TylerHelmuth
Member Author

Maybe I'm missing something here, but the k8slog receiver proposed in #23339 would need to be configured multiple times as well to cover all the different filter combinations, right? Or will it have more explicit support for routing to specific pipelines somehow?

I think it would need to be configured multiple times as well, but it would hide all the big, long, complex fileconsumer config behind the scenes. Even with the receivercreator, I believe the config in the description would need to be duplicated.

Won't a rule in the receiver_creator like rule: type == "pod" && labels["component"] == "kube-apiserver" be equivalent to

Oh I didn't realize it could do that, that's pretty cool. Yes, I think that's equivalent. I expect there'd end up being some other k8s-specific options in a k8slogreceiver, but I'm curious whether there would be any more overlap.

I think obfuscating the complexity of the fileconsumer configuration is the primary benefit of the k8slogreceiver. It lets users focus on k8s stuff and not worry about how the receiver gets the logs, trusting that it knows how to take advantage of standard k8s formatting and expectations.

@ChrsMark
Member

I think obfuscating the complexity of the fileconsumer configuration is the primary benefit of the k8slogreceiver. It lets users focus on k8s stuff and not worry about how the receiver gets the logs, trusting that it knows how to take advantage of standard k8s formatting and expectations.

Agree @TylerHelmuth , I think a "wrapper" over the current filelog receiver which will be k8s specific hiding details would benefit the users.
I will try to have a look into the proposal PR as well and comment there if I have any questions or suggestions :).

@djaglowski
Member

If in the end we're just looking for a way to gloss over the complexity of the filelog receiver, I think we should be looking to solve it with a "template". See open-telemetry/opentelemetry-collector#8372. That said, I believe #24439 was intended to add additional functionality which is not possible with filelog receiver.

@TylerHelmuth
Member Author

@djaglowski ya we're looking for both things:

  1. more kubernetes-specific features for determining which logs to scrape
  2. Encapsulating common k8s-log collection configuration.

It may be possible to use a combination of templates and the receiver creator to achieve this, but I'm not sure if that would come together in a solution that is simpler than a targeted receiver.

@ChrsMark
Member

ChrsMark commented Jan 29, 2024

Encapsulating common k8s-log collection configuration

That would be great. I guess we would need to "hide" (make it an implementation detail) the operator part that handles the docker, cri-o and containerd logs?

At the moment the routing as well as the special handling of the logs per runtime looks weird/scary to someone that is not fully familiar with the Collector's features.

Also I wonder whether this functionality is tested. Are there any tests that ensure the Collector can handle docker, cri-o and containerd logs specifically? I guess this covers it:

func NewKubernetesContainerWriter() *FileLogK8sWriter {
In addition, any specific configuration details should be well documented.

We can wait and cover all of those as part of the new k8slog receiver, or we can handle them today as part of the Filelog receiver. Happy to discuss my thoughts further and see if I can help in any way, so @TylerHelmuth @djaglowski let me know your thoughts on this.

@djaglowski
Member

@djaglowski ya we're looking for both things:

  1. more kubernetes-specific features for determining which logs to scrape
  2. Encapsulating common k8s-log collection configuration.

It may be possible to use a combination of templates and the receiver creator to achieve this, but I'm not sure if that would come together in a solution that is simpler than a targeted receiver.

#23339 (comment) is still the right approach in my opinion. Basically a dedicated receiver which shares much of the same code but can additionally add k8s specific features as needed.

If there's agreement there, we might consider consolidating this issue with #23339.

@djaglowski
Member

I'm removing the filelog receiver label since there does not appear to be anything actionable in relation to that receiver (though there may be changes to shared packages).

@TylerHelmuth
Member Author

Basically a dedicated receiver which shares much of the same code but can additionally add k8s specific features as needed.

@djaglowski agreed

@djaglowski
Member

Closing in favor of #23339. Please continue conversation over there.

5 participants