Kubernetes autodiscover doesn't discover short living jobs (and pods?) #22718

jsoriano · 2020-11-23T16:47:07Z

Reported in #11834 (comment). I could reproduce it in 7.10 with the reference documentation and a simple cronjob. Filebeat with autodiscover doesn't collect logs of short-living jobs.

What I could see is that short living containers don't generate any start event. Sometimes they generate a stop event.

With kubectl (-w) I could see that pods from short-living cronjobs don't generate an event with the running state:

hello-1606148340-spnpj   0/1     Pending     0          1s
hello-1606148340-spnpj   0/1     Pending     0          1s
hello-1606148340-spnpj   0/1     ContainerCreating   0          1s
hello-1606148340-spnpj   0/1     Completed           0          2s

With longer-living pods, this is the sequence of events seen, and logs are collected:

hello-1606148400-gm82x   0/1     Pending             0          0s
hello-1606148400-gm82x   0/1     Pending             0          0s
hello-1606148400-gm82x   0/1     ContainerCreating   0          0s
hello-1606148400-gm82x   1/1     Running             0          1s
hello-1606148400-gm82x   0/1     Completed           0          11s
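The difference between the two sequences above can be shown with a minimal sketch. It assumes that only the STATUS column matters and that a pod is noticed only if a `Running` status is observed. The helper name `reached_running` is hypothetical and is not part of Beats.

```python
def reached_running(statuses):
    """Return True if any observed status in the watch output is 'Running'."""
    return any(status == "Running" for status in statuses)


# STATUS column values copied from the two kubectl sequences above.
short_lived = ["Pending", "Pending", "ContainerCreating", "Completed"]
longer_lived = ["Pending", "Pending", "ContainerCreating", "Running", "Completed"]

print(reached_running(short_lived))   # False
print(reached_running(longer_lived))  # True
```

Any logic keyed on seeing the running state would skip the first pod entirely, which matches the missing start event.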

I have also seen that logs from pods that print something and then fail fast are not collected. The events in these cases look like this:

echo   0/1     Pending   0          0s
echo   0/1     Pending   0          0s
echo   0/1     ContainerCreating   0          0s
echo   0/1     Completed           0          3s
echo   0/1     Completed           1          4s
echo   0/1     CrashLoopBackOff    1          5s
...

In these cases the logs are important for investigating what is happening.

If there are init containers, their logs are sometimes not collected either. In these cases, event sequences like this one are seen:

mytarget2                                            0/1     Init:0/1   0          6s
mytarget2                                            0/1     PodInitializing   0          15s
mytarget2                                            1/1     Running           0          19s

For Metricbeat it can be OK not to start modules for short-living processes, but Filebeat should collect container logs from the moment the containers start, because these logs are important for investigating issues.
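One way to picture "collect logs from the moment they start" is to select containers by whether they have been created at all, rather than by whether they are currently running. The sketch below is an illustration under that assumption, not the Beats implementation. It works on a plain dict shaped like the `status` section of a Kubernetes Pod object (`containerStatuses`, `initContainerStatuses`, `containerID`), and the function name `containers_with_logs` is hypothetical.

```python
def containers_with_logs(pod_status):
    """List (name, container_id) for every container that has been created.

    Assumption: a container status carrying a containerID has been created
    on the node, whatever its current state, so its logs are worth reading.
    """
    found = []
    for key in ("initContainerStatuses", "containerStatuses"):
        for status in pod_status.get(key, []):
            container_id = status.get("containerID", "")
            if container_id:
                found.append((status["name"], container_id))
    return found


# Example: one container already terminated, one not created yet.
# The second container name is made up for illustration.
example_status = {
    "phase": "Succeeded",
    "containerStatuses": [
        {
            "name": "hello",
            "containerID": "docker://054c10b6b0c8d8530735d3a92bbff5d76f4f76420e9c33319e5a0551be0fbf87",
            "state": {"terminated": {"exitCode": 0}},
        },
        {
            "name": "not-created-yet",
            "state": {"waiting": {"reason": "ContainerCreating"}},
        },
    ],
}

print(containers_with_logs(example_status))
```

With this selection, the terminated `hello` container is still reported even though it was never seen in the running state.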

For confirmed bugs, please report:

Version: 7.10.0 (also reported with 7.9.3)
Discuss Forum URL: [autodiscover] Error creating runner from config: Can only start an input when all related states are finished #11834 (comment)

Steps to Reproduce:

Start filebeat with the reference configuration and an autodiscover template:

    filebeat.autodiscover:
      providers:
        - type: kubernetes
          node: ${NODE_NAME}
          templates:
            - config:
                - type: container
                  paths:
                    - /var/log/containers/*${data.kubernetes.container.id}.log

Run a cronjob with a short-living process, like this:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  failedJobsHistoryLimit: 10
  successfulJobsHistoryLimit: 20
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure

With debug logging enabled for autodiscover, some jobs produce errors about the missing container.id and some stop events, but no start event:

2020-11-23T16:44:10.377Z	DEBUG	[autodiscover]	template/config.go:156	Configuration template cannot be resolved: field 'data.kubernetes.container.id' not available in event or environment accessing 'paths' (source:'/etc/filebeat.yml')
2020-11-23T16:44:10.377Z	DEBUG	[autodiscover]	autodiscover/autodiscover.go:236	Got a stop event: map[config:[] host:10.0.6.13 id:fedc9c0a-113c-414c-95e9-409b3e56ead8 kubernetes:{"annotations":{},"labels":{"controller-uid":"d0b5a9f3-cccc-41c1-90da-7bd4decbcf8c","job-name":"hello-1606149780"},"namespace":"cron","node":{"name":"gke-jsoriano-test-default-pool-a6f338c6-w0b6"},"pod":{"name":"hello-1606149780-bngj7","uid":"fedc9c0a-113c-414c-95e9-409b3e56ead8"}} meta:{"kubernetes":{"labels":{"controller-uid":"d0b5a9f3-cccc-41c1-90da-7bd4decbcf8c","job-name":"hello-1606149780"},"namespace":"cron","node":{"name":"gke-jsoriano-test-default-pool-a6f338c6-w0b6"},"pod":{"name":"hello-1606149780-bngj7","uid":"fedc9c0a-113c-414c-95e9-409b3e56ead8"}}} ports:{} provider:d8bb0011-c4ab-4e20-890c-e7a9ff56dfff stop:true]
2020-11-23T16:44:10.377Z	DEBUG	[autodiscover]	autodiscover/autodiscover.go:236	Got a stop event: map[config:[0xc000763e90] host:10.0.6.13 id:fedc9c0a-113c-414c-95e9-409b3e56ead8.hello kubernetes:{"annotations":{},"container":{"id":"054c10b6b0c8d8530735d3a92bbff5d76f4f76420e9c33319e5a0551be0fbf87","image":"busybox","name":"hello","runtime":"docker"},"labels":{"controller-uid":"d0b5a9f3-cccc-41c1-90da-7bd4decbcf8c","job-name":"hello-1606149780"},"namespace":"cron","node":{"name":"gke-jsoriano-test-default-pool-a6f338c6-w0b6"},"pod":{"name":"hello-1606149780-bngj7","uid":"fedc9c0a-113c-414c-95e9-409b3e56ead8"}} meta:{"container":{"id":"054c10b6b0c8d8530735d3a92bbff5d76f4f76420e9c33319e5a0551be0fbf87","image":{"name":"busybox"},"runtime":"docker"},"kubernetes":{"container":{"image":"busybox","name":"hello"},"labels":{"controller-uid":"d0b5a9f3-cccc-41c1-90da-7bd4decbcf8c","job-name":"hello-1606149780"},"namespace":"cron","node":{"name":"gke-jsoriano-test-default-pool-a6f338c6-w0b6"},"pod":{"name":"hello-1606149780-bngj7","uid":"fedc9c0a-113c-414c-95e9-409b3e56ead8"}}} port:0 provider:d8bb0011-c4ab-4e20-890c-e7a9ff56dfff stop:true]
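To check a larger debug log for the same symptom, a small script can count start and stop events per event id. This is a rough sketch: it assumes start events are logged with a "Got a start event" message analogous to the stop lines above, and it only reads the first top-level `id:` field of the printed map. The function names and the abbreviated sample lines are made up for illustration.

```python
import re
from collections import defaultdict

# Captures the event kind and the first top-level "id:" field of the map dump.
EVENT_RE = re.compile(r"Got a (start|stop) event: map\[.*? id:(\S+)")


def count_events(lines):
    """Count start and stop events per autodiscover event id."""
    counts = defaultdict(lambda: {"start": 0, "stop": 0})
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            kind, event_id = match.groups()
            counts[event_id][kind] += 1
    return dict(counts)


def stops_without_start(lines):
    """Return the ids that produced stop events but never a start event."""
    return sorted(
        event_id
        for event_id, c in count_events(lines).items()
        if c["stop"] > 0 and c["start"] == 0
    )


# Abbreviated versions of the log lines above, plus one hypothetical
# start/stop pair (id "example-id") to show a pod that was handled normally.
sample = [
    "DEBUG [autodiscover] Got a stop event: map[config:[] host:10.0.6.13 id:fedc9c0a-113c-414c-95e9-409b3e56ead8 stop:true]",
    "DEBUG [autodiscover] Got a stop event: map[config:[0xc000763e90] host:10.0.6.13 id:fedc9c0a-113c-414c-95e9-409b3e56ead8.hello stop:true]",
    "DEBUG [autodiscover] Got a start event: map[config:[] host:10.0.6.13 id:example-id start:true]",
    "DEBUG [autodiscover] Got a stop event: map[config:[] host:10.0.6.13 id:example-id stop:true]",
]

for event_id in stops_without_start(sample):
    print(event_id)
```

Ids listed by `stops_without_start` correspond to pods or containers whose logs were likely never harvested.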


elasticmachine · 2020-11-23T16:47:09Z

Pinging @elastic/integrations-platforms (Team:Platforms)

dseravalli · 2020-12-17T17:01:02Z

I can reproduce this on 7.10.1

Pods that fail quickly on startup and enter a crash loop are not autodiscovered, so Filebeat never reads the reason for the crash from /var/log/containers.

jsoriano · 2021-03-09T10:01:05Z

Added to the description an event sequence observed by @ChrsMark when investigating similar issues with init containers.

fkalinowski · 2021-03-31T09:33:06Z

Hi,

Indeed the same problem occurs with init containers.

Below is a dumb "bash counter" that reproduces the problem with an init container. The logs of the bash-init container are never harvested, because the container has already terminated by the time the Pod state changes from "Pending" to "Running".

Kind regards,
Fabien.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bash
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: bash
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app.kubernetes.io/name: bash
    spec:
      containers:
      - name: bash
        image: bash:5.1.4
        command: ['bash', "-c", 'for i in {1..500}; do echo $i && sleep 1; done']
        ports:
        - containerPort: 80
      initContainers:
      - name: bash-init
        image: bash:5.1.4
        command: ['bash', "-c", 'for i in {1..50}; do echo $i && sleep 1; done']

jsoriano added the bug, Filebeat, and Team:Platforms labels Nov 23, 2020

jsoriano mentioned this issue Nov 23, 2020

[autodiscover] Error creating runner from config: Can only start an input when all related states are finished #11834

Closed

masci assigned jsoriano Mar 15, 2021

jsoriano mentioned this issue Mar 24, 2021

Refactor kubernetes autodiscover to avoid skipping short-living pods #24742

Merged

jsoriano closed this as completed in #24742 Apr 20, 2021

This was referenced Apr 20, 2021

Cherry-pick #24742 to 7.x: Refactor kubernetes autodiscover to avoid skipping short-living pods #25167

Merged

Kubernetes autodiscover suite elastic/e2e-testing#1064

Merged

ghost mentioned this issue Dec 14, 2022

Filebeat doesn't collect logs of CronJob pods #34045

Open
