Filebeat 6.2.1 in Kubernetes Segfaults #6372

Closed

towolf opened this issue Feb 13, 2018 · 22 comments

@towolf

towolf commented Feb 13, 2018

I was tailing the log of Filebeat because I saw that my Filebeat daemonset pods kept restarting.

This is the traceback:

2018-02-13T16:32:02.764Z	INFO	log/harvester.go:216	Harvester started for file: /var/lib/docker/containers/ab013b90ca95cb1a0d848ce537844044d7a05de434c1dd44199edcf36e99cb51/ab013b90ca95cb1a0d848ce537844044d7a05de434c1dd44199edcf36e99cb51-json.log
2018-02-13T16:32:02.764Z	ERROR	log/harvester.go:243	Read line error: decoding docker JSON: json: cannot unmarshal number into Go value of type reader.dockerLog; File: %!(EXTRA string=/var/lib/docker/containers/ab013b90ca95cb1a0d848ce537844044d7a05de434c1dd44199edcf36e99cb51/ab013b90ca95cb1a0d848ce537844044d7a05de434c1dd44199edcf36e99cb51-json.log)
2018-02-13T16:32:07.990Z	INFO	[monitoring]	log/log.go:124	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":21760,"time":21760},"total":{"ticks":187170,"time":187172,"value":187170},"user":{"ticks":165410,"time":165412}},"info":{"ephemeral_id":"a0c1894e-0350-4d21-9f37-2df11e200e28","uptime":{"ms":11221079}},"memstats":{"gc_next":10567776,"memory_alloc":6966536,"memory_total":14985033360}},"filebeat":{"events":{"active":-6,"added":84,"done":90},"harvester":{"closed":3,"open_files":4,"running":4,"started":3}},"libbeat":{"config":{"module":{"running":1}},"output":{"events":{"acked":84,"batches":14,"total":84},"read":{"bytes":84},"write":{"bytes":13896}},"pipeline":{"clients":2,"events":{"active":1,"filtered":6,"published":78,"total":84},"queue":{"acked":84}}},"registrar":{"states":{"current":38,"update":90},"writes":18},"system":{"load":{"1":0.07,"15":1.5,"5":0.89,"norm":{"1":0.0088,"15":0.1875,"5":0.1113}}}}}}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x158 pc=0x13dd6fa]

goroutine 49 [running]:
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*ContainerIndexer).GetIndexes(0xc420428080, 0x0, 0xc42041d710, 0xc420436960, 0xc420436960)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/indexers.go:161 +0x3a
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*Indexers).GetIndexes(0xc42041d710, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/indexers.go:76 +0x230
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).removePod(0xc4201e2070, 0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:188 +0x3d
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).(github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.removePod)-fm(0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:159 +0x34
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.processEvent(0xc420469ed0, 0xc4201154d0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:175 +0x78
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).worker(0xc4201e2070)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:159 +0x26f
created by github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.newKubernetesAnnotator
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:113 +0x9f7
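
Reading the trace: removePod is invoked with a nil pod (the 0x0 argument in its frame), and ContainerIndexer.GetIndexes then dereferences it at indexers.go:161. A minimal Go snippet reproducing that class of nil-pointer panic, using made-up stand-in types rather than the actual Beats structures:

package main

// Pod is an illustrative stand-in for a Kubernetes pod object.
type Pod struct {
	Metadata struct{ Name string }
}

// getIndexes stands in for ContainerIndexer.GetIndexes: it assumes pod is non-nil.
func getIndexes(pod *Pod) []string {
	// With pod == nil, this dereference panics with
	// "invalid memory address or nil pointer dereference".
	return []string{pod.Metadata.Name}
}

func main() {
	var pod *Pod // nil, like the 0x0 argument in the stack trace above
	getIndexes(pod)
}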

For confirmed bugs, please report:

  • Version:

docker.elastic.co/beats/filebeat:6.2.1

  • Operating System:

Kubernetes 1.9.1 on amd64

  • Steps to Reproduce:

Not sure

@towolf
Author

towolf commented Feb 13, 2018

My config:

- apiVersion: v1
  data:
    filebeat.yml: |-
      filebeat.config:
        prospectors:
          # Mounted `filebeat-prospectors` configmap:
          path: ${path.config}/prospectors.d/*.yml
          # Reload prospectors configs as they change:
          reload.enabled: false
        modules:
          path: ${path.config}/modules.d/*.yml
          # Reload module configs as they change:
          reload.enabled: false

      fields_under_root: true

      processors:

      output.logstash:
        enabled: true
        hosts: ["loghost.kube-system.svc:5000"]
        timeout: 15

      # output.console:
      #   enabled: true
      #   pretty: true

and

- apiVersion: v1
  data:
    kubernetes.yml: |-
      - type: docker
        containers.ids:
        - "*"
        fields:
          cluster: k8s-aix01
        processors:
          - add_kubernetes_metadata:
              in_cluster: true
          - decode_json_fields:
              fields: ["message"]
              process_array: false
              max_depth: 1
              target: ""
              overwrite_keys: false

@ph added the Filebeat and bug labels Feb 13, 2018
@exekias
Contributor

exekias commented Feb 13, 2018

Hi @towolf,

Thank you for reporting this; there is definitely something wrong. Could you share the content of /var/lib/docker/containers/ab013b90ca95cb1a0d848ce537844044d7a05de434c1dd44199edcf36e99cb51/ab013b90ca95cb1a0d848ce537844044d7a05de434c1dd44199edcf36e99cb51-json.log? It looks like parsing is failing for some reason.

@towolf
Author

towolf commented Feb 13, 2018

Ah, now I think I can see where this is going.

Earlier today, we wanted to test how docker log rotation is handled,
after changing these settings:

  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }

but that was 6 hours ago, or so.

So we were spamming megabytes of base64-encoded data into the log, using something like:
dd if=/dev/urandom bs=10M count=1 | base64 > /proc/1/fd/1

Here's one of the resulting files; it's the one mentioned right before the segfault:

ab013b90ca95cb1a0d848ce537844044d7a05de434c1dd44199edcf36e99cb51-json.log

I think this is perhaps the decode_json_fields processor choking on a magic combination of chars in the base64 spew?

I'm very fond of this processor, BTW; it allows our devs to log multi-line messages by just printing JSON to stdout.
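
For anyone following along, decode_json_fields with target: "" conceptually takes the message field, unmarshals it, and merges the decoded keys into the event root. A rough Go sketch of that idea against a generic event map; this is an illustration under those assumptions, not the Beats implementation:

package main

import (
	"encoding/json"
	"fmt"
)

// decodeJSONField is a rough stand-in for decode_json_fields with target: "":
// decode the named field and merge its keys into the event root.
func decodeJSONField(event map[string]interface{}, field string) {
	raw, ok := event[field].(string)
	if !ok {
		return // nothing to decode
	}
	var decoded map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &decoded); err != nil {
		return // leave the event untouched if the field is not valid JSON
	}
	for k, v := range decoded {
		if _, exists := event[k]; !exists { // overwrite_keys: false
			event[k] = v
		}
	}
}

func main() {
	event := map[string]interface{}{
		"message": `{"level":"error","msg":"line one\nline two"}`,
	}
	decodeJSONField(event, "message")
	fmt.Println(event["level"], event["msg"]) // the decoded msg keeps its embedded newline
}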

@exekias
Contributor

exekias commented Feb 13, 2018

It looks like the problem is happening in the add_kubernetes_metadata processor; I'm not quite sure whether the JSON unmarshal error is related to this. Do you still see this behavior after removing the broken logs?

@towolf
Author

towolf commented Feb 13, 2018

I've cleared all old logs using this:
for i in /var/lib/docker/containers/*/*-json.log; do > $i; done
Then I restarted all filebeat pods.

During the day my log host definitely received quite a few messages without Kubernetes metadata attached.

I'll have to study this more.

@towolf
Author

towolf commented Feb 13, 2018

Just to be sure: add_kubernetes_metadata is talking to kubernetes.default.svc, i.e. the internal apiserver endpoint, yes?

@exekias
Contributor

exekias commented Feb 13, 2018

That's correct
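
(For reference, in_cluster: true conventionally means the client authenticates with the mounted service-account token and resolves the API server through the KUBERNETES_SERVICE_HOST/PORT environment variables, i.e. that same kubernetes.default.svc endpoint. A sketch using client-go purely for illustration; Beats 6.x ships its own Kubernetes client, so this is not its actual code:)

package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// InClusterConfig reads the mounted service-account token plus the
	// KUBERNETES_SERVICE_HOST / KUBERNETES_SERVICE_PORT env vars.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println("API server:", cfg.Host, client != nil)
}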

@towolf
Author

towolf commented Feb 14, 2018

Since yesterday, I sometimes get blocks of these:

2018-02-13T21:02:26.871Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-13T21:31:43.409Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-13T21:44:07.668Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-13T22:18:29.365Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-13T22:31:14.603Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-13T23:02:42.139Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-13T23:07:41.379Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-13T23:45:20.467Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-13T23:46:24.925Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T00:24:41.503Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T00:37:00.074Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T01:12:24.462Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T01:36:48.328Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T02:00:48.198Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T02:25:30.624Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T02:59:52.067Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T03:03:39.462Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T03:39:18.842Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T03:43:08.652Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T04:31:56.131Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T04:33:03.396Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T05:07:46.670Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T05:25:52.012Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF

@towolf
Author

towolf commented Feb 14, 2018

And then I get messages like this one, which indicates it is choking on null bytes.

2018-02-14T09:54:03.963Z ERROR log/harvester.go:243 Read line error: decoding docker JSON: invalid character '\x00' looking for beginning of value; File: %!(EXTRA string=/var/lib/docker/containers/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847-json.log)

I found a docker log file that starts with roughly 620,000 null bytes after json-file log rotation:

head -n 1 /var/lib/docker/containers/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847-json.log | xxd
...
000975c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000975d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000975e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000975f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00097600: 0000 0000 0000 007b 226c 6f67 223a 2243  .......{"log":"C
00097610: 6865 636b 696e 6720 666f 7220 6a6f 6273  hecking for jobs
00097620: 2e2e 2e20 7265 6365 6976 6564 2020 2020  ... received    
00097630: 2020 2020 2020 2020 2020 2020 2020 2020                  
00097640: 205c 7530 3031 625b 303b 6d20 206a 6f62   \u001b[0;m  job
00097650: 5c75 3030 3162 5b30 3b6d 3d32 3130 3631  \u001b[0;m=21061

Is the filebeat docker prospector compatible with json-file log rotation?

However, I only found one file like that so far.
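
For the record, that harvester error is exactly what encoding/json reports when a line starts with NUL padding: the decoder hits '\x00' before any JSON value. A tiny Go sketch of the failure, plus one hypothetical workaround (trimming leading NULs before decoding); this is not what Filebeat actually does:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// dockerLog mirrors the shape of a json-file log entry.
type dockerLog struct {
	Log    string `json:"log"`
	Stream string `json:"stream"`
	Time   string `json:"time"`
}

func main() {
	// A rotated log line padded with NUL bytes, as in the hexdump above.
	line := append(bytes.Repeat([]byte{0x00}, 16),
		[]byte(`{"log":"Checking for jobs...","stream":"stdout","time":"2018-02-14T04:30:00Z"}`)...)

	var entry dockerLog
	err := json.Unmarshal(line, &entry)
	fmt.Println(err) // invalid character '\x00' looking for beginning of value

	// Hypothetical workaround: strip the leading NUL padding first.
	if err := json.Unmarshal(bytes.TrimLeft(line, "\x00"), &entry); err == nil {
		fmt.Println(entry.Log)
	}
}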

@exekias
Contributor

exekias commented Feb 14, 2018

Neither error is fatal:

kubernetes: Watching API error EOF occurs when the connection to the K8S API gets closed, for whatever reason. Beats will re-establish it when that happens. I would say your rate is high, though. Do you have networking issues?

As for the JSON decoding error, it should be transient too; subsequent messages should be decoded correctly. I'm researching the zeroes in the file after rotation; there are some measures we can implement to avoid this issue.

Have you seen the panic again?
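
(The re-establish behaviour described above is the usual watch-and-retry pattern: when the API server closes the stream, log the error and start a new watch. A simplified Go sketch of that pattern, not the actual Beats watcher code:)

package main

import (
	"io"
	"log"
	"time"
)

// watchOnce stands in for one streaming watch against the API server; it
// returns io.EOF (or another error) when the server closes the connection.
func watchOnce() error {
	// ... stream pod events until the connection drops ...
	return io.EOF
}

// watchLoop logs watch errors and re-establishes the watch until stopped.
func watchLoop(stop <-chan struct{}) {
	for {
		select {
		case <-stop:
			return
		default:
		}
		if err := watchOnce(); err != nil {
			if err == io.EOF {
				log.Printf("kubernetes: Watching API error %v", err)
			}
			time.Sleep(time.Second) // back off, then start a new watch
		}
	}
}

func main() {
	stop := make(chan struct{})
	go watchLoop(stop)
	time.Sleep(3 * time.Second) // let it retry a few times
	close(stop)
}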

@towolf
Author

towolf commented Feb 14, 2018

Yes, actually: I got 2 more segfaults on exactly the node that had the "malformed" logs, but also 1 on another node. This is on our staging cluster:

$ kubectl -n kube-system get pod -l k8s-app=filebeat
NAME             READY     STATUS    RESTARTS   AGE
filebeat-84fqs   1/1       Running   0          18h
filebeat-r8fq6   1/1       Running   1          18h
filebeat-rmk9w   1/1       Running   3          18h
filebeat-rxlp7   1/1       Running   0          18h
filebeat-tl76w   1/1       Running   0          18h
filebeat-z7rlh   1/1       Running   0          18h

The crash log can be retrieved with:

$ kubectl -n kube-system logs filebeat-rmk9w --previous --tail 25
2018-02-14T04:30:19.307Z	INFO	log/harvester.go:216	Harvester started for file: /var/lib/docker/containers/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847-json.log
2018-02-14T04:30:19.321Z	ERROR	log/harvester.go:243	Read line error: decoding docker JSON: invalid character '\x00' looking for beginning of value; File: %!(EXTRA string=/var/lib/docker/containers/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847-json.log)
2018-02-14T04:30:29.311Z	INFO	log/harvester.go:216	Harvester started for file: /var/lib/docker/containers/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847-json.log
2018-02-14T04:30:29.324Z	ERROR	log/harvester.go:243	Read line error: decoding docker JSON: invalid character '\x00' looking for beginning of value; File: %!(EXTRA string=/var/lib/docker/containers/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847-json.log)
2018-02-14T04:30:39.314Z	INFO	log/harvester.go:216	Harvester started for file: /var/lib/docker/containers/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847-json.log
2018-02-14T04:30:39.328Z	ERROR	log/harvester.go:243	Read line error: decoding docker JSON: invalid character '\x00' looking for beginning of value; File: %!(EXTRA string=/var/lib/docker/containers/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847-json.log)
2018-02-14T04:30:47.889Z	INFO	[monitoring]	log/log.go:124	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":2900,"time":2900},"total":{"ticks":21040,"time":21044,"value":21040},"user":{"ticks":18140,"time":18144}},"info":{"ephemeral_id":"2de41ee7-0730-4ed4-9360-e7f70241b11f","uptime":{"ms":2880248}},"memstats":{"gc_next":7829696,"memory_alloc":5643288,"memory_total":4368108992}},"filebeat":{"events":{"added":109,"done":109},"harvester":{"closed":3,"open_files":4,"running":4,"started":3}},"libbeat":{"config":{"module":{"running":1}},"output":{"events":{"acked":103,"batches":11,"total":103},"read":{"bytes":66},"write":{"bytes":14609}},"pipeline":{"clients":2,"events":{"active":0,"filtered":6,"published":103,"total":109},"queue":{"acked":103}}},"registrar":{"states":{"current":37,"update":109},"writes":13},"system":{"load":{"1":0.21,"15":0.44,"5":0.42,"norm":{"1":0.0263,"15":0.055,"5":0.0525}}}}}}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x158 pc=0x13dd6fa]

goroutine 28 [running]:
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*ContainerIndexer).GetIndexes(0xc4203f2a60, 0x0, 0xc4203f11d0, 0xc420158180, 0xc420158180)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/indexers.go:161 +0x3a
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*Indexers).GetIndexes(0xc4203f11d0, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/indexers.go:76 +0x230
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).removePod(0xc4200a0000, 0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:188 +0x3d
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).(github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.removePod)-fm(0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:159 +0x34
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.processEvent(0xc4202e7ed0, 0xc4204bb920)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:175 +0x78
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).worker(0xc4200a0000)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:159 +0x26f
created by github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.newKubernetesAnnotator
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:113 +0x9f7

We do not have networking issues as far as I'm aware ...

@towolf
Author

towolf commented Feb 14, 2018

And this is from the other node that had 1 restart:

kubectl -n kube-system logs filebeat-r8fq6 --previous --tail 25
2018-02-14T10:01:58.866Z	INFO	kubernetes/watcher.go:133	kubernetes: Watching API for pod events
2018-02-14T10:02:07.856Z	INFO	[monitoring]	log/log.go:124	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":4460,"time":4468},"total":{"ticks":33970,"time":33980,"value":33970},"user":{"ticks":29510,"time":29512}},"info":{"ephemeral_id":"483fe03d-5e8d-4b07-ab05-bf62c71fb679","uptime":{"ms":59070199}},"memstats":{"gc_next":29848208,"memory_alloc":15164640,"memory_total":453574040}},"filebeat":{"harvester":{"open_files":5,"running":19}},"libbeat":{"config":{"module":{"running":1}},"pipeline":{"clients":2,"events":{"active":4117}}},"registrar":{"states":{"current":25}},"system":{"load":{"1":0.27,"15":0.19,"5":0.31,"norm":{"1":0.0338,"15":0.0238,"5":0.0388}}}}}}
2018-02-14T10:02:19.695Z	ERROR	pipeline/output.go:74	Failed to connect: dial tcp 10.106.82.87:5000: getsockopt: connection timed out
2018-02-14T10:02:37.855Z	INFO	[monitoring]	log/log.go:124	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":4460,"time":4468},"total":{"ticks":33970,"time":33984,"value":33970},"user":{"ticks":29510,"time":29516}},"info":{"ephemeral_id":"483fe03d-5e8d-4b07-ab05-bf62c71fb679","uptime":{"ms":59100198}},"memstats":{"gc_next":29848208,"memory_alloc":15379400,"memory_total":453788800}},"filebeat":{"harvester":{"open_files":5,"running":19}},"libbeat":{"config":{"module":{"running":1}},"pipeline":{"clients":2,"events":{"active":4117,"retry":2048}}},"registrar":{"states":{"current":25}},"system":{"load":{"1":0.23,"15":0.19,"5":0.29,"norm":{"1":0.0288,"15":0.0238,"5":0.0363}}}}}}
2018-02-14T10:03:07.855Z	INFO	[monitoring]	log/log.go:124	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":4470,"time":4472},"total":{"ticks":33980,"time":33988,"value":33980},"user":{"ticks":29510,"time":29516}},"info":{"ephemeral_id":"483fe03d-5e8d-4b07-ab05-bf62c71fb679","uptime":{"ms":59130198}},"memstats":{"gc_next":29848208,"memory_alloc":15563176,"memory_total":453972576}},"filebeat":{"harvester":{"open_files":5,"running":19}},"libbeat":{"config":{"module":{"running":1}},"pipeline":{"clients":2,"events":{"active":4117}}},"registrar":{"states":{"current":25}},"system":{"load":{"1":0.14,"15":0.18,"5":0.26,"norm":{"1":0.0175,"15":0.0225,"5":0.0325}}}}}}
2018-02-14T10:03:26.711Z	ERROR	pipeline/output.go:74	Failed to connect: dial tcp 10.106.82.87:5000: getsockopt: connection timed out
2018-02-14T10:03:37.856Z	INFO	[monitoring]	log/log.go:124	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":4470,"time":4476},"total":{"ticks":34030,"time":34044,"value":34030},"user":{"ticks":29560,"time":29568}},"info":{"ephemeral_id":"483fe03d-5e8d-4b07-ab05-bf62c71fb679","uptime":{"ms":59160196}},"memstats":{"gc_next":29845296,"memory_alloc":14922760,"memory_total":454176392}},"filebeat":{"harvester":{"open_files":5,"running":19}},"libbeat":{"config":{"module":{"running":1}},"pipeline":{"clients":2,"events":{"active":4117,"retry":2048}}},"registrar":{"states":{"current":25}},"system":{"load":{"1":0.08,"15":0.17,"5":0.24,"norm":{"1":0.01,"15":0.0213,"5":0.03}}}}}}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x158 pc=0x13dd6fa]

goroutine 72 [running]:
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*ContainerIndexer).GetIndexes(0xc420499f80, 0x0, 0xc4204c30e0, 0xc420060cc0, 0xc420060c00)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/indexers.go:161 +0x3a
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*Indexers).GetIndexes(0xc4204c30e0, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/indexers.go:76 +0x230
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).removePod(0xc4200984d0, 0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:188 +0x3d
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).(github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.removePod)-fm(0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:159 +0x34
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.processEvent(0xc42003eed0, 0xc420e9d1d0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:175 +0x78
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).worker(0xc4200984d0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:159 +0x26f
created by github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.newKubernetesAnnotator
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:113 +0x9f7

@towolf
Author

towolf commented Feb 14, 2018

I looked through the logs, and this definitely started after upgrading filebeat from 6.1.1 to 6.2.1 on Feb 8th on two clusters: 50 segfaults in total since that day.

@wflyer
Contributor

wflyer commented Feb 27, 2018

I'm also experiencing the same issue.

I'm using docker.elastic.co/beats/filebeat:6.2.2 with Kubernetes v1.9.2.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x158 pc=0x13dd8ba]

goroutine 96 [running]:
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*ContainerIndexer).GetIndexes(0xc42042bfc0, 0x0, 0xc420490bd0, 0xc420483680, 0xc420483620)
        /go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/indexers.go:161 +0x3a
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*Indexers).GetIndexes(0xc420490bd0, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/indexers.go:76 +0x230
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).removePod(0xc4204bc0e0, 0x0)
        /go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:188 +0x3d
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).(github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.removePod)-fm(0x0)
        /go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:159 +0x34
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.processEvent(0xc4203e9ed0, 0xc42031d9b0)
        /go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:175 +0x78
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).worker(0xc4204bc0e0)
        /go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:159 +0x26f
created by github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.newKubernetesAnnotator
        /go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:113 +0x9f7

@ruflin
Contributor

ruflin commented Feb 28, 2018

@exekias I wonder if we should check here https://github.com/elastic/beats/blob/master/libbeat/processors/add_kubernetes_metadata/indexers.go#L161 for nil values and log an error? Or should GetIndexes also support returning an error?
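
A guard along those lines might look roughly like this, sketched against simplified stand-in types rather than the real add_kubernetes_metadata structures:

package main

import "log"

// Pod and ContainerIndexer are simplified stand-ins for the real types.
type Pod struct {
	Status struct {
		ContainerStatuses []struct{ ContainerID string }
	}
}

type ContainerIndexer struct{}

// GetIndexes returns the container IDs used to index metadata for a pod.
// Returning early on nil avoids the panic in the stack traces above, and
// logging the condition helps track down who passes a nil pod in.
func (c *ContainerIndexer) GetIndexes(pod *Pod) []string {
	if pod == nil {
		log.Println("add_kubernetes_metadata: GetIndexes called with nil pod")
		return nil
	}
	ids := make([]string, 0, len(pod.Status.ContainerStatuses))
	for _, s := range pod.Status.ContainerStatuses {
		ids = append(ids, s.ContainerID)
	}
	return ids
}

func main() {
	var idx ContainerIndexer
	_ = idx.GetIndexes(nil) // logs and returns nil instead of panicking
}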

@towolf
Author

towolf commented Feb 28, 2018

s234 is the node with the most container churn, as this is the node where Gitlab-Runner schedules its build pods.

This seems to be correlated with actual log contents? Or does the filebeat instance prospecting the most new log files do the most lookups in the Kubernetes API?

$ kubectl -n kube-system get pod -l k8s-app=filebeat -o wide
NAME             READY     STATUS    RESTARTS   AGE       IP             NODE
filebeat-84fqs   1/1       Running   25         14d       172.31.3.242   h504
filebeat-r8fq6   1/1       Running   20         14d       172.31.5.89    h1355
filebeat-rmk9w   1/1       Running   112        14d       172.31.4.172   s234
filebeat-rv5qv   1/1       Running   4          8d        172.31.2.25    s237
filebeat-rxlp7   1/1       Running   10         14d       172.31.0.14    s239
filebeat-z7rlh   1/1       Running   10         14d       172.31.1.10    s238
$ kubectl -n kube-system get pod -l k8s-app=filebeat -o wide
NAME             READY     STATUS    RESTARTS   AGE       IP             NODE
filebeat-68svf   1/1       Running   19         19d       172.31.8.99    s931
filebeat-9w9ws   1/1       Running   6          19d       172.31.0.9     s851
filebeat-bg744   1/1       Running   6          19d       172.31.5.11    s853
filebeat-bhds5   1/1       Running   4          19d       172.31.4.7     s918
filebeat-g4zlm   1/1       Running   4          19d       172.31.6.9     s858
filebeat-mjmzw   1/1       Running   5          19d       172.31.7.9     s932
filebeat-pns7c   1/1       Running   6          19d       172.31.2.9     s859
filebeat-rjs5m   1/1       Running   6          19d       172.31.3.7     s941
filebeat-wq2qt   1/1       Running   5          19d       172.31.9.11    s939
filebeat-zfgr4   1/1       Running   9          8d        172.31.10.16   h3255
filebeat-zqcsh   1/1       Running   21         19d       172.31.1.96    s898

@exekias
Contributor

exekias commented Feb 28, 2018

Yes, I think I'm adding some defensive code + logging there to avoid this issue.

Thank you everyone for reporting this! So far I couldn't reproduce the issue, so the stack traces help a lot here.

@exekias
Contributor

exekias commented Feb 28, 2018

@ruflin I've just opened #6487; let me know what you think. Also, master and 6.3 should not be affected, as this code was refactored.

exekias pushed a commit to exekias/beats that referenced this issue Feb 28, 2018
While the root cause is unclear, this change adds defensive code against
nil Pod processing errors. It also adds more logging to debug this
further.

Fixes elastic#6372
ruflin pushed a commit that referenced this issue Mar 2, 2018
While the root cause is unclear, this change adds defensive code against
nil Pod processing errors. It also adds more logging to debug this
further.

Fixes #6372
@ruflin
Contributor

ruflin commented Mar 4, 2018

@exekias Merged. Good to know the other branches are not affected. Ignore my comment in the PR then.

@yawboateng

Seeing the same (it causes the container to crash). I'm on the latest 6.2.2 build and on k8s 1.8.5:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x158 pc=0x13dd8ba]

goroutine 58 [running]:
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*ContainerIndexer).GetIndexes(0xc4203fe7e0, 0x0, 0xc4203dcc00, 0xc4201a2120, 0xc4201a2060)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/indexers.go:161 +0x3a
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*Indexers).GetIndexes(0xc4203dcc00, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/indexers.go:76 +0x230
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).removePod(0xc4201123f0, 0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:188 +0x3d
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).(github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.removePod)-fm(0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:159 +0x34
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.processEvent(0xc420046ed0, 0xc420413e60)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:175 +0x78
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).worker(0xc4201123f0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:159 +0x26f
created by github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.newKubernetesAnnotator
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:113 +0x9f7

@exekias
Contributor

exekias commented Mar 11, 2018

Closed by #6487, to be released with 6.2.3

@exekias closed this as completed Mar 11, 2018
@towolf
Author

towolf commented Mar 12, 2018

I found another bad problem in #6536. Not sure whether it is related to this issue?

leweafan pushed a commit to leweafan/beats that referenced this issue Apr 28, 2023
While the root cause is unclear, this change adds defensive code against
nil Pod processing errors. It also adds more logging to debug this
further.

Fixes elastic#6372