Filebeat 6.2.1 in Kubernetes Segfaults #6372

Closed

towolf opened this issue Feb 13, 2018 · 22 comments

@towolf

towolf commented Feb 13, 2018

I was tailing the log of Filebeat because I saw that my Filebeat daemonset pods kept restarting.

This is the traceback:

2018-02-13T16:32:02.764Z	INFO	log/harvester.go:216	Harvester started for file: /var/lib/docker/containers/ab013b90ca95cb1a0d848ce537844044d7a05de434c1dd44199edcf36e99cb51/ab013b90ca95cb1a0d848ce537844044d7a05de434c1dd44199edcf36e99cb51-json.log
2018-02-13T16:32:02.764Z	ERROR	log/harvester.go:243	Read line error: decoding docker JSON: json: cannot unmarshal number into Go value of type reader.dockerLog; File: %!(EXTRA string=/var/lib/docker/containers/ab013b90ca95cb1a0d848ce537844044d7a05de434c1dd44199edcf36e99cb51/ab013b90ca95cb1a0d848ce537844044d7a05de434c1dd44199edcf36e99cb51-json.log)
2018-02-13T16:32:07.990Z	INFO	[monitoring]	log/log.go:124	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":21760,"time":21760},"total":{"ticks":187170,"time":187172,"value":187170},"user":{"ticks":165410,"time":165412}},"info":{"ephemeral_id":"a0c1894e-0350-4d21-9f37-2df11e200e28","uptime":{"ms":11221079}},"memstats":{"gc_next":10567776,"memory_alloc":6966536,"memory_total":14985033360}},"filebeat":{"events":{"active":-6,"added":84,"done":90},"harvester":{"closed":3,"open_files":4,"running":4,"started":3}},"libbeat":{"config":{"module":{"running":1}},"output":{"events":{"acked":84,"batches":14,"total":84},"read":{"bytes":84},"write":{"bytes":13896}},"pipeline":{"clients":2,"events":{"active":1,"filtered":6,"published":78,"total":84},"queue":{"acked":84}}},"registrar":{"states":{"current":38,"update":90},"writes":18},"system":{"load":{"1":0.07,"15":1.5,"5":0.89,"norm":{"1":0.0088,"15":0.1875,"5":0.1113}}}}}}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x158 pc=0x13dd6fa]

goroutine 49 [running]:
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*ContainerIndexer).GetIndexes(0xc420428080, 0x0, 0xc42041d710, 0xc420436960, 0xc420436960)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/indexers.go:161 +0x3a
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*Indexers).GetIndexes(0xc42041d710, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/indexers.go:76 +0x230
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).removePod(0xc4201e2070, 0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:188 +0x3d
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).(github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.removePod)-fm(0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:159 +0x34
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.processEvent(0xc420469ed0, 0xc4201154d0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:175 +0x78
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).worker(0xc4201e2070)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:159 +0x26f
created by github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.newKubernetesAnnotator
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:113 +0x9f7
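
Reading the trace: removePod is invoked with a nil pod (the 0x0 argument in its frame), and ContainerIndexer.GetIndexes then dereferences it at indexers.go:161. A minimal Go snippet reproducing that class of nil-pointer panic, using made-up stand-in types rather than the actual Beats structures:

package main

// Pod is an illustrative stand-in for a Kubernetes pod object.
type Pod struct {
	Metadata struct{ Name string }
}

// getIndexes stands in for ContainerIndexer.GetIndexes: it assumes pod is non-nil.
func getIndexes(pod *Pod) []string {
	// With pod == nil, this dereference panics with
	// "invalid memory address or nil pointer dereference".
	return []string{pod.Metadata.Name}
}

func main() {
	var pod *Pod // nil, like the 0x0 argument in the stack trace above
	getIndexes(pod)
}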

For confirmed bugs, please report:

  • Version:

docker.elastic.co/beats/filebeat:6.2.1

  • Operating System:

Kubernetes 1.9.1 on amd64

  • Steps to Reproduce:

Not sure

@towolf
Author

towolf commented Feb 13, 2018

My config:

- apiVersion: v1
  data:
    filebeat.yml: |-
      filebeat.config:
        prospectors:
          # Mounted `filebeat-prospectors` configmap:
          path: ${path.config}/prospectors.d/*.yml
          # Reload prospectors configs as they change:
          reload.enabled: false
        modules:
          path: ${path.config}/modules.d/*.yml
          # Reload module configs as they change:
          reload.enabled: false

      fields_under_root: true

      processors:

      output.logstash:
        enabled: true
        hosts: ["loghost.kube-system.svc:5000"]
        timeout: 15

      # output.console:
      #   enabled: true
      #   pretty: true

and

- apiVersion: v1
  data:
    kubernetes.yml: |-
      - type: docker
        containers.ids:
        - "*"
        fields:
          cluster: k8s-aix01
        processors:
          - add_kubernetes_metadata:
              in_cluster: true
          - decode_json_fields:
              fields: ["message"]
              process_array: false
              max_depth: 1
              target: ""
              overwrite_keys: false

@ph added the Filebeat and bug labels Feb 13, 2018
@exekias
Contributor

exekias commented Feb 13, 2018

Hi @towolf,

Thank you for reporting this; there is definitely something wrong. Could you share the content of /var/lib/docker/containers/ab013b90ca95cb1a0d848ce537844044d7a05de434c1dd44199edcf36e99cb51/ab013b90ca95cb1a0d848ce537844044d7a05de434c1dd44199edcf36e99cb51-json.log? It looks like parsing is failing for some reason.

@towolf
Author

towolf commented Feb 13, 2018

Ah, now I think I can see where this is going.

Earlier today, we wanted to test how docker log rotation is handled,
after changing these settings:

  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }

but that was 6 hours ago, or so.

So we were spamming megabytes of base64-encoded data into the log, using something like:
dd if=/dev/urandom bs=10M count=1 | base64 > /proc/1/fd/1

Here's one of the resulting files; it's the one mentioned right before the segfault:

ab013b90ca95cb1a0d848ce537844044d7a05de434c1dd44199edcf36e99cb51-json.log

I think this is perhaps the decode_json_fields processor choking on a magic combination of chars in the base64 spew?

I'm very fond of this processor, BTW; it allows our devs to log multi-line messages by just printing JSON to stdout.
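
For anyone following along, decode_json_fields with target: "" conceptually takes the message field, unmarshals it, and merges the decoded keys into the event root. A rough Go sketch of that idea against a generic event map; this is an illustration under those assumptions, not the Beats implementation:

package main

import (
	"encoding/json"
	"fmt"
)

// decodeJSONField is a rough stand-in for decode_json_fields with target: "":
// decode the named field and merge its keys into the event root.
func decodeJSONField(event map[string]interface{}, field string) {
	raw, ok := event[field].(string)
	if !ok {
		return // nothing to decode
	}
	var decoded map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &decoded); err != nil {
		return // leave the event untouched if the field is not valid JSON
	}
	for k, v := range decoded {
		if _, exists := event[k]; !exists { // overwrite_keys: false
			event[k] = v
		}
	}
}

func main() {
	event := map[string]interface{}{
		"message": `{"level":"error","msg":"line one\nline two"}`,
	}
	decodeJSONField(event, "message")
	fmt.Println(event["level"], event["msg"]) // the decoded msg keeps its embedded newline
}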

@exekias
Contributor

exekias commented Feb 13, 2018

It looks like the problem is happening in the add_kubernetes_metadata processor; I'm not quite sure whether the JSON unmarshal error is related to this. Do you still see this behavior after removing the broken logs?

@towolf
Author

towolf commented Feb 13, 2018

I've cleared all old logs using this:
for i in /var/lib/docker/containers/*/*-json.log; do > $i; done
Then I restarted all filebeat pods.

During the day my log host definitely received quite a few messages without Kubernetes metadata attached.

I'll have to study this more.

@towolf
Author

towolf commented Feb 13, 2018

Just to be sure: add_kubernetes_metadata is talking to kubernetes.default.svc, i.e. the internal apiserver endpoint, yes?

@exekias
Contributor

exekias commented Feb 13, 2018

That's correct
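
(For reference, in_cluster: true conventionally means the client authenticates with the mounted service-account token and resolves the API server through the KUBERNETES_SERVICE_HOST/PORT environment variables, i.e. that same kubernetes.default.svc endpoint. A sketch using client-go purely for illustration; Beats 6.x ships its own Kubernetes client, so this is not its actual code:)

package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// InClusterConfig reads the mounted service-account token plus the
	// KUBERNETES_SERVICE_HOST / KUBERNETES_SERVICE_PORT env vars.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println("API server:", cfg.Host, client != nil)
}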

@towolf
Author

towolf commented Feb 14, 2018

Since yesterday, I sometimes get blocks of these:

2018-02-13T21:02:26.871Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-13T21:31:43.409Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-13T21:44:07.668Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-13T22:18:29.365Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-13T22:31:14.603Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-13T23:02:42.139Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-13T23:07:41.379Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-13T23:45:20.467Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-13T23:46:24.925Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T00:24:41.503Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T00:37:00.074Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T01:12:24.462Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T01:36:48.328Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T02:00:48.198Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T02:25:30.624Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T02:59:52.067Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T03:03:39.462Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T03:39:18.842Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T03:43:08.652Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T04:31:56.131Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T04:33:03.396Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T05:07:46.670Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF
2018-02-14T05:25:52.012Z	ERROR	kubernetes/watcher.go:146	kubernetes: Watching API error EOF

@towolf
Author

towolf commented Feb 14, 2018

And then I get messages like this one, which indicates it is choking on null bytes.

2018-02-14T09:54:03.963Z ERROR log/harvester.go:243 Read line error: decoding docker JSON: invalid character '\x00' looking for beginning of value; File: %!(EXTRA string=/var/lib/docker/containers/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847-json.log)

I found a docker log file that starts with roughly 620,000 null bytes after json-file log rotation:

head -n 1 /var/lib/docker/containers/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847-json.log | xxd
...
000975c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000975d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000975e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000975f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00097600: 0000 0000 0000 007b 226c 6f67 223a 2243  .......{"log":"C
00097610: 6865 636b 696e 6720 666f 7220 6a6f 6273  hecking for jobs
00097620: 2e2e 2e20 7265 6365 6976 6564 2020 2020  ... received    
00097630: 2020 2020 2020 2020 2020 2020 2020 2020                  
00097640: 205c 7530 3031 625b 303b 6d20 206a 6f62   \u001b[0;m  job
00097650: 5c75 3030 3162 5b30 3b6d 3d32 3130 3631  \u001b[0;m=21061

Is the filebeat docker prospector compatible with json-file log rotation?

However, I only found one file like that so far.
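
For the record, that harvester error is exactly what encoding/json reports when a line starts with NUL padding: the decoder hits '\x00' before any JSON value. A tiny Go sketch of the failure, plus one hypothetical workaround (trimming leading NULs before decoding); this is not what Filebeat actually does:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// dockerLog mirrors the shape of a json-file log entry.
type dockerLog struct {
	Log    string `json:"log"`
	Stream string `json:"stream"`
	Time   string `json:"time"`
}

func main() {
	// A rotated log line padded with NUL bytes, as in the hexdump above.
	line := append(bytes.Repeat([]byte{0x00}, 16),
		[]byte(`{"log":"Checking for jobs...","stream":"stdout","time":"2018-02-14T04:30:00Z"}`)...)

	var entry dockerLog
	err := json.Unmarshal(line, &entry)
	fmt.Println(err) // invalid character '\x00' looking for beginning of value

	// Hypothetical workaround: strip the leading NUL padding first.
	if err := json.Unmarshal(bytes.TrimLeft(line, "\x00"), &entry); err == nil {
		fmt.Println(entry.Log)
	}
}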

@exekias
Contributor

exekias commented Feb 14, 2018

Neither error is fatal:

kubernetes: Watching API error EOF occurs when the connection to the K8S API gets closed, for whatever reason. Beats will re-establish it when that happens. I would say your rate is high, though. Do you have networking issues?

As for the JSON decoding error, it should be transient too; subsequent messages should be decoded correctly. I'm researching the zeroes in the file after rotation; there are some measures we can implement to avoid this issue.

Have you seen the panic again?
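
(The re-establish behaviour described above is the usual watch-and-retry pattern: when the API server closes the stream, log the error and start a new watch. A simplified Go sketch of that pattern, not the actual Beats watcher code:)

package main

import (
	"io"
	"log"
	"time"
)

// watchOnce stands in for one streaming watch against the API server; it
// returns io.EOF (or another error) when the server closes the connection.
func watchOnce() error {
	// ... stream pod events until the connection drops ...
	return io.EOF
}

// watchLoop logs watch errors and re-establishes the watch until stopped.
func watchLoop(stop <-chan struct{}) {
	for {
		select {
		case <-stop:
			return
		default:
		}
		if err := watchOnce(); err != nil {
			if err == io.EOF {
				log.Printf("kubernetes: Watching API error %v", err)
			}
			time.Sleep(time.Second) // back off, then start a new watch
		}
	}
}

func main() {
	stop := make(chan struct{})
	go watchLoop(stop)
	time.Sleep(3 * time.Second) // let it retry a few times
	close(stop)
}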

@towolf
Author

towolf commented Feb 14, 2018

Yes, actually: I got 2 more segfaults on exactly the node that had the "malformed" logs, but also 1 on another node. This is on our staging cluster:

$ kubectl -n kube-system get pod -l k8s-app=filebeat
NAME             READY     STATUS    RESTARTS   AGE
filebeat-84fqs   1/1       Running   0          18h
filebeat-r8fq6   1/1       Running   1          18h
filebeat-rmk9w   1/1       Running   3          18h
filebeat-rxlp7   1/1       Running   0          18h
filebeat-tl76w   1/1       Running   0          18h
filebeat-z7rlh   1/1       Running   0          18h

The crash log can be retrieved with:

$ kubectl -n kube-system logs filebeat-rmk9w --previous --tail 25
2018-02-14T04:30:19.307Z	INFO	log/harvester.go:216	Harvester started for file: /var/lib/docker/containers/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847-json.log
2018-02-14T04:30:19.321Z	ERROR	log/harvester.go:243	Read line error: decoding docker JSON: invalid character '\x00' looking for beginning of value; File: %!(EXTRA string=/var/lib/docker/containers/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847-json.log)
2018-02-14T04:30:29.311Z	INFO	log/harvester.go:216	Harvester started for file: /var/lib/docker/containers/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847-json.log
2018-02-14T04:30:29.324Z	ERROR	log/harvester.go:243	Read line error: decoding docker JSON: invalid character '\x00' looking for beginning of value; File: %!(EXTRA string=/var/lib/docker/containers/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847-json.log)
2018-02-14T04:30:39.314Z	INFO	log/harvester.go:216	Harvester started for file: /var/lib/docker/containers/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847-json.log
2018-02-14T04:30:39.328Z	ERROR	log/harvester.go:243	Read line error: decoding docker JSON: invalid character '\x00' looking for beginning of value; File: %!(EXTRA string=/var/lib/docker/containers/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847/c748a30c0c28fb6de5b53bffc1a8226a87fdca60d4df83640d889c2abe519847-json.log)
2018-02-14T04:30:47.889Z	INFO	[monitoring]	log/log.go:124	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":2900,"time":2900},"total":{"ticks":21040,"time":21044,"value":21040},"user":{"ticks":18140,"time":18144}},"info":{"ephemeral_id":"2de41ee7-0730-4ed4-9360-e7f70241b11f","uptime":{"ms":2880248}},"memstats":{"gc_next":7829696,"memory_alloc":5643288,"memory_total":4368108992}},"filebeat":{"events":{"added":109,"done":109},"harvester":{"closed":3,"open_files":4,"running":4,"started":3}},"libbeat":{"config":{"module":{"running":1}},"output":{"events":{"acked":103,"batches":11,"total":103},"read":{"bytes":66},"write":{"bytes":14609}},"pipeline":{"clients":2,"events":{"active":0,"filtered":6,"published":103,"total":109},"queue":{"acked":103}}},"registrar":{"states":{"current":37,"update":109},"writes":13},"system":{"load":{"1":0.21,"15":0.44,"5":0.42,"norm":{"1":0.0263,"15":0.055,"5":0.0525}}}}}}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x158 pc=0x13dd6fa]

goroutine 28 [running]:
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*ContainerIndexer).GetIndexes(0xc4203f2a60, 0x0, 0xc4203f11d0, 0xc420158180, 0xc420158180)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/indexers.go:161 +0x3a
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*Indexers).GetIndexes(0xc4203f11d0, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/indexers.go:76 +0x230
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).removePod(0xc4200a0000, 0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:188 +0x3d
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).(github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.removePod)-fm(0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:159 +0x34
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.processEvent(0xc4202e7ed0, 0xc4204bb920)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:175 +0x78
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).worker(0xc4200a0000)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:159 +0x26f
created by github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.newKubernetesAnnotator
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:113 +0x9f7

We do not have networking issues as far as I'm aware ...

@towolf
Author

towolf commented Feb 14, 2018

And this is from the other node that had 1 restart:

kubectl -n kube-system logs filebeat-r8fq6 --previous --tail 25
2018-02-14T10:01:58.866Z	INFO	kubernetes/watcher.go:133	kubernetes: Watching API for pod events
2018-02-14T10:02:07.856Z	INFO	[monitoring]	log/log.go:124	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":4460,"time":4468},"total":{"ticks":33970,"time":33980,"value":33970},"user":{"ticks":29510,"time":29512}},"info":{"ephemeral_id":"483fe03d-5e8d-4b07-ab05-bf62c71fb679","uptime":{"ms":59070199}},"memstats":{"gc_next":29848208,"memory_alloc":15164640,"memory_total":453574040}},"filebeat":{"harvester":{"open_files":5,"running":19}},"libbeat":{"config":{"module":{"running":1}},"pipeline":{"clients":2,"events":{"active":4117}}},"registrar":{"states":{"current":25}},"system":{"load":{"1":0.27,"15":0.19,"5":0.31,"norm":{"1":0.0338,"15":0.0238,"5":0.0388}}}}}}
2018-02-14T10:02:19.695Z	ERROR	pipeline/output.go:74	Failed to connect: dial tcp 10.106.82.87:5000: getsockopt: connection timed out
2018-02-14T10:02:37.855Z	INFO	[monitoring]	log/log.go:124	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":4460,"time":4468},"total":{"ticks":33970,"time":33984,"value":33970},"user":{"ticks":29510,"time":29516}},"info":{"ephemeral_id":"483fe03d-5e8d-4b07-ab05-bf62c71fb679","uptime":{"ms":59100198}},"memstats":{"gc_next":29848208,"memory_alloc":15379400,"memory_total":453788800}},"filebeat":{"harvester":{"open_files":5,"running":19}},"libbeat":{"config":{"module":{"running":1}},"pipeline":{"clients":2,"events":{"active":4117,"retry":2048}}},"registrar":{"states":{"current":25}},"system":{"load":{"1":0.23,"15":0.19,"5":0.29,"norm":{"1":0.0288,"15":0.0238,"5":0.0363}}}}}}
2018-02-14T10:03:07.855Z	INFO	[monitoring]	log/log.go:124	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":4470,"time":4472},"total":{"ticks":33980,"time":33988,"value":33980},"user":{"ticks":29510,"time":29516}},"info":{"ephemeral_id":"483fe03d-5e8d-4b07-ab05-bf62c71fb679","uptime":{"ms":59130198}},"memstats":{"gc_next":29848208,"memory_alloc":15563176,"memory_total":453972576}},"filebeat":{"harvester":{"open_files":5,"running":19}},"libbeat":{"config":{"module":{"running":1}},"pipeline":{"clients":2,"events":{"active":4117}}},"registrar":{"states":{"current":25}},"system":{"load":{"1":0.14,"15":0.18,"5":0.26,"norm":{"1":0.0175,"15":0.0225,"5":0.0325}}}}}}
2018-02-14T10:03:26.711Z	ERROR	pipeline/output.go:74	Failed to connect: dial tcp 10.106.82.87:5000: getsockopt: connection timed out
2018-02-14T10:03:37.856Z	INFO	[monitoring]	log/log.go:124	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":4470,"time":4476},"total":{"ticks":34030,"time":34044,"value":34030},"user":{"ticks":29560,"time":29568}},"info":{"ephemeral_id":"483fe03d-5e8d-4b07-ab05-bf62c71fb679","uptime":{"ms":59160196}},"memstats":{"gc_next":29845296,"memory_alloc":14922760,"memory_total":454176392}},"filebeat":{"harvester":{"open_files":5,"running":19}},"libbeat":{"config":{"module":{"running":1}},"pipeline":{"clients":2,"events":{"active":4117,"retry":2048}}},"registrar":{"states":{"current":25}},"system":{"load":{"1":0.08,"15":0.17,"5":0.24,"norm":{"1":0.01,"15":0.0213,"5":0.03}}}}}}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x158 pc=0x13dd6fa]

goroutine 72 [running]:
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*ContainerIndexer).GetIndexes(0xc420499f80, 0x0, 0xc4204c30e0, 0xc420060cc0, 0xc420060c00)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/indexers.go:161 +0x3a
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*Indexers).GetIndexes(0xc4204c30e0, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/indexers.go:76 +0x230
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).removePod(0xc4200984d0, 0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:188 +0x3d
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).(github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.removePod)-fm(0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:159 +0x34
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.processEvent(0xc42003eed0, 0xc420e9d1d0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:175 +0x78
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).worker(0xc4200984d0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:159 +0x26f
created by github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.newKubernetesAnnotator
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:113 +0x9f7

@towolf
Author

towolf commented Feb 14, 2018

I looked through the logs, and this definitely started after upgrading filebeat from 6.1.1 to 6.2.1 on Feb 8th on two clusters: 50 segfaults in total since that day.

@wflyer
Contributor

wflyer commented Feb 27, 2018

I'm also experiencing the same issue.

I'm using docker.elastic.co/beats/filebeat:6.2.2 with Kubernetes v1.9.2.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x158 pc=0x13dd8ba]

goroutine 96 [running]:
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*ContainerIndexer).GetIndexes(0xc42042bfc0, 0x0, 0xc420490bd0, 0xc420483680, 0xc420483620)
        /go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/indexers.go:161 +0x3a
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*Indexers).GetIndexes(0xc420490bd0, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/indexers.go:76 +0x230
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).removePod(0xc4204bc0e0, 0x0)
        /go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:188 +0x3d
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).(github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.removePod)-fm(0x0)
        /go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:159 +0x34
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.processEvent(0xc4203e9ed0, 0xc42031d9b0)
        /go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:175 +0x78
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).worker(0xc4204bc0e0)
        /go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:159 +0x26f
created by github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.newKubernetesAnnotator
        /go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:113 +0x9f7

@ruflin
Contributor

ruflin commented Feb 28, 2018

@exekias I wonder if we should check here https://github.com/elastic/beats/blob/master/libbeat/processors/add_kubernetes_metadata/indexers.go#L161 for nil values and log an error? Or should GetIndexes also support returning an error?
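
A guard along those lines might look roughly like this, sketched against simplified stand-in types rather than the real add_kubernetes_metadata structures:

package main

import "log"

// Pod and ContainerIndexer are simplified stand-ins for the real types.
type Pod struct {
	Status struct {
		ContainerStatuses []struct{ ContainerID string }
	}
}

type ContainerIndexer struct{}

// GetIndexes returns the container IDs used to index metadata for a pod.
// Returning early on nil avoids the panic in the stack traces above, and
// logging the condition helps track down who passes a nil pod in.
func (c *ContainerIndexer) GetIndexes(pod *Pod) []string {
	if pod == nil {
		log.Println("add_kubernetes_metadata: GetIndexes called with nil pod")
		return nil
	}
	ids := make([]string, 0, len(pod.Status.ContainerStatuses))
	for _, s := range pod.Status.ContainerStatuses {
		ids = append(ids, s.ContainerID)
	}
	return ids
}

func main() {
	var idx ContainerIndexer
	_ = idx.GetIndexes(nil) // logs and returns nil instead of panicking
}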

@towolf
Author

towolf commented Feb 28, 2018

s234 is the node with the most container churn, as this is the node where Gitlab-Runner schedules its build pods.

This seems to be correlated with actual log contents? Or does the filebeat instance prospecting the most new log files do the most lookups in the Kubernetes API?

$ kubectl -n kube-system get pod -l k8s-app=filebeat -o wide
NAME             READY     STATUS    RESTARTS   AGE       IP             NODE
filebeat-84fqs   1/1       Running   25         14d       172.31.3.242   h504
filebeat-r8fq6   1/1       Running   20         14d       172.31.5.89    h1355
filebeat-rmk9w   1/1       Running   112        14d       172.31.4.172   s234
filebeat-rv5qv   1/1       Running   4          8d        172.31.2.25    s237
filebeat-rxlp7   1/1       Running   10         14d       172.31.0.14    s239
filebeat-z7rlh   1/1       Running   10         14d       172.31.1.10    s238
$ kubectl -n kube-system get pod -l k8s-app=filebeat -o wide
NAME             READY     STATUS    RESTARTS   AGE       IP             NODE
filebeat-68svf   1/1       Running   19         19d       172.31.8.99    s931
filebeat-9w9ws   1/1       Running   6          19d       172.31.0.9     s851
filebeat-bg744   1/1       Running   6          19d       172.31.5.11    s853
filebeat-bhds5   1/1       Running   4          19d       172.31.4.7     s918
filebeat-g4zlm   1/1       Running   4          19d       172.31.6.9     s858
filebeat-mjmzw   1/1       Running   5          19d       172.31.7.9     s932
filebeat-pns7c   1/1       Running   6          19d       172.31.2.9     s859
filebeat-rjs5m   1/1       Running   6          19d       172.31.3.7     s941
filebeat-wq2qt   1/1       Running   5          19d       172.31.9.11    s939
filebeat-zfgr4   1/1       Running   9          8d        172.31.10.16   h3255
filebeat-zqcsh   1/1       Running   21         19d       172.31.1.96    s898

@exekias
Contributor

exekias commented Feb 28, 2018

Yes, I think I'm adding some defensive code + logging there to avoid this issue.

Thank you everyone for reporting this! So far I couldn't reproduce the issue, so the stack traces help a lot here.

@exekias
Contributor

exekias commented Feb 28, 2018

@ruflin I've just opened #6487; let me know what you think. Also, master and 6.3 should not be affected, as this code was refactored.

exekias pushed a commit to exekias/beats that referenced this issue Feb 28, 2018
While the root cause is unclear, this change adds defensive code against
nil Pod processing errors. It also adds more logging to debug this
further.

Fixes elastic#6372
ruflin pushed a commit that referenced this issue Mar 2, 2018
While the root cause is unclear, this change adds defensive code against
nil Pod processing errors. It also adds more logging to debug this
further.

Fixes #6372
@ruflin
Contributor

ruflin commented Mar 4, 2018

@exekias Merged. Good to know the other branches are not affected. Ignore my comment in the PR then.

@yawboateng

Seeing the same (it causes the container to crash). I'm on the latest 6.2.2 build and on k8s 1.8.5:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x158 pc=0x13dd8ba]

goroutine 58 [running]:
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*ContainerIndexer).GetIndexes(0xc4203fe7e0, 0x0, 0xc4203dcc00, 0xc4201a2120, 0xc4201a2060)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/indexers.go:161 +0x3a
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*Indexers).GetIndexes(0xc4203dcc00, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/indexers.go:76 +0x230
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).removePod(0xc4201123f0, 0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:188 +0x3d
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).(github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.removePod)-fm(0x0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:159 +0x34
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.processEvent(0xc420046ed0, 0xc420413e60)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:175 +0x78
github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).worker(0xc4201123f0)
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:159 +0x26f
created by github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata.newKubernetesAnnotator
	/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:113 +0x9f7

@exekias
Contributor

exekias commented Mar 11, 2018

Closed by #6487, to be released with 6.2.3

@exekias closed this as completed Mar 11, 2018
@towolf
Author

towolf commented Mar 12, 2018

I found another bad problem in #6536. Not sure whether it is related to this issue?

leweafan pushed a commit to leweafan/beats that referenced this issue Apr 28, 2023
While the root cause is unclear, this change adds defensive code against
nil Pod processing errors. It also adds more logging to debug this
further.

Fixes elastic#6372