Closing add_kubernetes_metadata processor each time a filestream reader stops makes filebeat panic #32200
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
I am also hitting this problem.
My config file:

```yaml
- type: filestream
  id: tomcat-backgroud-log-1
  paths:
    - /var/lib/kubelet/pods/*/volumes/kubernetes.io~empty-dir/log-dir/*.log
  fields_under_root: true
  ignore_inactive: since_last_start
  close.on_state_change.inactive: 5m
  prospector.scanner.check_interval: 1s
  message_max_bytes: 2097152
  processors:
    - add_kubernetes_metadata:
        host: ${NODE_NAME}
        add_resource_metadata:
          namespace:
            enabled: false
          node:
            enabled: false
          deployment: false
          cronjob: false
        default_indexers.enabled: false
        default_matchers.enabled: false
        indexers:
          - pod_uid:
        matchers:
          - logs_path:
              logs_path: "/var/lib/kubelet/pods/"
              resource_type: 'pod'
```

The error log:

```json
{
"log.level": "error",
"@timestamp": "2022-12-13T18:00:17.129+0800",
"log.logger": "input.filestream",
"log.origin": {
"file.name": "input-logfile/harvester.go",
"file.line": 168
},
"message": "Harvester crashed with: harvester panic with: close of closed channel\ngoroutine 224 [running]:\nruntime/debug.Stack()\n\truntime/debug/stack.go:24 +0x65\ngithub.com/elastic/beats/v7/filebeat/input/filestream/internal/input-logfile.startHarvester.func1.1()\n\tgithub.com/elastic/beats/v7/filebeat/input/filestream/internal/input-logfile/harvester.go:167 +0x78\npanic({0x558ebc1660e0, 0x558ebc6ecbb0})\n\truntime/panic.go:844 +0x258\ngithub.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata.(*cache).stop(...)\n\tgithub.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata/cache.go:97\ngithub.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).Close(0xc0009e7600?)\n\tgithub.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata/kubernetes.go:311 +0x4f\ngithub.com/elastic/beats/v7/libbeat/processors.Close(...)\n\tgithub.com/elastic/beats/v7/libbeat/processors/processor.go:58\ngithub.com/elastic/beats/v7/libbeat/publisher/processing.(*group).Close(0x5?)\n\tgithub.com/elastic/beats/v7/libbeat/publisher/processing/processors.go:95 +0x159\ngithub.com/elastic/beats/v7/libbeat/processors.Close(...)\n\tgithub.com/elastic/beats/v7/libbeat/processors/processor.go:58\ngithub.com/elastic/beats/v7/libbeat/publisher/processing.(*group).Close(0x0?)\n\tgithub.com/elastic/beats/v7/libbeat/publisher/processing/processors.go:95 +0x159\ngithub.com/elastic/beats/v7/libbeat/processors.Close(...)\n\tgithub.com/elastic/beats/v7/libbeat/processors/processor.go:58\ngithub.com/elastic/beats/v7/libbeat/publisher/pipeline.(*client).Close.func1()\n\tgithub.com/elastic/beats/v7/libbeat/publisher/pipeline/client.go:167 +0x2df\nsync.(*Once).doSlow(0x0?, 0x0?)\n\tsync/once.go:68 +0xc2\nsync.(*Once).Do(...)\n\tsync/once.go:59\ngithub.com/elastic/beats/v7/libbeat/publisher/pipeline.(*client).Close(0x558eba364866?)\n\tgithub.com/elastic/beats/v7/libbeat/publisher/pipeline/client.go:148 +0x59\ngithub.com/elastic/beats/v7/filebeat/beater.(*countingClient).Close(0x558eba3647df?)\n\tgithub.com/elastic/beats/v7/filebeat/beater/channels.go:145 +0x22\ngithub.com/elastic/beats/v7/filebeat/input/filestream/internal/input-logfile.startHarvester.func1({0x558ebc7305d0?, 0xc0006ce6c0})\n\tgithub.com/elastic/beats/v7/filebeat/input/filestream/internal/input-logfile/harvester.go:219 +0x949\ngithub.com/elastic/go-concert/unison.(*TaskGroup).Go.func1()\n\tgithub.com/elastic/[email protected]/unison/taskgroup.go:163 +0xc3\ncreated by github.com/elastic/go-concert/unison.(*TaskGroup).Go\n\tgithub.com/elastic/[email protected]/unison/taskgroup.go:159 +0xca\n",
"service.name": "filebeat",
"id": "tomcat-backgroud-log-1",
"source_file": "filestream::tomcat-backgroud-log-1::native::671566857-64768",
"ecs.version": "1.6.0"
}
```

The last panic info:
Currently there are two ways to enable the `add_kubernetes_metadata` processor with the filestream input in order to collect container logs and enrich them with their metadata. The first one is to define the processor as part of the input:
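For example, something along these lines (a minimal sketch; the input id, paths, and matcher values are just placeholders):

```yaml
filebeat.inputs:
  - type: filestream
    id: my-containers                 # placeholder id
    paths:
      - /var/log/containers/*.log     # placeholder path
    processors:                       # processor attached to this input
      - add_kubernetes_metadata:
          host: ${NODE_NAME}
          matchers:
            - logs_path:
                logs_path: "/var/log/containers/"
```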
The second one is to define the processors outside of the input, in a more central place:
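A sketch of that central layout, with the same placeholder values:

```yaml
filebeat.inputs:
  - type: filestream
    id: my-containers                 # placeholder id
    paths:
      - /var/log/containers/*.log     # placeholder path

processors:                           # processor shared by all inputs
  - add_kubernetes_metadata:
      host: ${NODE_NAME}
      matchers:
        - logs_path:
            logs_path: "/var/log/containers/"
```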
In the first case, each time a filestream reader closes (in the case of Kubernetes jobs that stop after their work is finished, the file remains idle and I guess at some point the reader closes), it stops the connected processors, which means the watcher is stopped and the cache's `done` channel is closed.

The next time the same thing happens (a filestream reader stops), the same process takes place. But the watcher is already stopped (so nothing happens there), and the cache's `done` channel is already closed. Closing an already closed channel a second time makes filebeat panic.
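For illustration only (this is not the beats code): closing a Go channel twice always panics with exactly the message seen in the log above.

```go
package main

func main() {
	done := make(chan struct{})
	close(done) // first reader stops: the channel is closed, fine
	close(done) // second reader stops: panic: close of closed channel
}
```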
If we try to handle this error on the processor side, we have two options:

1. If we set the cache to nil the first time it is closed, this will lead to errors because there are still other clients (from different filestream readers) that try to access the cache.
2. If we first check whether the cache channel is already closed before closing it, we avoid the panic (see the sketch right after this list). But it still doesn't make sense, because the watcher (for new pods) is stopped and the cache no longer gets updated, so all subsequent clients of the cache will get stale data.
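A minimal sketch of option 2, assuming a cache roughly shaped like the one in the stack trace (the type and field names are hypothetical, not the actual `add_kubernetes_metadata` implementation): a `sync.Once` guard makes the stop idempotent, but it does nothing about the stale data problem.

```go
package kubernetes // hypothetical package, for illustration only

import "sync"

// cache is a stand-in for the processor's metadata cache from the stack trace.
type cache struct {
	stopOnce sync.Once
	done     chan struct{}
}

// stop closes done at most once, so a second Close from another filestream
// reader no longer panics. The pod watcher is still stopped, though, so the
// cached metadata silently goes stale for every later client.
func (c *cache) stop() {
	c.stopOnce.Do(func() { close(c.done) })
}
```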
This problem occurs because multiple clients use the same processor instance, and each one tries to close it when it finishes.
This problem does not occur in the second scenario, where `add_kubernetes_metadata` is configured outside of the filestream input: there, the readers do not try to close the processor. It also does not occur when using the `container` input instead of `filestream`.

Is stopping the connected processors each time a filestream reader finishes the right approach?
As far as I understand, the different clients that use the same processors are part of a pipeline that the clients publish events into. It looks like one client is created per file (not 100% sure). All these clients share the same `add_kubernetes_metadata` processor instance, and when a filestream reader gets closed, Close is called on it.

We could check whether the list of clients still contains clients before closing the processors (a sketch of that idea follows below). The problem is that the client's Close method is called from other places as well, such as https://github.com/elastic/beats/blob/main/filebeat/beater/channels.go#L145.
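A rough sketch of that "only close when the last client goes away" idea (the wrapper type and its fields are hypothetical, not the actual pipeline code):

```go
package pipeline // hypothetical package, for illustration only

import (
	"io"
	"sync"
)

// sharedProcessors wraps the processor group that all clients of one input
// share, and only closes it when the last client has been released.
type sharedProcessors struct {
	mu      sync.Mutex
	clients int
	group   io.Closer // e.g. the group containing add_kubernetes_metadata
}

func (s *sharedProcessors) acquire() {
	s.mu.Lock()
	s.clients++
	s.mu.Unlock()
}

func (s *sharedProcessors) release() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.clients--
	if s.clients == 0 {
		return s.group.Close() // safe: no client is left using the processors
	}
	return nil
}
```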
Another, easier approach would be to not allow the `add_kubernetes_metadata` processor to be configured at the input level and only allow it outside the input. Or we should just document it.

@elastic/elastic-agent-data-plane what do you think about this? Why is filestream's approach of closing processors different from `container`'s?

Also, @elastic/obs-cloud-monitoring, have you faced the same problem (maybe try the same scenario out) with the `add_cloud_metadata` processor?