I installed fluentd on k8s using fluentd-kubernetes-daemonset v1.16.5 and occasionally see it log an error from the metadata plugin:
```
2024-04-19 09:18:20 +0000 [warn]: #0 dump an error event: error_class=RuntimeError error="can't add a new key into hash during iteration" location="/fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-kubernetes_metadata_filter-3.4.0/lib/fluent/plugin/kubernetes_metadata_cache_strategy.rb:86:in 'merge!'" tag="kubernetes.var.log.containers.calico-node-hrnj2_kube-system_calico-node-455d05a2c5e6b90ce4444c083d2aeeebf89c454d1588ea6fe7d314a7eaa61e6e.log" time=2024-04-14 22:15:41.969400595 +0000 record={"stream"=>"stdout", "logtag"=>"F", "message"=>"2024-04-14 22:15:41.968 [INFO][94] felix/summary.go 100: Summarising 11 dataplane reconciliation loops over 1m4.6s: avg=7ms longest=22ms ()", "time"=>"2024-04-15T00:15:41.969400595+02:00", "docker"=>{"container_id"=>"455d05a2c5e6b90ce4444c083d2aeeebf89c454d1588ea6fe7d314a7eaa61e6e"}, "kubernetes"=>{"container_name"=>"calico-node", "namespace_name"=>"kube-system", "pod_name"=>"calico-node-hrnj2", "container_image"=>"quay.io/calico/node:v3.24.5", "container_image_id"=>"sha256:54637cb36d4a1c029fb994c6fc88af04791c1f2dcbd12a24aa995c0bffaacdb1", "pod_id"=>"b3573c8d-a8a1-4d46-bd63-a447b609e71f", "pod_ip"=>"10.254.6.12", "host"=>"m02.c042.k8s.example.com", "labels"=>{"controller-revision-hash"=>"79ddc99967", "k8s-app"=>"calico-node", "pod-template-generation"=>"4"}, "master_url"=>"https://10.233.0.1:443/api"}, "@timestamp"=>"2024-04-14T22:15:41.969400595+00:00"}
```
This appears to be similar to #237 and #260, except it occurs in a different part of the caching code and in a newer version. I've tried shuffling some code around, taking out the `namespace_cache` fetch and isolating the `merge!` call, and it keeps failing on that call. It may well be tripping over Ruby's own guard: https://github.com/ruby/ruby/blob/ruby_2_3/hash.c#L2315.
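For context, that guard means MRI raises exactly this error whenever a new key is added to a hash that is currently being iterated. A minimal single-threaded illustration (not the plugin's code):

```ruby
# Adding a new key to a hash while it is being iterated is rejected by MRI.
h = { a: 1 }
h.each { |_k, _v| h[:b] = 2 }
# => RuntimeError: can't add a new key into hash during iteration
```

The same guard can in principle trip across threads: if one thread is iterating a shared hash (for example as the source of a `merge!`) while another thread adds keys to it, the writing thread raises this error.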
Edited to add: having dug a bit deeper into what's happening, it appears that this can occur because both `merge!` calls (here and [here](https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/blob/v3.4.0/lib/fluent/plugin/kubernetes_metadata_cache_strategy.rb#L86)) can operate on the same cached pod metadata entry, and that is not thread safe. I've currently "fixed" it for myself by using a plain `merge` and writing the updated copy back to the cache, which is synchronised.
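A rough sketch of that workaround, using hypothetical stand-in names (`cache`, `cache_key`, and `new_metadata` are illustrative, not the plugin's actual identifiers):

```ruby
# Hypothetical sketch: avoid mutating the shared cached hash in place.
cached  = cache[cache_key] || {}       # cache is assumed to be a synchronised (thread-safe) store
updated = cached.merge(new_metadata)   # non-destructive merge returns a fresh hash
cache[cache_key] = updated             # write the copy back; the cache handles its own locking
```

Because `merge` never touches the original cached hash, no thread can add a key to a hash another thread is iterating, which avoids the RuntimeError above.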