Dreaded "can't add a new key into hash during iteration" again #381

Open
jcrood opened this issue Apr 19, 2024 · 1 comment

Comments

jcrood commented Apr 19, 2024

I installed fluentd on k8s using fluentd-kubernetes-daemonset v1.16.5, and it occasionally logs an error from the metadata plugin:

2024-04-19 09:18:20 +0000 [warn]: #0 dump an error event: error_class=RuntimeError error="can't add a new key into hash during iteration" location="/fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-kubernetes_metadata_filter-3.4.0/lib/fluent/plugin/kubernetes_metadata_cache_strategy.rb:86:in 'merge!'" tag="kubernetes.var.log.containers.calico-node-hrnj2_kube-system_calico-node-455d05a2c5e6b90ce4444c083d2aeeebf89c454d1588ea6fe7d314a7eaa61e6e.log" time=2024-04-14 22:15:41.969400595 +0000 record={"stream"=>"stdout", "logtag"=>"F", "message"=>"2024-04-14 22:15:41.968 [INFO][94] felix/summary.go 100: Summarising 11 dataplane reconciliation loops over 1m4.6s: avg=7ms longest=22ms ()", "time"=>"2024-04-15T00:15:41.969400595+02:00", "docker"=>{"container_id"=>"455d05a2c5e6b90ce4444c083d2aeeebf89c454d1588ea6fe7d314a7eaa61e6e"}, "kubernetes"=>{"container_name"=>"calico-node", "namespace_name"=>"kube-system", "pod_name"=>"calico-node-hrnj2", "container_image"=>"quay.io/calico/node:v3.24.5", "container_image_id"=>"sha256:54637cb36d4a1c029fb994c6fc88af04791c1f2dcbd12a24aa995c0bffaacdb1", "pod_id"=>"b3573c8d-a8a1-4d46-bd63-a447b609e71f", "pod_ip"=>"10.254.6.12", "host"=>"m02.c042.k8s.example.com", "labels"=>{"controller-revision-hash"=>"79ddc99967", "k8s-app"=>"calico-node", "pod-template-generation"=>"4"}, "master_url"=>"https://10.233.0.1:443/api"}, "@timestamp"=>"2024-04-14T22:15:41.969400595+00:00"}

This appears to be similar to #237 and #260, except that it occurs in a different part of the caching code and in a newer version. I've tried shuffling some code around, taking out the namespace_cache fetch and isolating the merge! call, and it keeps failing on that call. It's quite possible that it trips over Ruby's own check at https://github.com/ruby/ruby/blob/ruby_2_3/hash.c#L2315.
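For readers unfamiliar with the error, here is a minimal single-threaded sketch of the Ruby behaviour behind it: adding a key that does not yet exist to a hash while that hash is being iterated raises RuntimeError. The hash contents and the helper name add_key_while_iterating are made up for illustration. In the plugin the iteration and the merge! presumably happen on different threads, which this sketch does not reproduce.

```ruby
# Hypothetical stand-in for a cached pod-metadata entry.
cached = { 'pod_name' => 'calico-node-hrnj2' }

# Illustrative helper (not part of the plugin): tries to merge a new key
# into the hash while an iteration over that same hash is in progress.
def add_key_while_iterating(hash)
  hash.each do |_key, _value|
    # 'namespace_id' is not yet a key of the hash, so Ruby rejects the insert.
    hash.merge!('namespace_id' => 'hypothetical-id')
  end
  nil
rescue RuntimeError => e
  e.message
end

message = add_key_while_iterating(cached)
puts message
```

Updating the value of a key that already exists is allowed during iteration; only inserting a new key is rejected.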

Edited to add: having dug a bit deeper into what's happening, it appears that this can occur because both merge! calls (here and [here](https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/blob/v3.4.0/lib/fluent/plugin/kubernetes_metadata_cache_strategy.rb#L86)) can operate on cached pod metadata entries, and merge! is not thread safe. I've currently "fixed" it for myself by using plain merge and writing the updated copy back to the cache, which is synchronised.
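A rough sketch of that workaround, assuming a simplified cache: SketchCache below is a hypothetical stand-in guarded by a Mutex, not the plugin's actual cache class, and the keys and values are invented. The idea is to leave the cached hash untouched, build a new hash with the non-mutating merge, and store that copy through the synchronised setter.

```ruby
# Hypothetical, simplified cache; the real plugin uses its own cache class.
class SketchCache
  def initialize
    @lock = Mutex.new
    @entries = {}
  end

  # Synchronised read of a cache entry.
  def [](key)
    @lock.synchronize { @entries[key] }
  end

  # Synchronised write of a cache entry.
  def []=(key, value)
    @lock.synchronize { @entries[key] = value }
  end
end

cache = SketchCache.new
# Freezing the entry makes any accidental in-place mutation fail loudly.
cache['pod-1'] = { 'pod_name' => 'calico-node-hrnj2' }.freeze

extra = { 'namespace_id' => 'hypothetical-id' }

# Instead of cache['pod-1'].merge!(extra), which mutates a hash that another
# thread may be iterating, build a new hash and write it back.
original = cache['pod-1']
updated = original.merge(extra)
cache['pod-1'] = updated.freeze
```

The read, merge and write-back are three separate steps, so two threads updating the same entry at once could still overwrite each other's additions. What the sketch avoids is mutating a hash that another thread might be iterating.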

@jcantrill
Contributor

@jcrood pull requests are welcome. If you have a solution please submit something
