Host_metrics throwing errors #18916
Comments
I can't reproduce the issue if I just run … By the sounds of it, the source isn't able to access …
Thanks for your response, @StephenWakely.
I wonder if it is an issue of the cgroups structure changing while we are trying to read the files. That would introduce a race condition that would produce those "file not found" errors. Do you know if this happens more frequently when the contents of the pods are changing, @adolsalamanca?
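If the race described above is the cause, one way a reader of cgroup files could tolerate it is to treat a "not found" error as "this cgroup went away" rather than as a failure. The sketch below is only an illustration under that assumption: `read_cgroup_file` is a hypothetical helper, not Vector's actual `host_metrics` code, and it uses a temporary directory with a `cpu.stat` file standing in for a real cgroupfs path.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;

/// Hypothetical helper (not Vector's API): read one cgroup interface file,
/// mapping "file not found" to `Ok(None)` instead of an error.
fn read_cgroup_file(path: &Path) -> io::Result<Option<String>> {
    match fs::read_to_string(path) {
        Ok(contents) => Ok(Some(contents)),
        // The cgroup directory was removed (e.g. a container stopped)
        // after it was discovered; report "nothing to read".
        Err(e) if e.kind() == ErrorKind::NotFound => Ok(None),
        // Permission problems and other I/O failures stay real errors.
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    // A temp directory stands in for a cgroup directory in this sketch.
    let dir = std::env::temp_dir().join(format!("cgroup-sketch-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let stat = dir.join("cpu.stat");
    fs::write(&stat, "usage_usec 1234\n")?;

    assert_eq!(read_cgroup_file(&stat)?, Some("usage_usec 1234\n".to_string()));

    // Simulate the container being stopped between discovery and read.
    fs::remove_dir_all(&dir)?;
    assert_eq!(read_cgroup_file(&stat)?, None);
    println!("ok");
    Ok(())
}
```

Whether silently skipping is the right behavior (versus logging at debug level) is a design choice; the point is only that a vanished cgroup need not surface as an error.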
Thanks for your message, @bruceg. That sounds like a reasonable assumption, but I don't have much data to assess it right now. It wasn't the first time it happened, but it hasn't happened again since we reported it 😞
To be clear, it's not so much a matter of IOPS as of pods changing, that is, containers being started or stopped. Comparing your error with the source code, I can definitely see how this could happen; I'm just curious whether that is what is happening to you.
Thanks for your response, @bruceg.
kubectl get pods -l app.kubernetes.io/instance=cgroups-metrics
NAME READY STATUS RESTARTS AGE
cgroups-metrics-2dsbl 1/1 Running 0 66d
cgroups-metrics-2kvzl 1/1 Running 0 66d
cgroups-metrics-2mpb5 1/1 Running 0 66d
cgroups-metrics-42r7p 1/1 Running 0 66d
cgroups-metrics-4cpd6 1/1 Running 0 66d
cgroups-metrics-4hrrb 1/1 Running 0 16d
cgroups-metrics-4ktmm 1/1 Running 0 66d
cgroups-metrics-4ns5r 1/1 Running 0 50d
cgroups-metrics-4p88m 1/1 Running 0 66d
cgroups-metrics-4pk8t 1/1 Running 0 66d
cgroups-metrics-5d7vv 1/1 Running 0 66d
cgroups-metrics-5nbg4 1/1 Running 0 66d
cgroups-metrics-64zcj 1/1 Running 0 66d
cgroups-metrics-6756r 1/1 Running 0 66d
cgroups-metrics-6h5bv 1/1 Running 0 66d
cgroups-metrics-6tbb2 1/1 Running 0 66d
cgroups-metrics-6zfv5 1/1 Running 0 66d
cgroups-metrics-78ftx 1/1 Running 0 45d
cgroups-metrics-8vkml 1/1 Running 0 66d
cgroups-metrics-9569m 1/1 Running 0 66d
cgroups-metrics-9kxht 1/1 Running 0 66d
cgroups-metrics-9mjxb 1/1 Running 0 66d
cgroups-metrics-9tnsw 1/1 Running 0 66d
cgroups-metrics-b4m64 1/1 Running 0 66d
cgroups-metrics-bgdxb 1/1 Running 0 66d
cgroups-metrics-bqttv 1/1 Running 0 43d
cgroups-metrics-br9n2 1/1 Running 0 8d
cgroups-metrics-brd98 1/1 Running 0 66d
cgroups-metrics-c25pp 1/1 Running 0 66d
cgroups-metrics-c4mzd 1/1 Running 0 66d
cgroups-metrics-cftfh 1/1 Running 0 66d
cgroups-metrics-cqv4x 1/1 Running 0 66d
cgroups-metrics-crcqg 1/1 Running 0 66d
cgroups-metrics-csrhd 1/1 Running 0 66d
cgroups-metrics-ctft9 1/1 Running 0 66d
cgroups-metrics-djhs2 1/1 Running 0 66d
cgroups-metrics-dkdd8 1/1 Running 0 66d
cgroups-metrics-fr4b2 1/1 Running 0 66d
cgroups-metrics-fz44n 1/1 Running 0 66d
cgroups-metrics-gczlh 1/1 Running 0 66d
cgroups-metrics-ghzvq 1/1 Running 0 66d
cgroups-metrics-gnjv7 1/1 Running 0 66d
cgroups-metrics-gp4zk 1/1 Running 0 14d
cgroups-metrics-h2v4p 1/1 Running 0 43d
cgroups-metrics-hc6tt 1/1 Running 0 26d
cgroups-metrics-hgl6r 1/1 Running 0 66d
cgroups-metrics-hjx2p 1/1 Running 0 66d
cgroups-metrics-hzmh9 1/1 Running 0 42d
cgroups-metrics-j9zqs 1/1 Running 0 66d
cgroups-metrics-jccp4 1/1 Running 0 66d
cgroups-metrics-jxc4x 1/1 Running 0 45d
cgroups-metrics-k4g8r 1/1 Running 0 6d3h
cgroups-metrics-k9d78 1/1 Running 0 66d
cgroups-metrics-kddk6 1/1 Running 0 66d
cgroups-metrics-kvnnm 1/1 Running 0 66d
cgroups-metrics-l2tql 1/1 Running 0 66d
cgroups-metrics-l8clx 1/1 Running 0 66d
cgroups-metrics-lh8hf 1/1 Running 0 6d3h
cgroups-metrics-lk842 1/1 Running 0 24d
cgroups-metrics-ls8qd 1/1 Running 0 66d
cgroups-metrics-lv7f6 1/1 Running 0 66d
cgroups-metrics-m8jzh 1/1 Running 0 66d
cgroups-metrics-mmsh7 1/1 Running 0 66d
cgroups-metrics-mnxtp 1/1 Running 0 43d
cgroups-metrics-mtz52 1/1 Running 0 23d
cgroups-metrics-mvkdj 1/1 Running 0 66d
cgroups-metrics-n9qnt 1/1 Running 0 66d
cgroups-metrics-npfd5 1/1 Running 0 66d
cgroups-metrics-nxwvn 1/1 Running 0 66d
cgroups-metrics-p4zxj 1/1 Running 0 66d
cgroups-metrics-p5grq 1/1 Running 0 66d
cgroups-metrics-p7bjf 1/1 Running 0 66d
cgroups-metrics-p7tk2 1/1 Running 0 66d
cgroups-metrics-ptgkl 1/1 Running 0 66d
cgroups-metrics-pvf4d 1/1 Running 0 66d
cgroups-metrics-q2xgq 1/1 Running 0 66d
cgroups-metrics-q69ws 1/1 Running 0 66d
cgroups-metrics-qdggd 1/1 Running 0 45d
cgroups-metrics-qq2wf 1/1 Running 0 66d
cgroups-metrics-r76c2 1/1 Running 0 66d
cgroups-metrics-rxkll 1/1 Running 0 66d
cgroups-metrics-s28jf 1/1 Running 1 (21d ago) 66d
cgroups-metrics-sb46m 1/1 Running 0 66d
cgroups-metrics-sjpb2 1/1 Running 0 66d
cgroups-metrics-sqnkg 1/1 Running 0 66d
cgroups-metrics-sxzgh 1/1 Running 0 64d
cgroups-metrics-t5c29 1/1 Running 0 66d
cgroups-metrics-v6dqk 1/1 Running 0 42d
cgroups-metrics-v9r45 1/1 Running 0 66d
cgroups-metrics-vdphv 1/1 Running 0 66d
cgroups-metrics-vgt2l 1/1 Running 0 66d
cgroups-metrics-vj2ml 1/1 Running 0 66d
cgroups-metrics-vnqgc 1/1 Running 0 66d
cgroups-metrics-vzxnq 1/1 Running 0 63d
cgroups-metrics-w75rg 1/1 Running 0 66d
cgroups-metrics-w9sj5 1/1 Running 0 36d
cgroups-metrics-wmxpz 1/1 Running 0 66d
cgroups-metrics-wrx4v 1/1 Running 0 66d
cgroups-metrics-wtx5q 1/1 Running 0 66d
cgroups-metrics-wxwch 1/1 Running 0 66d
cgroups-metrics-x2tjl 1/1 Running 0 43d
cgroups-metrics-xk7tm 1/1 Running 0 66d
cgroups-metrics-z7sp2 1/1 Running 0 66d
cgroups-metrics-zccqd 1/1 Running 0 66d
kubectl -n $sav get pods -l app.kubernetes.io/instance=cgroups-metrics
NAME READY STATUS RESTARTS AGE
cgroups-metrics-2kbg2 1/1 Running 0 31d
cgroups-metrics-2zb52 1/1 Running 0 2d11h
cgroups-metrics-444vm 1/1 Running 0 45h
cgroups-metrics-4n4kr 1/1 Running 0 31d
cgroups-metrics-4n4td 1/1 Running 0 31d
cgroups-metrics-4rdwh 1/1 Running 0 31d
cgroups-metrics-5l59j 1/1 Running 0 31d
cgroups-metrics-6bdlv 1/1 Running 0 31d
cgroups-metrics-7vglz 1/1 Running 0 31d
cgroups-metrics-8jgkn 1/1 Running 0 31d
cgroups-metrics-9fcpp 1/1 Running 0 31d
cgroups-metrics-9kl2n 1/1 Running 0 31d
cgroups-metrics-9n6wr 1/1 Running 0 31d
cgroups-metrics-9pps2 1/1 Running 2 (3d5h ago) 31d
cgroups-metrics-9w27x 1/1 Running 0 31d
cgroups-metrics-b6lgq 1/1 Running 0 43h
cgroups-metrics-b7bl5 1/1 Running 0 31d
cgroups-metrics-bf8cl 1/1 Running 0 31d
cgroups-metrics-bnspl 1/1 Running 0 13d
cgroups-metrics-cct2c 1/1 Running 0 2d4h
cgroups-metrics-d7dt4 1/1 Running 0 31d
cgroups-metrics-dbjmj 1/1 Running 0 10d
cgroups-metrics-g4nr5 1/1 Running 0 31d
cgroups-metrics-gbxzl 1/1 Running 0 2d15h
cgroups-metrics-hbhdk 1/1 Running 0 31d
cgroups-metrics-jldfk 1/1 Running 1 (13d ago) 31d
cgroups-metrics-jzlwq 1/1 Running 0 15d
cgroups-metrics-kg29s 1/1 Running 0 31d
cgroups-metrics-kk6x2 1/1 Running 0 31d
cgroups-metrics-kx4tz 1/1 Running 0 31d
cgroups-metrics-lrxbt 1/1 Running 0 31d
cgroups-metrics-mgqwb 1/1 Running 0 31d
cgroups-metrics-nc4mp 1/1 Running 0 31d
cgroups-metrics-pnr2q 1/1 Running 0 31d
cgroups-metrics-pr8vm 1/1 Running 0 31d
cgroups-metrics-q4gxl 1/1 Running 0 31d
cgroups-metrics-qb6dp 1/1 Running 0 31d
cgroups-metrics-qsqcb 1/1 Running 0 31d
cgroups-metrics-rltft 1/1 Running 0 31d
cgroups-metrics-rn68q 1/1 Running 0 31d
cgroups-metrics-rxnmg 1/1 Running 0 31d
cgroups-metrics-srshk 1/1 Running 0 31d
cgroups-metrics-sxr2s 1/1 Running 0 34h
cgroups-metrics-t642x 1/1 Running 0 31d
cgroups-metrics-t8fst 1/1 Running 0 31d
cgroups-metrics-tnpdv 1/1 Running 0 31d
cgroups-metrics-tnsbv 1/1 Running 0 31d
cgroups-metrics-tqh6k 1/1 Running 0 14d
cgroups-metrics-v2r4g 1/1 Running 0 2d17h
cgroups-metrics-wk6nn 1/1 Running 0 31d
cgroups-metrics-x275s 1/1 Running 0 31d
cgroups-metrics-x8jkt 1/1 Running 0 31d
cgroups-metrics-xh2f5 1/1 Running 0 31d
cgroups-metrics-xrxvf 1/1 Running 0 31d
cgroups-metrics-z28t4 1/1 Running 0 31d
cgroups-metrics-z8crf 1/1 Running 0 31d
cgroups-metrics-zk66t 1/1 Running 1 (3d14h ago) 29d
cgroups-metrics-zxln4 1/1 Running 0 10d
Right, I would expect it to be fairly rare. It would happen when the list of cgroups changed between listing the directory and reading the files in it. It would probably be more likely if Vector was busy at the same time and ended up scheduling away from the host metrics task internally.
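The window described above (list the directory first, read the files later) can be reproduced deterministically by deleting a directory between the two phases. This is a minimal sketch under stated assumptions: `list_cgroups` and `read_listed` are hypothetical functions, the `pod-a`/`pod-b`/`pod-c` directories and `cpu.stat` files live in a temporary directory rather than a real cgroup hierarchy, and nothing here claims to mirror how Vector's `host_metrics` source is structured internally.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::{Path, PathBuf};

/// Phase 1 (hypothetical): list child cgroup directories under `root`.
fn list_cgroups(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut dirs = Vec::new();
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        if entry.file_type()?.is_dir() {
            dirs.push(entry.path());
        }
    }
    dirs.sort();
    Ok(dirs)
}

/// Phase 2 (hypothetical): read `file` from each previously listed cgroup.
/// Returns the readings plus a count of cgroups that vanished in between.
fn read_listed(dirs: &[PathBuf], file: &str) -> io::Result<(Vec<(PathBuf, String)>, usize)> {
    let mut readings = Vec::new();
    let mut vanished = 0;
    for dir in dirs {
        match fs::read_to_string(dir.join(file)) {
            Ok(text) => readings.push((dir.clone(), text)),
            // Listed earlier, gone now: count it instead of failing.
            Err(e) if e.kind() == ErrorKind::NotFound => vanished += 1,
            Err(e) => return Err(e),
        }
    }
    Ok((readings, vanished))
}

fn main() -> io::Result<()> {
    let root = std::env::temp_dir().join(format!("cgroup-race-{}", std::process::id()));
    let _ = fs::remove_dir_all(&root); // start from a clean directory
    for name in ["pod-a", "pod-b", "pod-c"] {
        fs::create_dir_all(root.join(name))?;
        fs::write(root.join(name).join("cpu.stat"), "usage_usec 1\n")?;
    }

    let listed = list_cgroups(&root)?; // phase 1: 3 cgroups seen
    fs::remove_dir_all(root.join("pod-b"))?; // a container stops in the window
    let (readings, vanished) = read_listed(&listed, "cpu.stat")?; // phase 2

    assert_eq!(listed.len(), 3);
    assert_eq!(readings.len(), 2);
    assert_eq!(vanished, 1);
    println!("read {} cgroups, {} vanished", readings.len(), vanished);
    fs::remove_dir_all(&root)?;
    Ok(())
}
```

Running it prints `read 2 cgroups, 1 vanished`. A real collector has the same two phases, which is why a container stopping at the wrong moment could surface as a "file not found" error unless that case is handled explicitly.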
Another occurrence, leaving logs below
Problem
We're running vector as a daemonset in our k8s cluster.
All of a sudden, we started getting errors from the host_metrics source. We're only using the cgroups collector on it.
Configuration
Version
timberio/vector:0.27.X-alpine
Debug Output
Example Data
No response
Additional Context
No response
References
No response