Monitoring: allow specifying /proc or hostfs path. #23267
Pinging @elastic/integrations-services (Team:Services) |
This definitely seems like a bug. Might be an easy fix, gonna poke around. |
This is where it is hardcoded to "" (aka /): The first parameter of NewReader in "cgroups, err := cgroup.NewReader("", true)" should be configurable. See also: https://github.com/elastic/gosigar/blob/master/cgroup/reader.go#L61 |
Yep. Sorry for the delay, been distracted with 7.11, and then the Holidays happened. We can probably just use |
[Outdated, please see next comment.] Hi, #22879 has fix version 7.11.0. I have updated to that version but still get the error. I have also tried to set the env var LIBBEAT_MONITORING_CGROUPS_HIERARCHY_OVERRIDE to "/hostfs" but that doesn't seem to make a difference. Metricbeat gets started in the container with "metricbeat -c /etc/metricbeat.yml -e -system.hostfs=/hostfs" So I'm not sure what's going on here. Is it not part of the release or have I missed some configuration option? I did notice I now get the updated error message "error getting cgroup .." instead of "error getting group .." Thanks |
Ok. Did a bit more digging and got some new insights, so please ignore the above. It does seem to pick up the system.hostfs setting. So far so good. The error: Now if I check in the container I notice that /proc/23615/cgroup does not exist but /hostfs/proc/23615/cgroup does. Furthermore, pid 23615 is metricbeat itself:
So it looks like there is a bug with the monitoring of its own process which doesn't take the system.hostfs setting into account? Bart |
@schans so the error is only happening with metricbeat's own pid? |
Indeed. It is only happening for metricbeat's own pid (as seen from the host). |
From what I can find: The reportBeatCgroups function tries to get the cgroup stats of its own pid (process.GetSelfPid()) by calling cgroups.GetStatsForProcess(pid): https://github.com/elastic/beats/blob/7.11.0/libbeat/cmd/instance/metrics/metrics.go#L299 It uses the Reader options set in that method with NewReaderOptions: https://github.com/elastic/beats/blob/7.11.0/libbeat/cmd/instance/metrics/metrics.go#L287 which is missing the RootfsMountpoint. Then gosigar tries to get the stats by calling ProcessCgroupPaths with the rootfsMountpoint, which defaults to "/" if not set: https://github.com/elastic/gosigar/blob/master/cgroup/reader.go#L97 The actual open call is in the util file: https://github.com/elastic/gosigar/blob/master/cgroup/util.go#L235 which constructs the path based on the rootfsMountpoint and pid. I don't know (yet) what the best option is to pass the command line flag value of system.hostfs so that it is available in reportBeatCgroups(). HTH |
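To make the path construction concrete, here is a minimal Go sketch of the behaviour described above: the cgroup file path is built from a root filesystem mountpoint and a pid, with the mountpoint falling back to "/" when unset. `cgroupFilePath` is a hypothetical name for illustration and is not the gosigar function itself.

```go
package main

import (
	"fmt"
	"path"
	"strconv"
)

// cgroupFilePath is a hypothetical helper (not the gosigar API). It builds
// the path to a process's cgroup file under the given root filesystem
// mountpoint, defaulting to "/" when the mountpoint is empty.
func cgroupFilePath(rootfsMountpoint string, pid int) string {
	if rootfsMountpoint == "" {
		rootfsMountpoint = "/"
	}
	return path.Join(rootfsMountpoint, "proc", strconv.Itoa(pid), "cgroup")
}

func main() {
	// Without a mountpoint the path points at the container's own /proc.
	fmt.Println(cgroupFilePath("", 23615))
	// With the hostfs mountpoint it points at the host's procfs.
	fmt.Println(cgroupFilePath("/hostfs", 23615))
}
```

With an empty mountpoint the lookup lands on /proc/23615/cgroup, which is the path in the reported error; passing "/hostfs" yields the path that was found to exist in the container.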
Maybe getting it through the system module is an option? Like in https://github.com/elastic/beats/blob/master/metricbeat/module/system/process/process.go#L87 |
Yep, beat me to it. We'll need to modify the cmd/instance startup to look for the rootfs variable. |
Hi all! Any progress on this? Any way I can help? |
Assuming nothing gets in the way, I'd like to get this done this week, since it should be a fairly simple fix. |
Fix here: #24334 Took a bit longer than expected, as fixing it in a not-ugly way took a bit of time. |
Hi ! I seem to be having a similar issue here. I have this error in my container :
PID 7 is metricbeat. Metricbeat is started in the container by When I run a shell in the container, I can see that Is there a way to fix this ? |
@Irindul There's currently a handful of inconsistencies in how metricbeat handles |
I would like to add that setting metricbeat version used: 7.12.1 |
Is there any update on this? Seems like it is still an issue on 7.16.2. |
@nugmanovtimur Are you getting the same error message as the original issue? What platform is metricbeat running on? What OS distro and version? |
Hello @fearful-symmetry I'm using version 7.16.2 on a GKE cluster and I still get the error below, along with some other errors.
I tried commenting out the volume and volume mount from the yaml file to get over it but still got the same error. |
@n0ur-sh Are you seeing the |
@fearful-symmetry no the PID remains the same throughout the errors if that's what you mean. I'm still trying to find a proper workaround solution but in the best-case scenario, I still get the I still have two points that I'd wish to validate:
Thanks. UPDATE @fearful-symmetry I tried testing the issue on the GKE node itself and contrary to what the ERROR says the file and directory exist and are accessible as shown below POD Log
PID is accurate
The file actually exists
I hope this helps out |
@n0ur-sh Based on that one error you showed me, the cgroups issue is limited to self-monitoring, and shouldn't affect metrics; I believe those |
This issue is also seen in 7.16.3. A workaround until resolved would be great. |
DS YAML: Singleton YAML: |
@ayushmathur86 @kbujold @keifgwinn , a fix was merged here: #30228 Just to clarify, this is limited to preventing libbeat's self-monitoring from reading cgroups info, and should have no impact on actual upstream reported metrics. I'm a little baffled by the actual reported issues here, though. I was able to reproduce this in k8s with |
@fearful-symmetry does it mean that it's necessary to change the startup arguments for metricbeats? cc @ayushmathur86 |
@framsouza the fix shouldn't require any change to config. |
@fearful-symmetry that's good news. May I know in which version we can expect the fix to be available? |
@ayushmathur86 this'll be in 8.2. |
@fearful-symmetry can the fix be merged into 7.16? We are using 7.16.3 and we would like to have a solution since these errors continuously flood the logs. |
@kbujold as far as I'm aware, there won't be any other 7.16 releases, but we could try to backport into a future |
@fearful-symmetry Backport to 7.17 would help thanks! |
We have upgraded to 7.17. Is there any chance a fix could be backported into 7.17? Even if it's not released we could just patch it. |
We were initially hesitant to backport this, but I've made a conservative patch for 7.17 here: #31002 |
@fearful-symmetry can I close this issue then as you backported your PR? |
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
@jlind23 oop, missed your ping. Yep, closing this. |
+1 got this bug in 8.1.2, waiting for 8.2 |
Between the lines I've seen: I've added an env variable in place of the command line flag:
It seems to work for me in 8.1.2; I'm no longer spammed in the log by this message |
With 8.1.3, adding the following to the metricbeat config file causes metricbeat.modules to be ignored:
- module: system
  hostfs: "/hostfs"
With this config option, I'm getting |
Hello, We are running 7.17.0 and still seeing the error flood the logs:
2022-06-03T15:25:22.977Z ERROR metrics/metrics.go:304 error determining cgroups version: error reading /proc/2395793/cgroup: open /proc/2395793/cgroup: no such file or directory
Was that patch made available yet? |
Same problem here with 8.1.0:
Service config in docker:
services:
  metricbeat-es:
    image: docker.elastic.co/beats/metricbeat:8.1.0
    command: ["-system.hostfs=/hostfs", "-e", "--environment=container"]
    user: root
    cap_add:
      - SYS_ADMIN
      - NET_ADMIN
      - SYS_PTRACE
      - DAC_READ_SEARCH
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /proc:/hostfs/proc:ro
      - /sys/fs/cgroup:/hostfs/sys/fs/cgroup:ro
      - /:/hostfs:ro
Metricbeat config:
- module: system
  metricsets:
    - cpu # CPU usage
    - load # CPU load averages
    - memory # Memory usage
    - network # Network IO
    - process # Per process metrics
    - process_summary # Process summary
    - uptime # System Uptime
    - socket_summary # Socket summary
    - filesystem
  filesystem.ignore_types: [nfs, smbfs, autofs]
  enabled: true
  period: 30s
  processes: ['.*']
  hostfs: "/hostfs"
The process running inside the container:
| root | 925524 | 925442 | 3 | 13:13 | ? | 00:03:33 | metricbeat -system.hostfs=/hostfs -e --environment=container|
If I look for the process in /hostfs/proc/ I can find it, but not if I look in /proc.
The logs:
{
  "log.level":"error",
  "@timestamp":"2022-06-23T18:40:38.175Z",
  "log.origin":{
    "file.name":"metrics/metrics.go",
    "file.line":306
  },
  "message":"error determining cgroups version: error reading /proc/3788379/cgroup: open /proc/3788379/cgroup: no such file or directory",
  "service.name":"metricbeat",
  "ecs.version":"1.6.0"
} |
Discuss thread: https://discuss.elastic.co/t/metricbeat-k8s-error-getting-group-status-open-proc-pid-cgroup/255371
Since #21113, monitoring includes cgroup stats. However, those are hardcoded to read from /proc. This clashes with use cases where proc is available in a different location and causes the following error to be printed to the logs every 30s:
Metricbeat's system module has a command-line argument, -system.hostfs, that allows specifying an alternate path to the root filesystem for these metrics.
We could have a similar flag / configuration option for the monitoring metrics.
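One way to picture the proposal is a single hostfs value, parsed once and used to derive the procfs root for both the system metrics and the monitoring metrics. The Go sketch below only illustrates that idea with the standard library `flag` package. `parseHostfs` and `procRoot` are hypothetical helpers, and nothing here reflects how beats actually wires its flags.

```go
package main

import (
	"flag"
	"fmt"
	"path"
)

// parseHostfs is a hypothetical helper. It reads the -system.hostfs flag
// from args and returns "" when the flag is absent.
func parseHostfs(args []string) (string, error) {
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	hostfs := fs.String("system.hostfs", "", "alternate root filesystem mountpoint")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *hostfs, nil
}

// procRoot is a hypothetical helper. It derives the procfs directory that
// every consumer (system metrics and self-monitoring alike) would read from.
func procRoot(hostfs string) string {
	return path.Join("/", hostfs, "proc")
}

func main() {
	hostfs, err := parseHostfs([]string{"-system.hostfs=/hostfs"})
	if err != nil {
		fmt.Println("flag error:", err)
		return
	}
	fmt.Println(procRoot(hostfs))
}
```

The point of the design is that the monitoring code never assumes "/" on its own; it receives the same root as the system module.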