Disk input not reporting metrics for all mounted disk #1544

nyxcharon · 2016-07-25T17:29:48Z

Bug report

The disk plugin is not reporting metrics for a mounted disk, but the diskio plugin does.

Relevant telegraf.conf:

# Set Tag Configuration
[tags]
# Set Agent Configuration
[agent]
  interval = "10s"
  round_interval = true
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  debug = false
  quiet = true
  flush_buffer_when_full = true
  hostname = "hostname"
# Set output configuration
[[outputs.influxdb]]
  urls = ["http://<ip removed>:8086"]
  database = "telegraf"
  precision = "ns"
  timeout = "5s"

# Set Input Configuration
[[inputs.netstat]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.mem]]
[[inputs.cpu]]
  percpu = true
  totalcpu = true
[[inputs.disk]]
[[inputs.diskio]]
[[inputs.net]]
[[inputs.prometheus]]
  urls = ["<url 1>", "<url2>"]
  insecure_skip_verify = true
  bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"

System info:

Telegraf - version 1.0.0-beta2-18-g755b2ec
CoreOS stable (1010.6.0)
Docker version 1.10.3, build 8acee1b
Telegraf Docker container: quay.io/deis/telegraf:v2.1.0

Output of "mount" from inside the docker container (trimmed down to remove tmpfs mounts)
The disk of interest is /dev/xvdba.

proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,context="system_u:object_r:svirt_lxc_file_t:s0:c576,c784",gid=5,mode=620,ptmxmode=666)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime,seclabel)
sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime,seclabel)
/dev/xvda9 on /hostfs type ext4 (ro,relatime,seclabel,data=ordered)
/dev/xvda3 on /hostfs/usr type ext4 (ro,relatime,seclabel,block_validity,delalloc,barrier,user_xattr,acl)
/dev/xvda6 on /hostfs/usr/share/oem type ext4 (rw,nodev,relatime,seclabel,commit=600,data=ordered)
/dev/xvda1 on /hostfs/boot type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,errors=remount-ro)
/dev/xvdba on /hostfs/var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-west-2a/vol-bd4c4165 type ext4 (rw,relatime,seclabel,data=ordered)
/dev/xvdba on /hostfs/var/lib/kubelet/pods/ac9be55d-4eb4-11e6-8baa-0a836f4f06a7/volumes/kubernetes.io~aws-ebs/grafanadata type ext4 (rw,relatime,seclabel,data=ordered)
/dev/xvda9 on /dev/termination-log type ext4 (rw,relatime,seclabel,data=ordered)
/dev/xvda9 on /etc/resolv.conf type ext4 (rw,relatime,seclabel,data=ordered)
/dev/xvda9 on /etc/hostname type ext4 (rw,relatime,seclabel,data=ordered)
/dev/xvda9 on /etc/hosts type ext4 (rw,relatime,seclabel,data=ordered)

The docker container is run with the following environment variables set (it's launched via kubernetes which is why this is yaml)

          - name: "INFLUXDB_URLS"
            value: http://<ip>:8086
          - name: "INFLUXDB_DATABASE"
            value: "telegraf"
          - name: "HOST_PROC"
            value: "/rootfs/proc"
          - name: "HOST_SYS"
            value: "/rootfs/sys"
          - name: "AGENT_QUIET"
            value: "true"
          - name: "ENABLE_PROMETHEUS"
            value: "true"
          - name: "HOST_MOUNT_PREFIX"
            value: "/hostfs"
          - name: "HOST_ETC"
            value: "/hostfs/etc"

Steps to reproduce:

Run docker container with the above environment variables set

Expected behavior:

To be able to query influxdb and see the disk stats on "/dev/xvdba on /hostfs/var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-west-2a/vol-bd4c4165 type ext4 (rw,relatime,seclabel,data=ordered)"

Actual behavior:

No disk stats are available

Additional info:

I can query the diskio stats for this disk in telegraf. The query
select * from diskio where time > now() - 30s and "host" = '<ip>' returns

2016-07-25T17:26:40Z    "<ip>"      116618  "xvdba" 12596224    7390    2365    "unknown"       3202019328  360804  356811


j-vizcaino · 2016-11-15T15:33:17Z

The problem is that HOST_MOUNT_PREFIX does not work as expected.
According to the code, it is added as a prefix to the mount paths gathered by the ps package, expecting that the mount paths would be relative to the host root (not the container).

This is the problem: if you cat /hostfs/etc/mtab, you can see all the mount points of the host, but relative to the container mount point.
Example: if the host mount point is /foo, ps.Partitions() will return /hostfs/foo in the container.
Therefore, there is no need to add the HOST_MOUNT_PREFIX before issuing the os.Stat() call.

@nyxcharon Try removing the HOST_MOUNT_PREFIX and you will see all the mount points appear.
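
To make the failure mode described in this comment concrete, here is a minimal Python sketch of the path handling. It is not Telegraf's Go code: `stat_path` is a hypothetical helper, and the sketch assumes the partition list already contains container-relative paths such as /hostfs/foo.

```python
import os.path


def stat_path(mount_prefix: str, mountpoint: str) -> str:
    """Build the path that would be stat'ed when the prefix is prepended.

    Hypothetical helper: it blindly prepends the prefix to whatever
    mount point the partition list reports, as the comment describes.
    """
    return os.path.join(mount_prefix, mountpoint.lstrip("/"))


# Case the code expects: mount points relative to the host root.
print(stat_path("/hostfs", "/foo"))         # /hostfs/foo
# Case described here: mount points already relative to the container.
print(stat_path("/hostfs", "/hostfs/foo"))  # /hostfs/hostfs/foo
```

The second path does not exist inside the container, which would explain why a stat call on it fails and the mount point is dropped.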

sparrc · 2016-11-15T15:38:44Z

thanks for tracking that down @j-vizcaino, is this just a documentation issue then? care to submit a PR?

j-vizcaino · 2016-11-15T16:06:09Z

Problem is in the code: it prepends HOST_MOUNT_PREFIX to the mount points gathered via ps.Partitions() whereas it should strip the prefix from the paths.
I will try to find some time to submit a PR to fix this.

j-vizcaino · 2016-11-15T16:42:01Z

Digging more, it seems the problem is a bit more complex.
The package used to query all partitions uses the etc/mtab file. In our case, this is /hostfs/etc/mtab which should be fine because it is the same file as the host.
Things start to get tricky when etc/mtab is a symlink. In CoreOS, /etc/mtab → ../proc/self/mounts. In Debian, /etc/mtab → /proc/mounts. In these cases, the parsed mtab file is the mounts of the container, not the host.

My previous comment can be discarded: mount points should appear in /foo (from the ps.Partitions() point of view) whereas the effective mount point inside the container is /hostfs/foo.

This should be addressed in https://github.com/shirou/gopsutil by opening (/hostfs)/proc/mounts directly (mtab is deprecated anyway) and everything should work.
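
The symlink behaviour described above can be checked with a short Python sketch. This is an illustration only, not gopsutil's code: `mtab_target` is a hypothetical helper, and the scratch directories stand in for the /hostfs bind mount. It resolves the link target lexically, the same way the kernel treats absolute and relative symlink targets.

```python
import os
import tempfile


def mtab_target(host_etc: str) -> str:
    """Return the path that <host_etc>/mtab points to (lexically)."""
    mtab = os.path.join(host_etc, "mtab")
    if not os.path.islink(mtab):
        return mtab
    target = os.readlink(mtab)
    # os.path.join discards the directory when the target is absolute;
    # a relative target is resolved against the directory holding the link.
    return os.path.normpath(os.path.join(os.path.dirname(mtab), target))


with tempfile.TemporaryDirectory() as root:
    # Link targets taken from the comment above (Debian and CoreOS).
    for name, link in (("debian", "/proc/mounts"), ("coreos", "../proc/self/mounts")):
        etc = os.path.join(root, name, "hostfs", "etc")
        os.makedirs(etc)
        os.symlink(link, os.path.join(etc, "mtab"))
        print(name, "->", mtab_target(etc))
```

The absolute target leaves the /hostfs tree entirely. The relative target stays under /hostfs/proc but still goes through `self`, which procfs resolves for whichever process reads it.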

j-vizcaino · 2016-11-16T15:22:43Z

> This should be addressed in https://github.com/shirou/gopsutil by opening (/hostfs)/proc/mounts directly (mtab is deprecated anyway) and everything should work.

I was wrong again.
The following applies: /etc/mtab → /proc/mounts → /proc/self/mounts
Running tests on CoreOS 1068, within a Docker container having / bound to /hostfs, issuing cat /proc/mounts gives the exact same result as cat /hostfs/proc/mounts, that is, mount points appearing prefixed with /hostfs. The only way to effectively get the mount point of the host would be to cat /hostfs/proc/1/mounts but this is ugly.
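
For anyone who wants to compare the two views (/proc/mounts versus /hostfs/proc/1/mounts), a small parser for the mounts file format may help. This is a Python sketch under stated assumptions: `Mount` and `parse_mounts` are hypothetical names, and the sample text is made up for illustration rather than copied from a real host.

```python
from typing import List, NamedTuple


class Mount(NamedTuple):
    device: str
    mountpoint: str
    fstype: str
    options: List[str]


def _unescape(field: str) -> str:
    # The kernel writes space, tab, newline and backslash as octal escapes.
    for esc, char in (("\\040", " "), ("\\011", "\t"), ("\\012", "\n"), ("\\134", "\\")):
        field = field.replace(esc, char)
    return field


def parse_mounts(text: str) -> List[Mount]:
    """Parse the contents of a /proc/<pid>/mounts style file."""
    mounts = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, opts = parts[:4]
        mounts.append(
            Mount(_unescape(device), _unescape(mountpoint), fstype, opts.split(","))
        )
    return mounts


# Made-up sample in the same format as /proc/mounts.
sample = (
    "/dev/xvda9 / ext4 rw,relatime 0 0\n"
    "/dev/xvdba /var/lib/data ext4 rw,relatime 0 0\n"
)
for m in parse_mounts(sample):
    print(m.device, m.mountpoint)
```

Feeding it the contents of both files and comparing the `mountpoint` values should show whether the /hostfs prefix is present in each view. Whether /hostfs/proc/1/mounts is readable depends on how the container is run.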

m4ce · 2017-01-06T12:41:04Z

@j-vizcaino, did you find a solution for this? I tracked it down to the same issue you are describing, the problem is really mtab being a symlink to /proc/self/mounts :(

lrwxrwxrwx    1 root     root          17 Dec  1 04:32 /hostfs/etc/mtab -> /proc/self/mounts

m4ce · 2017-01-06T12:51:47Z

A workaround:

docker run --rm -v /:/hostfs:ro -e HOST_MOUNT_PREFIX=/hostfs -e HOST_ETC=/foo/etc -e HOST_PROC=/hostfs/proc -e HOST_SYS=/hostfs/sys -v /proc/1/mounts:/foo/etc/mtab -it telegraf:1.1.2 ..

HostEtc is only used in a handful of places: https://github.com/shirou/gopsutil/search?utf8=%E2%9C%93&q=HostEtc

j-vizcaino · 2017-02-10T14:32:07Z

@m4ce Sorry for the late reply. In our current deployment of telegraf, we

bind mount host / → /hostfs in container
set env HOST_ETC=/rootfs/etc
set env HOST_SYS=/rootfs/sys
set env HOST_ETC=/rootfs/proc

HOST_MOUNT_PREFIX is not set in our case: this leads to mount points being prefixed by /rootfs/ but this is the only working solution we could come to.

Hope this helps.

johnseekins · 2017-03-01T17:58:55Z

@j-vizcaino Trying to follow your example:

Container:

docker run -it --net=host --privileged=true -v /:/rootfs:ro -e "HOST_SYS=/rootfs/sys" -e "HOST_PROC=/rootfs/proc" -e "HOST_ETC=/rootfs/etc" <telegraf 1.2.1 image> sh

In the container:

/ # env
HOST_PROC=/rootfs/proc
HOSTNAME=...
SHLVL=1
OLDPWD=/config
HOME=/root
TERM=xterm
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOST_SYS=/rootfs/sys
HOST_ETC=/rootfs/etc
PWD=/

Then...trying to run telegraf:

/ # ./telegraf --config /config/telegraf.conf --test
* Plugin: inputs.diskio, Collection 1
...
* Plugin: inputs.kernel_vmstat, Collection 1
...
* Plugin: inputs.disk, Collection 1
2017-03-01T17:55:42Z E! error getting disk usage info: too many levels of symbolic links

What am I doing wrong here?

johnseekins · 2017-03-01T18:03:34Z

Ah. Running with "--privileged=true" breaks this behaviour.

j-vizcaino · 2017-03-01T19:52:51Z

@johnseekins The "too many levels of symbolic links" problem is usually caused by /proc/sys/fs/binfmt_misc not being mounted when Telegraf is started. Depending on your host OS, you need to make sure this is enabled.

johnseekins · 2017-03-01T20:31:00Z

Strange how that was failing, and now that you've mentioned the needed mount, it is working...without my having mounted anything new...

Magic!

Anyway...it looks good to me. And that seems like a reasonable work-around...should the README be updated?

johnseekins · 2017-03-01T23:10:44Z

Another, related question...

We now get some stats about /etc/hosts, /etc/resolv.conf, /etc/hostname and such from within the container. These are all duplicates of each other, too. Any magic tricks to get rid of these duplicates?
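
One way to reason about these duplicates: entries such as /etc/hosts and /etc/resolv.conf are file bind mounts backed by the same device as the root filesystem. The Python sketch below is a post-processing idea only, not a Telegraf feature. `one_per_device` is a hypothetical helper that keeps a single mount point per device, picking the shortest path. The input pairs are taken from the mount output earlier in this issue.

```python
from typing import Dict, Iterable, List, Tuple


def one_per_device(mounts: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep one (device, mountpoint) pair per device: the shortest path."""
    best: Dict[str, str] = {}
    for device, mountpoint in mounts:
        current = best.get(device)
        if current is None or len(mountpoint) < len(current):
            best[device] = mountpoint
    return sorted(best.items())


mounts = [
    ("/dev/xvda9", "/hostfs"),
    ("/dev/xvda3", "/hostfs/usr"),
    ("/dev/xvda9", "/etc/resolv.conf"),
    ("/dev/xvda9", "/etc/hostname"),
    ("/dev/xvda9", "/etc/hosts"),
]
for device, mountpoint in one_per_device(mounts):
    print(device, mountpoint)
```

Picking the shortest path is an arbitrary choice for this sketch; it would hide a second, legitimately distinct mount of the same device.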

mleonhard · 2020-03-12T22:01:26Z

With Telegraf 1.13.4, this is all you need to get inputs.disk to report on host filesystems:

Mount /proc at /hostfs/proc in the container
Set HOST_PROC=/hostfs/proc environment variable. This makes the gopsutil library read the host's proc instead of the container's proc.
Mount each filesystem mount point into the container, under /hostfs/. For example, if you want to monitor /mnt/volume1 then mount it into the container at /hostfs/mnt/volume1.
Set HOST_MOUNT_PREFIX=/hostfs to make Telegraf remove the /hostfs prefix from the value of the path field it reports.

Working example:

root@staging19:~# df -h |grep /dev
udev            481M     0  481M   0% /dev
/dev/vda1        25G  2.6G   22G  11% /
tmpfs           493M     0  493M   0% /dev/shm
/dev/vda15      105M  3.6M  101M   4% /boot/efi
/dev/sdb        888M   21M  801M   3% /mnt/staginggrafana
/dev/sda        888M   31M  790M   4% /mnt/staginginfluxdb
root@staging19:~# docker run --tty --interactive --rm \
--volume /root/telegraf.conf:/etc/telegraf/telegraf.conf \
--volume /mnt:/hostfs/mnt \
--env HOST_MOUNT_PREFIX=/hostfs \
telegraf@sha256:490e2976a5890ae6474fe36cb44764c81f17215396647c0ddae09b04e47a30b6 2>&1
2020-03-12T21:47:07Z I! Starting Telegraf 1.13.4
2020-03-12T21:47:07Z I! Using config file: /etc/telegraf/telegraf.conf
2020-03-12T21:47:07Z I! Loaded inputs: disk
2020-03-12T21:47:07Z I! Loaded aggregators: 
2020-03-12T21:47:07Z I! Loaded processors: printer
2020-03-12T21:47:07Z I! Loaded outputs: discard
2020-03-12T21:47:07Z I! Tags enabled: host=13257ee4702d
2020-03-12T21:47:07Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"13257ee4702d", Flush Interval:10s
disk,host=13257ee4702d,path=/ used_percent=10.592189563618062 1584049630000000000
disk,host=13257ee4702d,path=/mnt/staginggrafana used_percent=2.49834601782969 1584049630000000000
disk,host=13257ee4702d,path=/mnt/staginginfluxdb used_percent=3.7496608741593245 1584049630000000000
^C2020-03-12T21:47:12Z I! [agent] Hang on, flushing any cached metrics before shutdown
root@staging19:~# cat telegraf.conf
[agent]
  interval = "10s"

[[outputs.discard]]

[[processors.printer]]

[[inputs.disk]]
  # Read metrics about disk usage by mount point
  # Example tags:
  #   disk
  #   device=sda1
  #   fstype=ext4
  #   host=dev
  #   mode=rw
  #   path=/
  # Example fields:
  #   inodes_total=3907584i
  #   inodes_free=3732435i
  #   inodes_used=175149i
  #   total=62725623808i
  #   free=51542761472i
  #   used=7966146560i
  #   used_percent=13.386477459334872
  #
  # By default stats will be gathered for all mount points.
  # Set mount_points will restrict the stats to only the specified mount points.
  # mount_points = ["/"]
  # Ignore mount points by filesystem type.
  # ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
  ignore_fs = ["tmpfs"]
  
  fieldpass = ["used_percent"]
  taginclude = ["host", "path"]
  [inputs.disk.tagpass]
    path = ["/", "/mnt/*"]
root@staging19:~#
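
The prefix handling described in the steps above (stripping /hostfs from the reported path) can be sketched in Python as follows. This is an illustration of the idea, not Telegraf's implementation, and `reported_path` is a hypothetical helper.

```python
def reported_path(mountpoint: str, host_mount_prefix: str) -> str:
    """Strip the host mount prefix from a container-side mount point."""
    prefix = host_mount_prefix.rstrip("/")
    # Only strip on a whole path component, so /hostfsx is left alone.
    if prefix and (mountpoint == prefix or mountpoint.startswith(prefix + "/")):
        return mountpoint[len(prefix):] or "/"
    return mountpoint


print(reported_path("/hostfs/mnt/staginggrafana", "/hostfs"))  # /mnt/staginggrafana
print(reported_path("/hostfs", "/hostfs"))                     # /
print(reported_path("/", "/hostfs"))                           # /
```

Matching on a whole path component avoids mangling unrelated mount points that merely start with the same characters as the prefix.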

KirannBhavaraju · 2021-10-28T11:08:46Z

> @m4ce Sorry for the late reply. In our current deployment of telegraf, we
>
> bind mount host / → /hostfs in container
> set env HOST_ETC=/rootfs/etc
> set env HOST_SYS=/rootfs/sys
> set env HOST_ETC=/rootfs/proc
>
> HOST_MOUNT_PREFIX is not set in our case: this leads to mounts points being prefixed by /rootfs/ but this is the only working solution we could come to.
>
> Hope this helps.

For any future readers: I believe the last environment variable you are setting should be HOST_PROC, not HOST_ETC.

nyxcharon changed the title ~~Disk plugin not reporting metrics for all mounted disk~~ Disk input not reporting metrics for all mounted disk Jul 25, 2016

sparrc mentioned this issue Nov 8, 2016

Disk plugin does not show all mount points #2009

Closed

sparrc added this to the Future Milestone milestone Nov 8, 2016

sparrc added the bug unexpected problem or unintended behavior label Nov 8, 2016

sparrc modified the milestones: 1.3.0, Future Milestone Nov 15, 2016

m4ce mentioned this issue Jan 6, 2017

Telegraf in docker disk plugin issue #2234

Closed

m4ce mentioned this issue Jan 8, 2017

Disk input gives error which disappears on restart #1352

Closed

sparrc modified the milestones: Future Milestone, 1.3.0 Feb 9, 2017

sparrc added the help wanted Request for community participation, code, contribution label Feb 9, 2017

danielnelson mentioned this issue May 17, 2017

HOST_MOUNT_PREFIX not working as intended #2811

Closed

This was referenced Jun 13, 2017

LVM plugin #345

Closed

#2527 Allow collecting of host stats within docker containers by enab… #2924

Merged

danielnelson removed this from the Future Milestone milestone Jun 14, 2017

sbadia mentioned this issue Jul 10, 2017

system.disk plugin doesn't report infos about disk with no partitions #2992

Closed

danielnelson mentioned this issue Oct 13, 2017

System Input Plugin should have also variable for determinate path to /hostfs/var #3339

Closed

danielnelson mentioned this issue Nov 30, 2017

Docker host mount prefix #3529

Merged

3 tasks

danielnelson removed the help wanted Request for community participation, code, contribution label Nov 30, 2017

danielnelson added this to the 1.4.5 milestone Nov 30, 2017

danielnelson closed this as completed in #3529 Dec 1, 2017
