inputs.disk missing a disk in docker #10422

JoshKeegan · 2022-01-11T10:00:48Z

Relevant telegraf.conf

# Read metrics about disk usage by mount point
[[inputs.disk]]
  ## By default stats will be gathered for all mount points.
  ## Set mount_points will restrict the stats to only the specified mount points.
  # mount_points = ["/"]

  ## Ignore mount points by filesystem type.
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
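For readers unfamiliar with `ignore_fs`, the filtering it describes can be sketched roughly as below. This is only an illustration in Python, not Telegraf's implementation; the function name `keep_partitions` and the dictionary layout (borrowed from the debug output later in this thread) are assumptions.

```python
# Filesystem types copied from the ignore_fs setting above.
IGNORE_FS = {"tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"}


def keep_partitions(partitions, ignore_fs=IGNORE_FS):
    """Return only the partitions whose filesystem type is not ignored.

    `partitions` is a list of dicts with at least an "fstype" key
    (hypothetical layout, chosen for illustration).
    """
    return [p for p in partitions if p["fstype"] not in ignore_fs]


if __name__ == "__main__":
    sample = [
        {"device": "/dev/sdb1", "mountpoint": "/home/josh/data", "fstype": "ext4"},
        {"device": "overlay", "mountpoint": "/", "fstype": "overlay"},
    ]
    for part in keep_partitions(sample):
        print(part["mountpoint"], part["fstype"])
```

Note that an ext4 data disk is not excluded by this list, so the filter alone does not explain a missing disk.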

Logs from Telegraf

2022-01-11T09:35:46Z I! Starting Telegraf 1.21.2
2022-01-11T09:35:46Z I! Using config file: /etc/telegraf/telegraf.conf
2022-01-11T09:35:46Z I! Loaded inputs: cpu disk diskio docker http kernel mem ping processes sensors smart swap system teamspeak
2022-01-11T09:35:46Z I! Loaded aggregators:
2022-01-11T09:35:46Z I! Loaded processors:
2022-01-11T09:35:46Z I! Loaded outputs: influxdb
2022-01-11T09:35:46Z I! Tags enabled:
2022-01-11T09:35:46Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"", Flush Interval:10s

System info

Telegraf v1.21.2 (latest)

Docker

docker-compose.yaml extract:

telegraf:
    build: ./images/telegraf
    privileged: true
    volumes:
      - ./telegraf/telegraf.conf:/etc/telegraf/telegraf.conf:ro
      - /proc:/host/proc:ro
      # Mount the HDD so it can access it for stats
      - /home/josh/data:/disks/hdd:ro
      # utmp from host required for system n_users count
      - /var/run/utmp:/var/run/utmp:ro
      - /dev:/dev:ro
      - /var/run/docker.sock:/var/run/docker.sock
    user: telegraf:${DOCKER_GID}
    environment:
      - HOST_PROC=/host/proc
    networks:
      - influxdb
    restart: unless-stopped

Steps to reproduce

Have a system with multiple drives
Run telegraf in docker with inputs.disk set up, mounting in any additional disks (in my case that is /home/josh/data => /disks/hdd)
...

Expected behavior

The disk is visible in the container (you can docker exec into it and use df to check this). The disk should also be picked up by the telegraf disk input and its stats recorded.

Actual behavior

The disk is visible in the container (you can docker exec into it and use df to check this). The disk is not picked up by the telegraf disk input.
E.g. here is a partition usage graph using the data from telegraf; the disk disappears after telegraf is upgraded from 1.20.x => 1.21.x:

Additional info

Same config was working in telegraf 1.20.x but the disk goes missing after upgrading to 1.21.x. I originally thought it could be related to #10297 but I see that is fixed in the last release and I still have the issue. I've downgraded to v1.20.4 for now as a workaround.


srebhan · 2022-01-11T10:15:15Z

@JoshKeegan are you really running 1.21.2? I thought this was fixed by PR #10318... Can you please start telegraf with --debug --once and send me the output of the disk plugin?

JoshKeegan · 2022-01-11T14:17:40Z

Hi @srebhan,
Yes it's definitely v1.21.2. If I exec into the container, and run telegraf --version I get this: Telegraf 1.21.2 (git: HEAD 30d981d3).

The inputs.disk output of running with --debug --once is:

2022-01-11T14:10:28Z D! [inputs.disk] [SystemPS] partition 19: {"device":"/dev/sdb1","mountpoint":"/home/josh/data","fstype":"ext4","opts":["rw","relatime"]}
2022-01-11T14:10:28Z D! [inputs.disk] [SystemPS] -> using mountpoint "/home/josh/data"...
2022-01-11T14:10:28Z D! [inputs.disk] [SystemPS] => dropped by disk usage ("/home/josh/data"): no such file or directory

Now that I have read the above logs, it looks like it is reading the mount points from the host, not from the container. So as a workaround I have changed the drive to be mounted at the same path within the container (/home/josh/data) and telegraf now picks it up. Hopefully this can still be fixed in a future release though as it may not always be possible to have the mount points match.

Edit: I trimmed the log down to just the relevant part as it's huge, let me know if you need the full output.
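The diagnosis above can be reproduced outside Telegraf with a small Python sketch: pull the mount point out of the `[SystemPS] partition` debug line, then ask the operating system for its usage. The helper names `parse_partition` and `usage_or_error` are hypothetical, and the sketch only imitates the observed behaviour; it is not how the plugin is written.

```python
import json
import re
import shutil

# Matches the JSON payload of a "[SystemPS] partition N: {...}" debug line.
PARTITION_RE = re.compile(r"partition \d+: (\{.*\})\s*$")


def parse_partition(line):
    """Return the partition dict from a debug line, or None if there is none."""
    match = PARTITION_RE.search(line)
    return json.loads(match.group(1)) if match else None


def usage_or_error(mountpoint):
    """Query disk usage for a path as seen by the current process.

    Returns (usage, None) on success or (None, error_text) on failure.
    """
    try:
        return shutil.disk_usage(mountpoint), None
    except OSError as exc:
        return None, exc.strerror


if __name__ == "__main__":
    line = (
        '2022-01-11T14:10:28Z D! [inputs.disk] [SystemPS] partition 19: '
        '{"device":"/dev/sdb1","mountpoint":"/home/josh/data",'
        '"fstype":"ext4","opts":["rw","relatime"]}'
    )
    part = parse_partition(line)
    usage, err = usage_or_error(part["mountpoint"])
    print(part["mountpoint"], "dropped:" if err else "ok:", err or usage)
```

Run inside the container, the host path `/home/josh/data` fails the usage query unless the disk is mounted at that same path, which matches the "dropped by disk usage" line in the log.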

srebhan · 2022-01-24T08:35:50Z

@JoshKeegan sorry for my late response, but I've been busy quite a bit.

I'm a bit puzzled as to why this happens, but I guess it's due to HOST_PROC being set. This way, the telegraf container will try to get the disks/partitions from the host. Can you try to unset HOST_PROC and see if it fixes the issue for you?

JoshKeegan · 2022-01-24T11:25:58Z

No worries, thanks for looking into it!

I've tried it and unsetting HOST_PROC does fix this issue. However it needs to be set for the processes input plugin to work, otherwise that will only see process information from within the container.

srebhan · 2022-01-24T21:14:42Z

That's... unfortunate. :-) I'm not sure how to fix this, as we cannot set and unset that environment variable at the same time in one process. Nor can we pass these variables as parameters to gopsutil (the underlying lib we use). To be honest, I don't see a way to monitor the disks inside the container but the processes outside of the container without running two instances. @powersj or @sspaink any other idea?
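The constraint described here can be illustrated with a minimal Python sketch of a single process-wide lookup. The helper `host_proc` is hypothetical and only models the behaviour discussed in this thread (one environment variable chooses the proc root for every consumer); it is not gopsutil's code, and the sub-paths `1/mountinfo` and `1/stat` are assumed examples.

```python
import os
import posixpath


def host_proc(*parts, env=None):
    """Join parts onto $HOST_PROC, falling back to /proc when it is unset or empty."""
    env = os.environ if env is None else env
    base = env.get("HOST_PROC") or "/proc"
    return posixpath.join(base, *parts)


if __name__ == "__main__":
    env = {"HOST_PROC": "/host/proc"}
    # Both lookups share the same root, so they cannot point at
    # different proc trees within one process.
    print(host_proc("1", "mountinfo", env=env))
    print(host_proc("1", "stat", env=env))
```

Because every lookup goes through the same variable, pointing it at the host for process data also points the mount table lookup at the host.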

srebhan · 2022-05-17T12:42:17Z

Hey @JoshKeegan,

can you try the artifacts built in #11107 with the following docker-compose.yaml

telegraf:
    build: ./images/telegraf
    privileged: true
    volumes:
      - ./telegraf/telegraf.conf:/etc/telegraf/telegraf.conf:ro
      - /proc:/host/proc:ro
      # Mount the HDD so it can access it for stats
      - /home/josh/data:/disks/hdd:ro
      # utmp from host required for system n_users count
      - /var/run/utmp:/var/run/utmp:ro
      - /dev:/dev:ro
      - /var/run/docker.sock:/var/run/docker.sock
    user: telegraf:${DOCKER_GID}
    environment:
      - HOST_PROC=/host/proc
      - HOST_PROC_MOUNTINFO=/proc/1
    networks:
      - influxdb
    restart: unless-stopped

Please note the HOST_PROC_MOUNTINFO setting, which overrides HOST_PROC for getting the partition information, so that it is read from within the container.
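As a rough mental model of that precedence, the Python sketch below picks the directory used for mount information. The function name `mountinfo_root` and the fallback to `<HOST_PROC>/1` are assumptions made for illustration; the exact file names and fallbacks inside gopsutil are not shown in this thread.

```python
import os
import posixpath


def mountinfo_root(env=None):
    """Pick the directory to read mount information from.

    Assumed precedence: HOST_PROC_MOUNTINFO wins when set; otherwise
    fall back to PID 1 under HOST_PROC (or /proc when that is unset).
    """
    env = os.environ if env is None else env
    override = env.get("HOST_PROC_MOUNTINFO")
    if override:
        return override
    return posixpath.join(env.get("HOST_PROC") or "/proc", "1")


if __name__ == "__main__":
    compose_env = {"HOST_PROC": "/host/proc", "HOST_PROC_MOUNTINFO": "/proc/1"}
    print(mountinfo_root(compose_env))                  # container's own mounts
    print(mountinfo_root({"HOST_PROC": "/host/proc"}))  # host's mounts
```

With both variables set as in the compose file above, process data still comes from the host tree while the partition list comes from the container's own `/proc/1`.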

JoshKeegan · 2022-05-25T16:16:41Z

Thanks @srebhan that's fixed it 👍

JoshKeegan added the "bug" (unexpected problem or unintended behavior) label Jan 11, 2022

srebhan self-assigned this Jan 11, 2022

srebhan added the area/system label Jan 11, 2022

srebhan mentioned this issue May 17, 2022 in "fix: Bump gopsutil from v3.22.3 to v3.22.4" #11107 (merged)

powersj closed this as completed in #11107 May 17, 2022
