
Collector reporting incorrect values for node_filesystem_{size,avail}_bytes for root filesystem / root device volume #1906

Closed
robbie-demuth opened this issue Dec 7, 2020 · 4 comments


robbie-demuth commented Dec 7, 2020

Note: This seems similar to #1675, #1505, and #1339, but each of those issues is closed

Host operating system: output of uname -a

Linux <REDACTED> 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 1.0.1 (branch: HEAD, revision: 3715be6ae899f2a9b9dbfd9c39f3e09a7bd4559f)
  build user:       root@1f76dbbcfa55
  build date:       20200616-12:44:12
  go version:       go1.14.4

node_exporter command line flags

--path.procfs=/host/proc --path.sysfs=/host/sys --web.listen-address=$(HOST_IP):9102

Are you running node_exporter in Docker?

Yes. We're running node_exporter in Docker / Kubernetes using DaemonSets:

$ kubectl get ds prometheus-node-exporter -ojson | jq -r '.spec.template.spec'
{
  "containers": [
    {
      "args": [
        "--path.procfs=/host/proc",
        "--path.sysfs=/host/sys",
        "--web.listen-address=$(HOST_IP):9102"
      ],
      "env": [
        {
          "name": "HOST_IP",
          "value": "0.0.0.0"
        }
      ],
      "image": "quay.io/prometheus/node-exporter:v1.0.1",
      "name": "node-exporter",
      "ports": [
        {
          "containerPort": 9102,
          "hostPort": 9102,
          "name": "metrics",
          "protocol": "TCP"
        }
      ],
      "volumeMounts": [
        {
          "mountPath": "/host/proc",
          "name": "proc",
          "readOnly": true
        },
        {
          "mountPath": "/host/sys",
          "name": "sys",
          "readOnly": true
        }
      ]
    }
  ],
  "hostNetwork": true,
  "hostPID": true,
  "securityContext": {
    "fsGroup": 65534,
    "runAsGroup": 65534,
    "runAsNonRoot": true,
    "runAsUser": 65534
  },
  "volumes": [
    {
      "hostPath": {
        "path": "/proc",
        "type": ""
      },
      "name": "proc"
    },
    {
      "hostPath": {
        "path": "/sys",
        "type": ""
      },
      "name": "sys"
    }
  ]
}

What did you do that produced an error?

N/A

What did you expect to see?

We run node_exporter in Docker/Kubernetes on AWS EKS. Each of our worker nodes has a 20 GiB root device volume (/dev/sda1). On these nodes, /dev/sda1 is a symlink to the NVMe device nvme0n1, whose first partition (/dev/nvme0n1p1) holds the root filesystem:

$ ls -l /dev/sda1
lrwxrwxrwx 1 root root 7 Dec  4 01:36 /dev/sda1 -> nvme0n1

As expected, df confirms that the root filesystem is indeed 20 GiB:

$ df -hT /
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/nvme0n1p1 xfs    20G  9.6G   11G  48% /

Given this info, I'd expect node_filesystem_size_bytes and node_filesystem_avail_bytes for both {device="/dev/nvme0n1p1",fstype="xfs",mountpoint="/"} and {device="rootfs",fstype="rootfs",mountpoint="/"} to be 20 GiB and 11 GiB, respectively.

What did you see instead?

The values of the gauges are instead 5.3658783744e+10 and 3.6411342848e+10:

# HELP node_filesystem_avail_bytes Filesystem space available to non-root users in bytes.
# TYPE node_filesystem_avail_bytes gauge
node_filesystem_avail_bytes{device="/dev/nvme0n1p1",fstype="xfs",mountpoint="/"} 3.6411342848e+10
node_filesystem_avail_bytes{device="rootfs",fstype="rootfs",mountpoint="/"} 3.6411342848e+10
# HELP node_filesystem_device_error Whether an error occurred while getting statistics for the given device.
# TYPE node_filesystem_device_error gauge
node_filesystem_device_error{device="/dev/nvme0n1p1",fstype="xfs",mountpoint="/"} 0
node_filesystem_device_error{device="rootfs",fstype="rootfs",mountpoint="/"} 0
# HELP node_filesystem_size_bytes Filesystem size in bytes.
# TYPE node_filesystem_size_bytes gauge
node_filesystem_size_bytes{device="/dev/nvme0n1p1",fstype="xfs",mountpoint="/"} 5.3658783744e+10
node_filesystem_size_bytes{device="rootfs",fstype="rootfs",mountpoint="/"} 5.3658783744e+10

Miscellaneous

I should also note that our worker nodes have several other block device mappings - one of which is a 50 GiB volume. We do a bit of funky setup on the underlying nodes, but I believe the values being reported match those of the /dev/mapper/crypt1 filesystem mounted on /vol/crypt1:

$ df -hT | grep -vE '/var/lib/(docker|kubelet)'
Filesystem           Type            Size  Used Avail Use% Mounted on
devtmpfs             devtmpfs        7.6G     0  7.6G   0% /dev
tmpfs                tmpfs           7.7G     0  7.7G   0% /dev/shm
tmpfs                tmpfs           7.7G  794M  6.9G  11% /run
tmpfs                tmpfs           7.7G     0  7.7G   0% /sys/fs/cgroup
/dev/nvme0n1p1       xfs              20G  9.6G   11G  48% /
/dev/mapper/tmp      xfs              10G  6.1G  3.9G  61% /tmp
/dev/mapper/crypt1   xfs              50G   17G   34G  33% /vol/crypt1
/dev/mapper/cryptfs  xfs              50G  5.0G   45G  10% /usr/local/appian
10.34.85.251:/appian fuse.glusterfs   50G  5.5G   45G  11% /usr/local/appian/data/mirrored-data
tmpfs                tmpfs           1.6G     0  1.6G   0% /run/user/0
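Doing the conversion on the reported gauges: 5.3658783744e+10 bytes ≈ 49.97 GiB and 3.6411342848e+10 bytes ≈ 33.9 GiB, which line up with the 50G size and 34G available that df shows for /dev/mapper/crypt1 rather than with the 20G root filesystem.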

I dug through node_exporter's source code a bit and found that it uses golang.org/x/sys/unix's Statfs function (I think?) to collect the values for both metrics. I haven't yet dug through the function's source code, but could that possibly be the culprit?
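To make that concrete, here's a minimal sketch, based on my reading of the collector rather than a copy of its code, of how statfs(2) values would translate into the two gauges; the field-to-metric mapping here is an assumption on my part:

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	// Sketch only: approximates how statfs(2) fields could map onto the
	// metrics; node_exporter's real collector is more involved.
	var buf unix.Statfs_t
	// statfs is evaluated against whatever "/" resolves to in the calling
	// process's mount namespace.
	if err := unix.Statfs("/", &buf); err != nil {
		panic(err)
	}
	size := float64(buf.Blocks) * float64(buf.Bsize)  // ~ node_filesystem_size_bytes
	avail := float64(buf.Bavail) * float64(buf.Bsize) // ~ node_filesystem_avail_bytes
	fmt.Printf("size=%.0f bytes, avail=%.0f bytes\n", size, avail)
}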

Also, I see that there's an XFS-specific collector that reports XFS runtime stats. I'm not quite sure what each of those metrics represents, but could they be used to determine the size of, and available space on, the /dev/nvme0n1p1 XFS filesystem?

robbie-demuth (Author)

I enabled debug logging and pulled the logs

node_exporter.log

discordianfish (Member)

Try running it with the flags documented here: https://github.com/prometheus/node_exporter#using-docker
You're probably seeing the container's filesystem being reported as the root filesystem.
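In a DaemonSet, that roughly translates to mounting the host's root filesystem read-only into the container and pointing --path.rootfs at it. A minimal sketch of the extra pieces, with /host/root and the volume name root as placeholders rather than anything prescribed:

{
  "args": [
    "--path.procfs=/host/proc",
    "--path.sysfs=/host/sys",
    "--path.rootfs=/host/root",
    "--web.listen-address=$(HOST_IP):9102"
  ],
  "volumeMounts": [
    {
      "mountPath": "/host/root",
      "mountPropagation": "HostToContainer",
      "name": "root",
      "readOnly": true
    }
  ],
  "volumes": [
    {
      "hostPath": {
        "path": "/"
      },
      "name": "root"
    }
  ]
}

The volumeMounts entry belongs on the node-exporter container and the volumes entry at the pod level, alongside the existing proc and sys entries.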

robbie-demuth (Author)

That seems to have done the trick! I can't believe I missed that. I thought I had accounted for mounting the host filesystem by mounting /proc and /sys to /host/proc and /host/sys, respectively, but apparently not. Thanks for the fast feedback!

robbie-demuth (Author)

FWIW, it also looks like there has been an update to the Helm chart to address this:

prometheus-community/helm-charts#80
