
Collector reporting incorrect values for node_filesystem_{size,avail}_bytes for root filesystem / root device volume #1906

Closed
robbie-demuth opened this issue Dec 7, 2020 · 4 comments


robbie-demuth commented Dec 7, 2020

Note: This seems similar to #1675, #1505, and #1339, but each of those issues is closed

Host operating system: output of uname -a

Linux <REDACTED> 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 1.0.1 (branch: HEAD, revision: 3715be6ae899f2a9b9dbfd9c39f3e09a7bd4559f)
  build user:       root@1f76dbbcfa55
  build date:       20200616-12:44:12
  go version:       go1.14.4

node_exporter command line flags

--path.procfs=/host/proc --path.sysfs=/host/sys --web.listen-address=$(HOST_IP):9102

Are you running node_exporter in Docker?

Yes. We're running node_exporter in Docker / Kubernetes using DaemonSets:

$ kubectl get ds prometheus-node-exporter -ojson | jq -r '.spec.template.spec'
{
  "containers": [
    {
      "args": [
        "--path.procfs=/host/proc",
        "--path.sysfs=/host/sys",
        "--web.listen-address=$(HOST_IP):9102"
      ],
      "env": [
        {
          "name": "HOST_IP",
          "value": "0.0.0.0"
        }
      ],
      "image": "quay.io/prometheus/node-exporter:v1.0.1",
      "name": "node-exporter",
      "ports": [
        {
          "containerPort": 9102,
          "hostPort": 9102,
          "name": "metrics",
          "protocol": "TCP"
        }
      ],
      "volumeMounts": [
        {
          "mountPath": "/host/proc",
          "name": "proc",
          "readOnly": true
        },
        {
          "mountPath": "/host/sys",
          "name": "sys",
          "readOnly": true
        }
      ]
    }
  ],
  "hostNetwork": true,
  "hostPID": true,
  "securityContext": {
    "fsGroup": 65534,
    "runAsGroup": 65534,
    "runAsNonRoot": true,
    "runAsUser": 65534
  },
  "volumes": [
    {
      "hostPath": {
        "path": "/proc",
        "type": ""
      },
      "name": "proc"
    },
    {
      "hostPath": {
        "path": "/sys",
        "type": ""
      },
      "name": "sys"
    }
  ]
}

What did you do that produced an error?

N/A

What did you expect to see?

We run node_exporter in Docker/Kubernetes on AWS EKS. Each of our worker nodes has a 20 GiB root device volume (/dev/sda1). On these nodes, /dev/sda1 is a symlink to the NVMe device nvme0n1, whose first partition (/dev/nvme0n1p1) holds the root filesystem:

$ ls -l /dev/sda1
lrwxrwxrwx 1 root root 7 Dec  4 01:36 /dev/sda1 -> nvme0n1

As expected, df confirms that the root filesystem is indeed 20 GiB:

$ df -hT /
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/nvme0n1p1 xfs    20G  9.6G   11G  48% /

Given this info, I'd expect node_filesystem_size_bytes and node_filesystem_avail_bytes for both {device="/dev/nvme0n1p1",fstype="xfs",mountpoint="/"} and {device="rootfs",fstype="rootfs",mountpoint="/"} to be 20 GiB and 11 GiB, respectively.

What did you see instead?

The values of the gauges are instead 5.3658783744e+10 and 3.6411342848e+10:

# HELP node_filesystem_avail_bytes Filesystem space available to non-root users in bytes.
# TYPE node_filesystem_avail_bytes gauge
node_filesystem_avail_bytes{device="/dev/nvme0n1p1",fstype="xfs",mountpoint="/"} 3.6411342848e+10
node_filesystem_avail_bytes{device="rootfs",fstype="rootfs",mountpoint="/"} 3.6411342848e+10
# HELP node_filesystem_device_error Whether an error occurred while getting statistics for the given device.
# TYPE node_filesystem_device_error gauge
node_filesystem_device_error{device="/dev/nvme0n1p1",fstype="xfs",mountpoint="/"} 0
node_filesystem_device_error{device="rootfs",fstype="rootfs",mountpoint="/"} 0
# HELP node_filesystem_size_bytes Filesystem size in bytes.
# TYPE node_filesystem_size_bytes gauge
node_filesystem_size_bytes{device="/dev/nvme0n1p1",fstype="xfs",mountpoint="/"} 5.3658783744e+10
node_filesystem_size_bytes{device="rootfs",fstype="rootfs",mountpoint="/"} 5.3658783744e+10

Miscellaneous

I should also note that our worker nodes have several other block device mappings - one of which is a 50 GiB volume. We do a bit of funky setup on the underlying nodes, but I believe the values being reported match those of the /dev/mapper/crypt1 filesystem mounted on /vol/crypt1:

$ df -hT | grep -vE '/var/lib/(docker|kubelet)'
Filesystem           Type            Size  Used Avail Use% Mounted on
devtmpfs             devtmpfs        7.6G     0  7.6G   0% /dev
tmpfs                tmpfs           7.7G     0  7.7G   0% /dev/shm
tmpfs                tmpfs           7.7G  794M  6.9G  11% /run
tmpfs                tmpfs           7.7G     0  7.7G   0% /sys/fs/cgroup
/dev/nvme0n1p1       xfs              20G  9.6G   11G  48% /
/dev/mapper/tmp      xfs              10G  6.1G  3.9G  61% /tmp
/dev/mapper/crypt1   xfs              50G   17G   34G  33% /vol/crypt1
/dev/mapper/cryptfs  xfs              50G  5.0G   45G  10% /usr/local/appian
10.34.85.251:/appian fuse.glusterfs   50G  5.5G   45G  11% /usr/local/appian/data/mirrored-data
tmpfs                tmpfs           1.6G     0  1.6G   0% /run/user/0
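Doing the conversion on the reported gauges: 5.3658783744e+10 bytes ≈ 49.97 GiB and 3.6411342848e+10 bytes ≈ 33.9 GiB, which line up with the 50G size and 34G available that df shows for /dev/mapper/crypt1 rather than with the 20G root filesystem.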

I dug through node_exporter's source code a bit and found that it uses golang.org/x/sys/unix's Statfs function (I think?) to collect the values for both metrics. I haven't yet dug through the function's source code, but could that possibly be the culprit?
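To make that concrete, here's a minimal sketch, based on my reading of the collector rather than a copy of its code, of how statfs(2) values would translate into the two gauges; the field-to-metric mapping here is an assumption on my part:

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	// Sketch only: approximates how statfs(2) fields could map onto the
	// metrics; node_exporter's real collector is more involved.
	var buf unix.Statfs_t
	// statfs is evaluated against whatever "/" resolves to in the calling
	// process's mount namespace.
	if err := unix.Statfs("/", &buf); err != nil {
		panic(err)
	}
	size := float64(buf.Blocks) * float64(buf.Bsize)  // ~ node_filesystem_size_bytes
	avail := float64(buf.Bavail) * float64(buf.Bsize) // ~ node_filesystem_avail_bytes
	fmt.Printf("size=%.0f bytes, avail=%.0f bytes\n", size, avail)
}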

Also, I see that there's an XFS-specific collector that reports XFS runtime stats. I'm not quite sure what each of those metrics represents, but could they be used to determine the size of, and available space on, the /dev/nvme0n1p1 XFS filesystem?

robbie-demuth (Author)

I enabled debug logging and pulled the logs

node_exporter.log

discordianfish (Member)

Try running it with the flags documented here: https://github.com/prometheus/node_exporter#using-docker
You're probably seeing the container's filesystem being reported as the root filesystem.
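In a DaemonSet, that roughly translates to mounting the host's root filesystem read-only into the container and pointing --path.rootfs at it. A minimal sketch of the extra pieces, with /host/root and the volume name root as placeholders rather than anything prescribed:

{
  "args": [
    "--path.procfs=/host/proc",
    "--path.sysfs=/host/sys",
    "--path.rootfs=/host/root",
    "--web.listen-address=$(HOST_IP):9102"
  ],
  "volumeMounts": [
    {
      "mountPath": "/host/root",
      "mountPropagation": "HostToContainer",
      "name": "root",
      "readOnly": true
    }
  ],
  "volumes": [
    {
      "hostPath": {
        "path": "/"
      },
      "name": "root"
    }
  ]
}

The volumeMounts entry belongs on the node-exporter container and the volumes entry at the pod level, alongside the existing proc and sys entries.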

robbie-demuth (Author)

That seems to have done the trick! I can't believe I missed that. I thought I had accounted for mounting the host filesystem by mounting /proc and /sys to /host/proc and /host/sys, respectively, but apparently not. Thanks for the fast feedback!

robbie-demuth (Author)

FWIW, it also looks like there has been an update to the Helm chart to address this:

prometheus-community/helm-charts#80
