node_os_info does not update build_id #2981

Closed
jpds opened this issue Apr 2, 2024 · 9 comments · Fixed by #2987

@jpds
Contributor

jpds commented Apr 2, 2024

The collector responsible for node_os_info appears to run only once. I see the following metrics:

{__name__="node_os_info", build_id="23.11.20240320.f091af0", id="nixos", instance="...:9100", job="node", name="NixOS", pretty_name="NixOS 23.11 (Tapir)", version="23.11 (Tapir)", version_codename="tapir", version_id="23.11"}
{__name__="node_os_info", build_id="23.11.20240328.219951b", id="nixos", instance="...:9100", job="node", name="NixOS", pretty_name="NixOS 23.11 (Tapir)", version="23.11 (Tapir)", version_codename="tapir", version_id="23.11"}

Note that one of the build_id values is more than a week old, although I've done several system builds on this box in that time. The newer build version only appeared after I restarted node-exporter.

Host operating system: output of uname -a

Linux mango 6.6.22 #1-NixOS SMP PREEMPT_DYNAMIC Fri Mar 15 18:25:07 UTC 2024 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 1.7.0 (branch: unknown, revision: v1.7.0)
  build user:       nix@nixpkgs
  build date:       unknown
  go version:       go1.21.8
  platform:         linux/amd64
  tags:             unknown

node_exporter command line flags

/nix/store/vwpkipjynqgwpp2pgyl5mxxffysn1c60-node_exporter-1.7.0/bin/node_exporter --web.listen-address 0.0.0.0:9100

Are you running node_exporter in Docker?

No.

@SuperQ
Member

SuperQ commented Apr 2, 2024

The information is updated at every scrape. If the data doesn't change then the underlying data has not changed. The os collector reads from os-release files on Linux.

This is likely a NixOS issue.

@jpds
Contributor Author

jpds commented Apr 2, 2024

That's what I assumed; however, node_exporter doesn't appear to follow the new symlink to the updated os-release:

Pre-update:

jdavies@zima ~> ls -l /etc/os-release
lrwxrwxrwx 1 root root 22 Mar 20 15:54 /etc/os-release -> /etc/static/os-release
jdavies@zima ~> ls -l /etc/static/os-release
lrwxrwxrwx 2 root root 58 Jan  1  1970 /etc/static/os-release -> /nix/store/nz7p4wg27400l981d4czq2f6j82yn5d7-etc-os-release
jdavies@zima ~> cat /etc/os-release
BUG_REPORT_URL="https://github.com/NixOS/nixpkgs/issues"
BUILD_ID="23.11.20240319.fa9f817"
DOCUMENTATION_URL="https://nixos.org/learn.html"
HOME_URL="https://nixos.org/"
ID=nixos
LOGO="nix-snowflake"
NAME=NixOS
PRETTY_NAME="NixOS 23.11 (Tapir)"
SUPPORT_END="2024-06-30"
SUPPORT_URL="https://nixos.org/community.html"
VERSION="23.11 (Tapir)"
VERSION_CODENAME=tapir
VERSION_ID="23.11"
jdavies@zima /e/nixos > sudo nix flake update
warning: updating lock file '/etc/nixos/flake.lock':
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/fa9f817df522ac294016af3d40ccff82f5fd3a63' (2024-03-19)
  → 'github:NixOS/nixpkgs/219951b495fc2eac67b1456824cc1ec1fd2ee659' (2024-03-28)

jdavies@zima ~>  curl http://localhost:9100/metrics | grep os_info
# HELP node_os_info A metric with a constant '1' value labeled by build_id, id, id_like, image_id, image_version, name, pretty_name, variant, variant_id, version, version_codename, version_id.
# TYPE node_os_info gauge
node_os_info{build_id="23.11.20240319.fa9f817",id="nixos",id_like="",image_id="",image_version="",name="NixOS",pretty_name="NixOS 23.11 (Tapir)",variant="",variant_id="",version="23.11 (Tapir)",version_codename="tapir",version_id="23.11"} 1

Update system:

jdavies@zima ~> sudo nixos-rebuild switch
-> generates nixos-system-zima-23.11.20240328.219951b
jdavies@zima ~> cat /etc/os-release
BUG_REPORT_URL="https://github.com/NixOS/nixpkgs/issues"
BUILD_ID="23.11.20240328.219951b"
DOCUMENTATION_URL="https://nixos.org/learn.html"
HOME_URL="https://nixos.org/"
ID=nixos
LOGO="nix-snowflake"
NAME=NixOS
PRETTY_NAME="NixOS 23.11 (Tapir)"
SUPPORT_END="2024-06-30"
SUPPORT_URL="https://nixos.org/community.html"
VERSION="23.11 (Tapir)"
VERSION_CODENAME=tapir
VERSION_ID="23.11"
jdavies@zima ~> ls -l /etc/static/os-release
lrwxrwxrwx 2 root root 58 Jan  1  1970 /etc/static/os-release -> /nix/store/mc3razq0wglc6hzik2dw8vpsvmskpxc6-etc-os-release
jdavies@zima ~>  curl http://localhost:9100/metrics | grep os_info
# HELP node_os_info A metric with a constant '1' value labeled by build_id, id, id_like, image_id, image_version, name, pretty_name, variant, variant_id, version, version_codename, version_id.
# TYPE node_os_info gauge
node_os_info{build_id="23.11.20240319.fa9f817",id="nixos",id_like="",image_id="",image_version="",name="NixOS",pretty_name="NixOS 23.11 (Tapir)",variant="",variant_id="",version="23.11 (Tapir)",version_codename="tapir",version_id="23.11"} 1
jdavies@zima ~> sudo systemctl restart prometheus-node-exporter.service
jdavies@zima ~> curl http://localhost:9100/metrics | grep os_info
# HELP node_os_info A metric with a constant '1' value labeled by build_id, id, id_like, image_id, image_version, name, pretty_name, variant, variant_id, version, version_codename, version_id.
# TYPE node_os_info gauge
node_os_info{build_id="23.11.20240328.219951b",id="nixos",id_like="",image_id="",image_version="",name="NixOS",pretty_name="NixOS 23.11 (Tapir)",variant="",variant_id="",version="23.11 (Tapir)",version_codename="tapir",version_id="23.11"} 1

@SuperQ
Member

SuperQ commented Apr 2, 2024

The code clearly shows that Update() is called and does an os.Open() at each scrape. Without knowing what kind of filesystem isolation is being done by NixOS or the systemd unit, it's impossible to say why you're having this issue.

@jpds
Contributor Author

jpds commented Apr 2, 2024

~> cat /etc/systemd/system/prometheus-node-exporter.service
[Unit]
After=network.target

[Service]
Environment="LOCALE_ARCHIVE=/nix/store/d2nnadv7fdvains3rziq8lkpzw7anh9x-glibc-locales-2.38-44/lib/locale/locale-archive"
Environment="PATH=/nix/store/rk067yylvhyb7a360n8k1ps4lb4xsbl3-coreutils-9.3/bin:/nix/store/q7x6rjg6ya1gsg068fxj1sgf1k2n144n-findutils-4.9.0/bin:/nix/store/r1lp9kxlrc6h7vrba90gm6i94s31xvvx-gnugrep-3.11/bin:/nix/store/29w8hg0fis0pl3j4d3v0p02aicyw10lv-gnused-4.9/bin:/nix/store/dzp7d4k1d94s1x49p9171mvcsfyxr7bj-systemd-254.6/bin:/nix/store/rk067yylvhyb7a360n8k1ps4lb4xsbl3-coreutils-9.3/sbin:/nix/store/q7x6rjg6ya1gsg068fxj1sgf1k2n144n-findutils-4.9.0/sbin:/nix/store/r1lp9kxlrc6h7vrba90gm6i94s31xvvx-gnugrep-3.11/sbin:/nix/store/29w8hg0fis0pl3j4d3v0p02aicyw10lv-gnused-4.9/sbin:/nix/store/dzp7d4k1d94s1x49p9171mvcsfyxr7bj-systemd-254.6/sbin"
Environment="TZDIR=/nix/store/i6nk8llh46f2xjzc5h8j83kwwr1w3kx0-tzdata-2024a/share/zoneinfo"
CapabilityBoundingSet=
DeviceAllow=
DynamicUser=false
ExecStart=/nix/store/vwpkipjynqgwpp2pgyl5mxxffysn1c60-node_exporter-1.7.0/bin/node_exporter \
  --web.listen-address 0.0.0.0:9100

Group=node-exporter
LockPersonality=true
MemoryDenyWriteExecute=true
NoNewPrivileges=true
PrivateDevices=true
PrivateTmp=true
ProtectClock=false
ProtectControlGroups=true
ProtectHome=true
ProtectHostname=true
ProtectKernelLogs=true
ProtectKernelModules=true
ProtectKernelTunables=true
ProtectSystem=strict
RemoveIPC=true
Restart=always
RestrictAddressFamilies=AF_NETLINK
RestrictAddressFamilies=AF_INET
RestrictAddressFamilies=AF_INET6
RestrictNamespaces=true
RestrictRealtime=true
RestrictSUIDSGID=true
RuntimeDirectory=prometheus-node-exporter
SystemCallArchitectures=native
UMask=0077
User=node-exporter
WorkingDirectory=/tmp

There's also nothing in my logs from node-exporter.

@discordianfish
Member

I don't think this can be anything on the node-exporter side, so closing for now.

jpds added a commit to jpds/node_exporter that referenced this issue Apr 15, 2024
jpds added a commit to jpds/node_exporter that referenced this issue Apr 15, 2024
@SuperQ SuperQ reopened this Apr 15, 2024
@SuperQ
Member

SuperQ commented Apr 15, 2024

I see the issue now. I wasn't reading the code carefully enough. We cache the mtime of the file and only update the data if the file mtime has changed.

But, for some reason on NixOS the stat is not following the symlink. On Ubuntu, this doesn't seem to be a problem.

Although, maybe again this is a systemd masking issue as I don't see this problem running under a normal shell.
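
The mtime caching described in this comment can be sketched roughly as follows. This is a minimal illustration, not node_exporter's actual code: the `cachedFile` type, its fields, and its `Read` method are hypothetical names chosen for this example. Note that Go's `os.Stat` follows symlinks, so the mtime being compared is that of the final target.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// cachedFile is a hypothetical type for illustration only; it is not
// taken from node_exporter.
type cachedFile struct {
	path    string
	modTime time.Time
	data    []byte
	loaded  bool
}

// Read returns the cached contents unless the file's mtime differs from
// the one recorded at the last read. os.Stat follows symlinks, so the
// mtime compared here belongs to the symlink's final target.
func (c *cachedFile) Read() ([]byte, error) {
	info, err := os.Stat(c.path)
	if err != nil {
		return nil, err
	}
	if c.loaded && info.ModTime().Equal(c.modTime) {
		return c.data, nil // mtime unchanged: skip re-reading the file
	}
	data, err := os.ReadFile(c.path)
	if err != nil {
		return nil, err
	}
	c.data, c.modTime, c.loaded = data, info.ModTime(), true
	return data, nil
}

func main() {
	c := &cachedFile{path: "/etc/os-release"}
	for i := 0; i < 2; i++ {
		data, err := c.Read()
		if err != nil {
			fmt.Fprintln(os.Stderr, "read failed:", err)
			return
		}
		fmt.Printf("read %d: %d bytes\n", i+1, len(data))
	}
}
```

With a cache like this, the second call only re-reads the file when the mtime reported by `os.Stat` differs from the stored one.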

jpds added a commit to jpds/node_exporter that referenced this issue Apr 15, 2024
@jpds
Contributor Author

jpds commented Apr 15, 2024

> We cache the mtime of the file and only update the data if the file mtime is changed

This is where the NixOS weirdness comes in: all files in the Nix store have an mtime of the Unix epoch (as it doesn't support extended attributes).

node_exporter would look at the symlink to the Nix store, see that the mtime hadn't changed because they're all epoch, and then just carry on reporting the old value (rather than follow the new symlink to the updated store path).
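
A small reproduction of that effect, as a sketch under assumptions: the `demoSymlinkSwap` function and the file names are hypothetical, and a temporary directory stands in for the Nix store. It creates two files with the same fixed mtime, points a symlink at one and then the other, and compares what `os.Stat` reports through the symlink before and after the swap.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// demoSymlinkSwap is a hypothetical reproduction, not node_exporter code.
// It returns whether the mtime seen through the symlink changed, plus the
// contents read through the symlink before and after retargeting it.
func demoSymlinkSwap() (bool, string, string, error) {
	dir, err := os.MkdirTemp("", "os-release-demo")
	if err != nil {
		return false, "", "", err
	}
	defer os.RemoveAll(dir)

	epoch := time.Unix(0, 0) // stand-in for the fixed mtime of store files
	oldPath := filepath.Join(dir, "old-os-release")
	newPath := filepath.Join(dir, "new-os-release")
	link := filepath.Join(dir, "os-release")

	files := map[string]string{
		oldPath: "BUILD_ID=\"old\"\n",
		newPath: "BUILD_ID=\"new\"\n",
	}
	for path, content := range files {
		if err := os.WriteFile(path, []byte(content), 0o644); err != nil {
			return false, "", "", err
		}
		// Give both files the same mtime.
		if err := os.Chtimes(path, epoch, epoch); err != nil {
			return false, "", "", err
		}
	}

	statAndRead := func() (time.Time, string, error) {
		info, err := os.Stat(link) // follows the symlink to its target
		if err != nil {
			return time.Time{}, "", err
		}
		data, err := os.ReadFile(link)
		if err != nil {
			return time.Time{}, "", err
		}
		return info.ModTime(), string(data), nil
	}

	if err := os.Symlink(oldPath, link); err != nil {
		return false, "", "", err
	}
	mtime1, before, err := statAndRead()
	if err != nil {
		return false, "", "", err
	}

	// Retarget the symlink to the new file.
	if err := os.Remove(link); err != nil {
		return false, "", "", err
	}
	if err := os.Symlink(newPath, link); err != nil {
		return false, "", "", err
	}
	mtime2, after, err := statAndRead()
	if err != nil {
		return false, "", "", err
	}
	return !mtime1.Equal(mtime2), before, after, nil
}

func main() {
	changed, before, after, err := demoSymlinkSwap()
	if err != nil {
		fmt.Fprintln(os.Stderr, "demo failed:", err)
		return
	}
	fmt.Printf("mtime changed: %v\nbefore: %safter:  %s", changed, before, after)
}
```

Because both targets carry the same mtime, an mtime-only comparison cannot tell that the symlink now points at different content, even though a fresh read returns the new file.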

@SuperQ
Member

SuperQ commented Apr 15, 2024

Oh, that's a completely different problem. Maybe we need a --collector.os.cache flag so that --no-collector.os.cache can be used on NixOS.

Or we could just completely eliminate the whole mtime thing and read the file every time. I kinda feel like the whole mtime check is a bit of an over-optimization considering most of the time the whole file will be in page cache anyway.

@discordianfish
Member

@SuperQ yeah I think we should just read the file every time
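
For illustration, the "read the file every time" approach could look something like the sketch below. It is not taken from the actual fix: `parseOSRelease` and `readOSRelease` are hypothetical helpers, and the parser only handles simple KEY=VALUE lines with optional quotes.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// parseOSRelease parses simple KEY=VALUE lines in the os-release style.
// Hypothetical helper for illustration; it ignores shell escapes.
func parseOSRelease(content string) map[string]string {
	fields := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and comments
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue // not a KEY=VALUE line
		}
		fields[key] = strings.Trim(value, `"'`)
	}
	return fields
}

// readOSRelease reads and parses the file on every call, with no mtime
// check, so a retargeted symlink is resolved again each time.
func readOSRelease(path string) (map[string]string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	return parseOSRelease(string(data)), nil
}

func main() {
	fields, err := readOSRelease("/etc/os-release")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read failed:", err)
		return
	}
	fmt.Println("BUILD_ID:", fields["BUILD_ID"])
}
```

The trade-off is one small file read per call instead of a stat plus a conditional read.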
