node_cpu_core_throttles_total ignores second thread of hyperthreading systems #1472

Closed
pgier opened this issue Sep 4, 2019 · 4 comments

@pgier
Contributor

pgier commented Sep 4, 2019

This issue is basically the reverse of #659. On my laptop I noticed that two threads on the same core can have different values for node_cpu_core_throttles_total. This is low priority since it probably doesn't hurt anything and fixing it would add metrics that are probably not all that useful.

Host operating system: output of uname -a

Linux pgier-laptop 5.2.9-200.fc30.x86_64 #1 SMP Fri Aug 16 21:37:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 0.18.1 (branch: disable-default-collectors, revision: 93c12e0)
build user: pgier@pgier-laptop
build date: 20190903-15:50:14
go version: go1.12.4

node_exporter command line flags

none

Are you running node_exporter in Docker?

no

What did you do that produced an error?

ran node_exporter with default settings on a system with hyperthreading enabled

$ for i in /sys/devices/system/cpu/cpu{0..7}/thermal_throttle/core_throttle_count; do echo "$i : $(cat $i)"; done
/sys/devices/system/cpu/cpu0/thermal_throttle/core_throttle_count : 7343
/sys/devices/system/cpu/cpu1/thermal_throttle/core_throttle_count : 432
/sys/devices/system/cpu/cpu2/thermal_throttle/core_throttle_count : 15732
/sys/devices/system/cpu/cpu3/thermal_throttle/core_throttle_count : 874
/sys/devices/system/cpu/cpu4/thermal_throttle/core_throttle_count : 7334
/sys/devices/system/cpu/cpu5/thermal_throttle/core_throttle_count : 432
/sys/devices/system/cpu/cpu6/thermal_throttle/core_throttle_count : 15732
/sys/devices/system/cpu/cpu7/thermal_throttle/core_throttle_count : 874

You can see that cpu0 and cpu4 have slightly different values for core_throttle_count (7343 vs. 7334), while every other pair matches. I'm guessing this only occurs on the first core because it's related to low power mode.
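A quick way to check for this on another machine is to group the logical CPUs by physical core and compare their counters. The following is only a minimal sketch, not part of node_exporter: the function names are made up for illustration, and it assumes the kernel exposes topology/core_id and topology/physical_package_id next to thermal_throttle/core_throttle_count in each cpuN directory.

```python
from collections import defaultdict
from pathlib import Path


def read_throttle_counts(cpu_root="/sys/devices/system/cpu"):
    """Return {(package_id, core_id): {cpu_name: core_throttle_count}}.

    Hypothetical helper; cpu_root is a parameter so the function can be
    pointed at a copy of the sysfs tree.
    """
    by_core = defaultdict(dict)
    # "cpu[0-9]*" matches cpu0, cpu12, ... but not cpufreq or cpuidle.
    for cpu_dir in sorted(Path(cpu_root).glob("cpu[0-9]*")):
        count_file = cpu_dir / "thermal_throttle" / "core_throttle_count"
        if not count_file.exists():
            continue  # offline CPU or no thermal throttle counters
        package = int((cpu_dir / "topology" / "physical_package_id").read_text())
        core = int((cpu_dir / "topology" / "core_id").read_text())
        by_core[(package, core)][cpu_dir.name] = int(count_file.read_text())
    return dict(by_core)


def mismatched_cores(by_core):
    """Keep only the cores whose sibling threads report different counts."""
    return {
        core: cpus
        for core, cpus in by_core.items()
        if len(set(cpus.values())) > 1
    }
```

Calling mismatched_cores(read_throttle_counts()) on a system like the one above would be expected to return a single entry, the core shared by cpu0 and cpu4, provided those two are in fact siblings.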

What did you expect to see?

a value of node_cpu_core_throttles_total for each virtual processor

What did you see instead?

a single value per physical core
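To make the difference between the expected and the observed behaviour concrete, here is a small illustration using the values listed above. It is not the exporter's actual code: the "first CPU seen wins" rule and the cpu-to-core mapping (cpu N paired with cpu N+4) are assumptions made for the example.

```python
# (cpu, core, core_throttle_count) taken from the listing in this report.
# The core column is an assumption: cpu N and cpu N+4 are treated as the
# two hyperthreads of core N.
SAMPLES = [
    ("cpu0", 0, 7343),
    ("cpu1", 1, 432),
    ("cpu2", 2, 15732),
    ("cpu3", 3, 874),
    ("cpu4", 0, 7334),
    ("cpu5", 1, 432),
    ("cpu6", 2, 15732),
    ("cpu7", 3, 874),
]


def per_core(samples):
    """One value per physical core: the first CPU seen for a core wins."""
    out = {}
    for _cpu, core, count in samples:
        out.setdefault(core, count)
    return out


def per_cpu(samples):
    """One value per logical CPU, so sibling threads are kept separately."""
    return {cpu: count for cpu, _core, count in samples}
```

With this data, per_core(SAMPLES) yields four values and the 7334 reported by cpu4 never appears, while per_cpu(SAMPLES) yields eight values and keeps both 7343 and 7334.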

pgier added a commit to pgier/node_exporter that referenced this issue Sep 10, 2019
It's possible for two cpus in the same core to have a different
value for the core_throttle_count, so this change reports a
cpu_core_throttles metric for each cpu and not just each core.
Hyperthreading systems have two cpus per core.

Fixes prometheus#1472

Signed-off-by: Paul Gier <[email protected]>
@adelton

adelton commented Oct 9, 2019

Is the value different and does it stay different? Can't the difference be explained by the values in /sys being read at different times, and the value changing between the respective readings?

@pgier
Contributor Author

pgier commented Oct 9, 2019

The value stays different, although the size of the difference stays somewhat consistent over time. And there is always only one core with two non-matching cpus, although it can be a different core between restarts. Checking the current values, I see this:

/sys/devices/system/cpu/cpu0/thermal_throttle/core_throttle_count : 20610
/sys/devices/system/cpu/cpu1/thermal_throttle/core_throttle_count : 893
/sys/devices/system/cpu/cpu2/thermal_throttle/core_throttle_count : 24785
/sys/devices/system/cpu/cpu3/thermal_throttle/core_throttle_count : 891
/sys/devices/system/cpu/cpu4/thermal_throttle/core_throttle_count : 20610
/sys/devices/system/cpu/cpu5/thermal_throttle/core_throttle_count : 893
/sys/devices/system/cpu/cpu6/thermal_throttle/core_throttle_count : 24786
/sys/devices/system/cpu/cpu7/thermal_throttle/core_throttle_count : 891

Currently cpu2 and cpu6 don't match (they are off by only one). My wild guess as to why this happens is that the system turns off hyperthreading in low power mode, so only a single cpu is running at that time, and its value increases separately from its twin.
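One way to tell a read-timing artifact from a persistent offset is to sample the counters several times and keep only the pairs that differ in every sample. This is a rough sketch under assumptions: read_counts is a hypothetical callable that returns a {cpu_name: count} mapping, and the sibling pairs are supplied by the caller.

```python
import time


def persistent_mismatches(read_counts, pairs, samples=3, interval=1.0):
    """Return the sibling pairs whose counts differ in every sample.

    read_counts: callable returning {cpu_name: count} (hypothetical hook).
    pairs: iterable of (cpu_a, cpu_b) tuples naming sibling threads.
    """
    persistent = set(pairs)
    for i in range(samples):
        counts = read_counts()
        # A pair is dropped as soon as its two counters agree once.
        persistent = {(a, b) for a, b in persistent if counts[a] != counts[b]}
        if i < samples - 1:
            time.sleep(interval)
    return persistent
```

A pair that only differs because one counter was read a moment before the other should agree in at least one of the samples and be dropped, although a core that is throttling continuously could still slip through. A pair with a standing offset, like the one described here, stays in the result.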

@pgier
Contributor Author

pgier commented Oct 10, 2019

Closing this one for now, and we can look into this again later if the current implementation causes an issue for anyone.

pgier closed this as completed on Oct 10, 2019
@pgier
Contributor Author

pgier commented Oct 10, 2019

Also, just wanted to note for future reference that there always seems to be a single CPU pair that doesn't match. However, the ids of the CPUs in that pair change between reboots, and the difference between the two core_throttle_count values in the pair seems to increase over time until the next reboot.
