Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

report core throttles for each CPU #1479

Closed
wants to merge 1 commit into from

Conversation

pgier
Copy link
Contributor

@pgier pgier commented Sep 10, 2019

It's possible for two cpus in the same core to have a different
value for the core_throttle_count, so this change reports a
cpu_core_throttles metric for each cpu and not just each core.
Hyperthreading systems have two cpus per core.

Fixes #1472

@pgier
Copy link
Contributor Author

pgier commented Sep 10, 2019

See also #659
cc @rtreffer @knweiss

It's possible for two cpus in the same core to have a different
value for the core_throttle_count, so this change reports a
cpu_core_throttles metric for each cpu and not just each core.
Hyperthreading systems have two cpus per core.

Fixes prometheus#1472

Signed-off-by: Paul Gier <[email protected]>
@rtreffer
Copy link
Contributor

Interesting, I would really like to know what lscpu says... And if this might be related to reading the files slightly off.

@pgier
Copy link
Contributor Author

pgier commented Sep 16, 2019

Here's the output of lscpu on my thinkpad T580:

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       39 bits physical, 48 bits virtual
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               142
Model name:          Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
Stepping:            10
CPU MHz:             800.189
CPU max MHz:         4200.0000
CPU min MHz:         400.0000
BogoMIPS:            4224.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            8192K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe sy
scall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmu
lqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave av
x f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fs
gsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida ar
at pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d

@adelton
Copy link

adelton commented Oct 9, 2019

Looking at the https://github.com/torvalds/linux/blob/master/arch/x86/kernel/cpu/mce/therm_throt.c where the core_throttle is defined and handled, the events are either on the PACKAGE_LEVEL or CORE_LEVEL. So it seems suspicious to create the node_cpu_core_throttles_total on the cpu level. Yes, /sys/devices/system/cpu/cpu* has it there per (logical) cpu but is that really an indication that the value has any meaning on the cpu?

Wouldn't /sys/devices/system/cpu/cpu*/topology/core_id be a better source of the hyperthreading information?

@pgier
Copy link
Contributor Author

pgier commented Oct 9, 2019

@adelton thanks for looking into this. Yeah, it seems weird that they would be different for the same core. But I'm ok with closing this if it's not useful to anyone else, since the differences I'm seeing are small and probably not that useful for metrics.

For hyperthreading info, #1489 does the trick for me. I found /proc/cpuinfo a bit easier to deal with than the files in /sys because I didn't have to interpret topology/core_siblings_list and thread_siblings_list.

@adelton
Copy link

adelton commented Oct 9, 2019

Agreed. If we can use #1489 for the hyper-threading support and take this pull request out of the picture, it would certainly simplify things.

@pgier
Copy link
Contributor Author

pgier commented Oct 10, 2019

Closing this one as won't fix, since I think in general the current implementation is fine. And the original use case to get cpu topology information is better handled by #1489.

@pgier pgier closed this Oct 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

node_cpu_core_throttles_total ignores second thread of hyperthreading systems
4 participants