[Linux] Add NVMe PCI address or name in sensors_temperatures keys #1902

Open
davromaniak opened this issue Jan 6, 2021 · 7 comments

Comments

@davromaniak

Summary

  • OS: Linux, tested on Kubuntu 20.10 with a 5.8 kernel
  • Type: scripts

Description

When calling psutil.sensors_temperatures()['nvme'], all NVMe drives of the system end up in a single list, with no way of telling which entry belongs to which drive.

Here's an example from my own PC:

clavier@WHITE-WHALE:~$ python3
Python 3.8.6 (default, Sep 25 2020, 09:36:53) 
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import psutil, pprint
>>> psutil.version_info
(5, 8, 0)
>>> pprint.pp(psutil.sensors_temperatures()['nvme'])
[shwtemp(label='Composite', current=34.85, high=84.85, critical=84.85),
 shwtemp(label='Sensor 1', current=34.85, high=65261.85, critical=65261.85),
 shwtemp(label='Sensor 2', current=40.85, high=65261.85, critical=65261.85),
 shwtemp(label='Composite', current=36.85, high=74.85, critical=79.85)]

Comparing with the output of the sensors command, I can tell that the first three entries belong to one drive (/dev/nvme0) and the last one to the second drive (/dev/nvme1).

Looking around in sysfs, I found that psutil keys the result on the name file of each hwmon device, which is too generic in this case:

clavier@WHITE-WHALE:~$ cat /sys/class/hwmon/hwmon1/name 
nvme
clavier@WHITE-WHALE:~$ cat /sys/class/hwmon/hwmon2/name 
nvme
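To make the collision visible, here is a minimal sketch (not psutil's actual implementation) that groups hwmon directories by the content of their name file. The helper name hwmon_dirs_by_name and its sysfs_root parameter are hypothetical; the parameter only exists so the function can be pointed at a copy of the tree.

```python
import glob
import os
from collections import defaultdict


def hwmon_dirs_by_name(sysfs_root="/sys/class/hwmon"):
    """Map each hwmon 'name' value to the hwmonN directories that report it."""
    groups = defaultdict(list)
    # Lexicographic order (hwmon10 sorts before hwmon2); fine for illustration.
    for hwmon_dir in sorted(glob.glob(os.path.join(sysfs_root, "hwmon*"))):
        try:
            with open(os.path.join(hwmon_dir, "name")) as f:
                name = f.read().strip()
        except OSError:
            continue  # no readable name file: skip this device
        groups[name].append(os.path.basename(hwmon_dir))
    return dict(groups)


if __name__ == "__main__":
    for name, dirs in hwmon_dirs_by_name().items():
        if len(dirs) > 1:
            print(f"{name!r} is shared by {', '.join(dirs)}")
```

On the machine above it should report that nvme is shared by hwmon1 and hwmon2, which is exactly the information lost when the name alone is used as the key.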

One option would be to add the full PCI address as another field of the shwtemp tuple (e.g. "pciaddress", only for NVMe drives).

The PCI address can be read from these paths:

clavier@WHITE-WHALE:~$ cat /sys/class/hwmon/hwmon1/device/nvme/nvme0/address 
0000:02:00.0
clavier@WHITE-WHALE:~$ cat /sys/class/hwmon/hwmon2/device/nvme/nvme1/address 
0000:04:00.0
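The nvmeN component of these paths differs per drive, and the layout may depend on the kernel: on the system described later in this thread, hwmon1 sits under .../nvme/nvme0/, so its device link points at the controller itself. A minimal sketch, assuming only the two layouts seen in this thread, that tries both and returns None when no address file exists (nvme_pci_address is a hypothetical name):

```python
import glob
import os


def nvme_pci_address(hwmon_dir):
    """Return the PCI address for an NVMe hwmon directory, or None.

    Assumed layouts (taken from the paths quoted in this issue):
      <hwmon_dir>/device/address             device -> the nvmeN controller
      <hwmon_dir>/device/nvme/nvme*/address  device -> the PCI function
    """
    candidates = [os.path.join(hwmon_dir, "device", "address")]
    candidates += sorted(
        glob.glob(os.path.join(hwmon_dir, "device", "nvme", "nvme*", "address"))
    )
    for path in candidates:
        try:
            with open(path) as f:
                return f.read().strip()
        except OSError:
            continue  # this layout is not present; try the next one
    return None
```

On the machine above, nvme_pci_address("/sys/class/hwmon/hwmon1") should return "0000:02:00.0"; on systems without any address file it returns None rather than guessing.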

psutil's output would then look like this:

>>> pprint.pp(psutil.sensors_temperatures()['nvme'])
[shwtemp(label='Composite', pciaddress='0000:02:00.0', current=34.85, high=84.85, critical=84.85),
 shwtemp(label='Sensor 1', pciaddress='0000:02:00.0', current=34.85, high=65261.85, critical=65261.85),
 shwtemp(label='Sensor 2', pciaddress='0000:02:00.0', current=40.85, high=65261.85, critical=65261.85),
 shwtemp(label='Composite', pciaddress='0000:04:00.0', current=36.85, high=74.85, critical=79.85)]
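If the field route were taken, placing pciaddress second as in the example above would shift current, high and critical to new positions. A sketch of a less disruptive variant, assuming the field is appended last with a default (the namedtuple defaults argument needs Python 3.7+); the extended tuple here is hypothetical, not psutil's real shwtemp:

```python
from collections import namedtuple

# Hypothetical extended tuple for illustration: psutil's shwtemp has only
# label, current, high and critical. The new field goes last and defaults
# to None, so t[0]..t[3] keep their meaning and non-NVMe sensors need no change.
shwtemp = namedtuple(
    "shwtemp",
    ["label", "current", "high", "critical", "pciaddress"],
    defaults=(None,),
)

drive = shwtemp("Composite", 34.85, 84.85, 84.85, pciaddress="0000:02:00.0")
cpu = shwtemp("Core 0", 27.0, 86.0, 100.0)  # pciaddress falls back to None
print(drive.pciaddress, drive[1], cpu.pciaddress)
```

This prints 0000:02:00.0 34.85 None. One caveat: code that unpacks every field (label, current, high, critical = t) breaks with any extra field, wherever it is placed, which may be an argument for the key-based option.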

Another option would be to use the drive name (here "nvme0" and "nvme1") as the key in the dict returned by psutil.sensors_temperatures().

psutil's output would then look like this:

>>> pprint.pp(psutil.sensors_temperatures()['nvme0'])
[shwtemp(label='Composite', current=34.85, high=84.85, critical=84.85),
 shwtemp(label='Sensor 1', current=34.85, high=65261.85, critical=65261.85),
 shwtemp(label='Sensor 2', current=40.85, high=65261.85, critical=65261.85)]
>>> pprint.pp(psutil.sensors_temperatures()['nvme1'])
[shwtemp(label='Composite', current=36.85, high=74.85, critical=79.85)]

We could also keep the existing "nvme" key with all drives in it, to avoid a breaking change.
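A minimal sketch of that backward-compatible layout, assuming the per-drive entries have already been collected into a dict keyed by drive name (collecting them reliably is the open question in this issue). add_per_drive_keys is a hypothetical helper, and the local shwtemp mirrors psutil's four-field tuple only so the example runs on its own:

```python
from collections import namedtuple

# Stand-in for psutil's tuple, so the example is self-contained.
shwtemp = namedtuple("shwtemp", ["label", "current", "high", "critical"])


def add_per_drive_keys(per_drive, legacy_key="nvme"):
    """Return a dict with one key per drive plus the old combined key.

    per_drive maps a drive name such as "nvme0" to its list of entries.
    Drives are visited in lexicographic order of their names.
    """
    result = {legacy_key: []}
    for drive in sorted(per_drive):
        entries = list(per_drive[drive])
        result[legacy_key].extend(entries)  # old behaviour: every drive in one list
        result[drive] = entries             # new: a separate list per drive
    return result


temps = add_per_drive_keys({
    "nvme0": [
        shwtemp("Composite", 34.85, 84.85, 84.85),
        shwtemp("Sensor 1", 34.85, 65261.85, 65261.85),
        shwtemp("Sensor 2", 40.85, 65261.85, 65261.85),
    ],
    "nvme1": [shwtemp("Composite", 36.85, 74.85, 79.85)],
})
print(sorted(temps))      # ['nvme', 'nvme0', 'nvme1']
print(len(temps["nvme"]))  # 4
```

Existing callers reading temps["nvme"] still see all four entries, while new callers can pick a single drive.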

I want to avoid modifying too much code, and I think these are the most lightweight solutions available.

Don't hesitate to comment and discuss.

If we agree on this feature, I'm willing to contribute and send a PR.

Thanks.

@giampaolo
Owner

What's the output of the sensors command?

@davromaniak
Author

Hi.

Here's the full output (it also includes the Wi-Fi, CPU and acpitz sensors):

clavier@WHITE-WHALE:~$ sensors
iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +34.0°C  

nvme-pci-0400
Adapter: PCI adapter
Composite:    +37.9°C  (low  =  -0.1°C, high = +74.8°C)
                       (crit = +79.8°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +27.8°C  (crit = +119.0°C)

ucsi_source_psy_0_00081-i2c-0-08
Adapter: NVIDIA GPU I2C adapter
in0:           0.00 V  (min =  +0.00 V, max =  +0.00 V)
curr1:         0.00 A  (max =  +0.00 A)

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +28.0°C  (high = +86.0°C, crit = +100.0°C)
Core 0:        +27.0°C  (high = +86.0°C, crit = +100.0°C)
Core 1:        +28.0°C  (high = +86.0°C, crit = +100.0°C)
Core 2:        +28.0°C  (high = +86.0°C, crit = +100.0°C)
Core 3:        +26.0°C  (high = +86.0°C, crit = +100.0°C)
Core 4:        +26.0°C  (high = +86.0°C, crit = +100.0°C)
Core 5:        +27.0°C  (high = +86.0°C, crit = +100.0°C)
Core 6:        +28.0°C  (high = +86.0°C, crit = +100.0°C)
Core 7:        +26.0°C  (high = +86.0°C, crit = +100.0°C)

nvme-pci-0200
Adapter: PCI adapter
Composite:    +34.9°C  (low  = -273.1°C, high = +84.8°C)
                       (crit = +84.8°C)
Sensor 1:     +34.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +40.9°C  (low  = -273.1°C, high = +65261.8°C)

Thanks.
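The chip names nvme-pci-0200 and nvme-pci-0400 line up with the PCI addresses 0000:02:00.0 and 0000:04:00.0 read from sysfs earlier. From my reading of the lm-sensors sources (an assumption worth double-checking against lib/sysfs.c and lib/data.c), the suffix is the address packed as (domain << 16) + (bus << 8) + (slot << 3) + function and printed as hex with at least four digits. A Python sketch of that encoding, with a hypothetical helper name:

```python
def lm_sensors_pci_suffix(pci_address):
    """Pack "DDDD:BB:SS.F" into the hex suffix lm-sensors appears to use."""
    domain_bus_slot, function = pci_address.rsplit(".", 1)
    domain, bus, slot = (int(part, 16) for part in domain_bus_slot.split(":"))
    packed = (domain << 16) + (bus << 8) + (slot << 3) + int(function, 16)
    return f"{packed:04x}"  # at least four hex digits, zero-padded


for address in ("0000:02:00.0", "0000:04:00.0"):
    print("nvme-pci-" + lm_sensors_pci_suffix(address))
```

This prints nvme-pci-0200 and nvme-pci-0400, matching the sensors output above. Since the slot is shifted by 3 rather than 8, the suffix is not simply the bus and slot digits glued together: 0000:00:18.3 would become 00c3.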

@giampaolo
Owner

Please also post the full output of pprint(psutil.sensors_temperatures()).

@davromaniak
Author

Here it is:

>>> pprint.pp(psutil.sensors_temperatures())
{'acpitz': [shwtemp(label='', current=27.8, high=119.0, critical=119.0)],
 'nvme': [shwtemp(label='Composite', current=34.85, high=84.85, critical=84.85),
          shwtemp(label='Sensor 1', current=34.85, high=65261.85, critical=65261.85),
          shwtemp(label='Sensor 2', current=40.85, high=65261.85, critical=65261.85),
          shwtemp(label='Composite', current=37.85, high=74.85, critical=79.85)],
 'coretemp': [shwtemp(label='Package id 0', current=29.0, high=86.0, critical=100.0),
              shwtemp(label='Core 0', current=27.0, high=86.0, critical=100.0),
              shwtemp(label='Core 1', current=28.0, high=86.0, critical=100.0),
              shwtemp(label='Core 2', current=28.0, high=86.0, critical=100.0),
              shwtemp(label='Core 3', current=27.0, high=86.0, critical=100.0),
              shwtemp(label='Core 4', current=26.0, high=86.0, critical=100.0),
              shwtemp(label='Core 5', current=27.0, high=86.0, critical=100.0),
              shwtemp(label='Core 6', current=29.0, high=86.0, critical=100.0),
              shwtemp(label='Core 7', current=26.0, high=86.0, critical=100.0)],
 'iwlwifi_1': [shwtemp(label='', current=33.0, high=None, critical=None)]}

@giampaolo
Owner

Mmm... what should change is the dictionary key (so in your case we'd have two separate nvme-* keys), but I'm not sure what logic to use. :-\
On my system I don't even have the .../address file.
The whole /sys/class/hwmon/hwmon* tree is an unreliable mess, really. The definitive solution would probably be to look at how lm-sensors does it (https://github.com/lm-sensors/lm-sensors/blob/master/lib/sysfs.c) and rewrite/refactor the whole thing.

@davromaniak
Author

Hi.

Well, my C is rusty, so I'd have some difficulty helping to rewrite this based on what the lm-sensors project does.

Thanks.

@Jik

Jik commented Nov 19, 2024

I ran into the same problem and ended up here (even though the computer I'm debugging on currently has only one NVMe device, I'd still like a more specific device name).

lm-sensors names my NVMe device nvme-pci-0500. I did some digging in lm-sensors: it basically resolves two or three symlinks and walks up the device tree by trial and error until it finds a PCI device, then takes that device's address.

Concretely, on my system:

jik@hostname:/sys/class/hwmon$ ls -l
total 0
lrwxrwxrwx 1 root root 0 nov 18 21:57 hwmon0 -> ../../devices/virtual/thermal/thermal_zone0/hwmon0
lrwxrwxrwx 1 root root 0 nov 18 21:57 hwmon1 -> ../../devices/pci0000:00/0000:00:01.6/0000:05:00.0/nvme/nvme0/hwmon1
lrwxrwxrwx 1 root root 0 nov 18 21:57 hwmon2 -> ../../devices/pci0000:00/0000:00:18.3/hwmon/hwmon2
lrwxrwxrwx 1 root root 0 nov 18 21:57 hwmon3 -> ../../devices/platform/PNP0C14:00/wmi_bus/wmi_bus-PNP0C14:00/DEADBEEF-2001-0000-00A0-C90629100000/hwmon/hwmon3
lrwxrwxrwx 1 root root 0 nov 18 21:57 hwmon4 -> ../../devices/pci0000:00/0000:00:08.1/0000:06:00.0/hwmon/hwmon4
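The symlinks alone already show where each hwmon device hangs in the device tree. A rough sketch, with hypothetical helper names, that resolves them and pulls out the deepest path component that looks like a PCI address. This is a simpler heuristic than the lm-sensors walk described next, not a replacement for it: it never checks the subsystem, so it would also tag non-NVMe devices sitting below a PCI function.

```python
import glob
import os
import re

# Matches components like "0000:05:00.0" (domain:bus:slot.function) but not
# "pci0000:00". Allowing more than four domain digits is an assumption.
PCI_ADDR_RE = re.compile(r"^[0-9a-f]{4,}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def hwmon_real_paths(sysfs_root="/sys/class/hwmon"):
    """Map each hwmonN entry to the path its symlink resolves to."""
    return {
        os.path.basename(link): os.path.realpath(link)
        for link in sorted(glob.glob(os.path.join(sysfs_root, "hwmon*")))
    }


def last_pci_component(path):
    """Return the deepest PCI-address-looking path component, or None."""
    matches = [part for part in path.split(os.sep) if PCI_ADDR_RE.match(part)]
    return matches[-1] if matches else None


if __name__ == "__main__":
    for hwmon, real in hwmon_real_paths().items():
        print(hwmon, last_pci_component(real))
```

With the links listed above, this would give 0000:05:00.0 for hwmon1, 0000:00:18.3 for hwmon2, 0000:06:00.0 for hwmon4, and None for hwmon0 and hwmon3.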

The process lm-sensors uses to build the name is roughly:

  1. Resolve /sys/class/hwmon/hwmon1/device to /sys/devices/pci0000:00/0000:00:01.6/0000:05:00.0/nvme/nvme0
  2. First trial-and-error step, with my_dev_path=/sys/devices/pci0000:00/0000:00:01.6/0000:05:00.0/nvme/nvme0:
    1. Resolve ${my_dev_path}/subsystem to ../../../../../../class/nvme
    2. Take whatever is after the last /: nvme is not a supported subsystem.
    3. The step failed, so resolve ${my_dev_path}/device instead, which gives /sys/devices/pci0000:00/0000:00:01.6/0000:05:00.0
  3. Second trial-and-error step, with my_dev_path=/sys/devices/pci0000:00/0000:00:01.6/0000:05:00.0:
    1. Resolve ${my_dev_path}/subsystem to ../../../../bus/pci
    2. Take whatever is after the last /: pci is supported.
    3. The step succeeded: the bus type is PCI, and the address is parsed from 0000:05:00.0, the last component of my_dev_path.
  4. Finally, read the device name from the name file and append "-pci-" plus the address (here nvme-pci-0500).

Source: I added some printfs to lm-sensors to trace the process (lm-sensors/lm-sensors@master...JiK:lm-sensors:debug-sysfs). Full output:

sensors_add_hwmon_device: /sys/class/hwmon/hwmon1 hwmon1
Link path: /sys/class/hwmon/hwmon1/device
Real path: /sys/devices/pci0000:00/0000:00:01.6/0000:05:00.0/nvme/nvme0
sensors_read_one_sysfs_chip: /sys/devices/pci0000:00/0000:00:01.6/0000:05:00.0/nvme/nvme0 nvme0 /sys/class/hwmon/hwmon1
find_bus_type: /sys/devices/pci0000:00/0000:00:01.6/0000:05:00.0/nvme/nvme0 nvme0
Link path: /sys/devices/pci0000:00/0000:00:01.6/0000:05:00.0/nvme/nvme0/subsystem
Read link: ../../../../../../class/nvme
Subsys: nvme
Failed to find subsys.Link path: /sys/devices/pci0000:00/0000:00:01.6/0000:05:00.0/nvme/nvme0/device
Real path: /sys/devices/pci0000:00/0000:00:01.6/0000:05:00.0
New dev_name: 0000:05:00.0
Link path: /sys/devices/pci0000:00/0000:00:01.6/0000:05:00.0/subsystem
Read link: ../../../../bus/pci
Subsys: pci
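The walk in steps 1 to 3 can be sketched in Python. This is a simplified reading of the steps and debug output above, not a port of lm-sensors: find_bus is a hypothetical name, KNOWN_BUSES is an assumed subset of the bus types lm-sensors recognizes, and step 4 (building the final name) is left out.

```python
import os

# Assumption: only a small subset of the bus types lm-sensors knows about.
KNOWN_BUSES = frozenset({"pci", "i2c", "spi", "platform", "acpi"})


def find_bus(hwmon_dir, known_buses=KNOWN_BUSES, max_hops=8):
    """Return (bus, dev_name) for an hwmon directory, or None.

    Example result: ("pci", "0000:05:00.0").
    """
    link = os.path.join(hwmon_dir, "device")
    if not os.path.islink(link):
        return None  # no parent device, e.g. a virtual hwmon
    dev_path = os.path.realpath(link)  # step 1
    for _ in range(max_hops):  # bounded, in case the links form a cycle
        try:
            # e.g. "../../../../bus/pci" -> "pci"
            subsys = os.path.basename(os.readlink(os.path.join(dev_path, "subsystem")))
        except OSError:
            subsys = None
        if subsys in known_buses:
            return subsys, os.path.basename(dev_path)
        # Not a known bus (e.g. "nvme"): move up through the "device" link.
        parent = os.path.join(dev_path, "device")
        if not os.path.islink(parent):
            return None
        dev_path = os.path.realpath(parent)
    return None
```

On the machine above, find_bus("/sys/class/hwmon/hwmon1") should return ("pci", "0000:05:00.0"); combined with the name file and the address encoding, that yields nvme-pci-0500.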
