feat(input.intel_pmt): Add pci_bdf tag to uniquely identify GPUs and other peripherals #14004
Conversation
In the current intel_pmt plugin, the `numa_node` of the intel_pmt device is used to uniquely identify which CPU socket a sample belongs to. However, with GPUs and other future peripherals, an additional tag is needed to uniquely identify samples. Thus, add the PCI Bus:Device.Function (BDF) of the intel_pmt device's parent as the tag `pci_bdf`.

We find the PCI BDF by traversing up from `telemDeviceSymlink` and taking its basename. Equivalent example in bash:

```
$ basename $(realpath /sys/class/intel_pmt/telem10/device/..)
0000:e7:03.1
```

All intel_pmt parent devices appear to be PCI devices, so we shouldn't need to special-case this tag between CPUs and other peripherals.

Sample including pci_bdf:

```
> intel_pmt,datatype_idref=ttemperature,guid=0x9956f43f,host=node001,numa_node=0,pci_bdf=0000:e7:03.1,sample_group=TEMPERATURE[4],sample_name=CORE_TEMP value=36.4140625 1695671600000000000
```
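For context, a minimal Go sketch of the lookup described above; the helper name `pciBdfFromTelemDevice` and the hard-coded example path are illustrative assumptions, not the actual code added by this PR:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// pciBdfFromTelemDevice resolves the telemetry device symlink (e.g.
// /sys/class/intel_pmt/telem10/device) and returns the basename of its
// parent directory, which is the PCI Bus:Device.Function of the device,
// e.g. "0000:e7:03.1". Hypothetical helper, mirroring the bash example.
func pciBdfFromTelemDevice(telemDeviceSymlink string) (string, error) {
	// EvalSymlinks follows the symlink chain, like `realpath` in bash.
	resolved, err := filepath.EvalSymlinks(telemDeviceSymlink)
	if err != nil {
		return "", fmt.Errorf("resolving %q: %w", telemDeviceSymlink, err)
	}
	// The parent of the resolved device directory is the PCI device;
	// its basename is the BDF.
	return filepath.Base(filepath.Dir(resolved)), nil
}

func main() {
	bdf, err := pciBdfFromTelemDevice("/sys/class/intel_pmt/telem10/device")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(bdf) // e.g. 0000:e7:03.1
}
```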
Download PR build artifacts for linux_amd64.tar.gz, darwin_amd64.tar.gz, and windows_amd64.zip.
Thanks again for another issue + PR. I'll ask @p-zak and the other Intel folks for a review as well, but two questions:

1. Is there an example we can add as a test that includes a non-CPU device?
2. Is the PCI Bus:Device.Function coarse enough for future devices as well?
There doesn't appear to be anything unique about PMT devices for non-CPUs compared to CPUs, other than different GUIDs and thus references to other XML entries. Generally this is the whole idea behind Intel PMT: providing a common interface to telemetry. The Intel folks might have a better idea of whether there are additional test cases that'd provide better coverage.

I believe so, yes. I've also not found other discoverable attributes of an intel_pmt or intel_vsec device that we could use as a tag to help uniquely identify components.
Tagging @jakubsikorski for review as well in case he hasn't seen this. Thanks Kuba!
There is no difference whether the device is a CPU or not. Intel PMT telemetry will be visible for that device as a PCI device that has a BDF. Different devices - whether it's 2 CPUs in a 2-socket server or 2 GPUs - will have different PCI BDFs. The only difference between devices will be the amount of intel_pmt telemetry they can expose, but that is a difference on the GUID and XML spec level (and the plugin reads all that it can).
Yes, this should be enough. We will now have both `numa_node` and `pci_bdf` to tell devices apart.
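As an illustration of that point (the second sample's tag and field values are hypothetical, not captured output), two packages or peripherals would emit otherwise identical samples that differ in `numa_node` and `pci_bdf`:

```
> intel_pmt,datatype_idref=ttemperature,guid=0x9956f43f,host=node001,numa_node=0,pci_bdf=0000:e7:03.1,sample_group=TEMPERATURE[4],sample_name=CORE_TEMP value=36.4140625 1695671600000000000
> intel_pmt,datatype_idref=ttemperature,guid=0x9956f43f,host=node001,numa_node=1,pci_bdf=0000:f4:03.1,sample_group=TEMPERATURE[4],sample_name=CORE_TEMP value=37.2812500 1695671600000000000
```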
LGTM!
@bensallen Thanks for the PR!
Thanks @bensallen for your contribution!
Required for all PRs
resolves #14003