feat(input.intel_pmt): Add pci_bdf tag to uniquely identify GPUs and other peripherals #14004

Merged
merged 1 commit into from
Oct 2, 2023


bensallen
Contributor

In the current intel_pmt plugin, the numa_node of the intel_pmt device is used as a way to uniquely identify which CPU socket the sample belongs to. However, with GPUs and other future peripherals, an additional tag is needed to uniquely identify samples.

Thus, add the PCI Bus:Device.Function (BDF) of the intel_pmt device's parent as the tag `pci_bdf`. We find the PCI BDF by traversing up one level from `telemDeviceSymlink` and taking its basename. An equivalent example in bash:

```
$ basename $(realpath /sys/class/intel_pmt/telem10/device/..)
0000:e7:03.1
```

All intel_pmt parent devices appear to be PCI devices, so we shouldn't need to special-case this tag between CPUs and other peripherals.

Sample including pci_bdf:

```
> intel_pmt,datatype_idref=ttemperature,guid=0x9956f43f,host=node001,numa_node=0,pci_bdf=0000:e7:03.1,sample_group=TEMPERATURE[4],sample_name=CORE_TEMP value=36.4140625 1695671600000000000
```

Required for all PRs

resolves #14003

@telegraf-tiger telegraf-tiger bot added feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins labels Sep 26, 2023
@powersj powersj requested a review from p-zak September 26, 2023 17:59
@telegraf-tiger
Contributor

Download PR build artifacts for linux_amd64.tar.gz, darwin_amd64.tar.gz, and windows_amd64.zip.
Downloads for additional architectures and packages are available below.

⚠️ This pull request increases the Telegraf binary size by 2.19 % for linux amd64 (new size: 202.0 MB, nightly size 197.7 MB)

📦 Click here to get additional PR build artifacts

Artifact URLs

| DEB | RPM | TAR GZ | ZIP |
| --- | --- | --- | --- |
| amd64.deb | aarch64.rpm | darwin_amd64.tar.gz | windows_amd64.zip |
| arm64.deb | armel.rpm | darwin_arm64.tar.gz | windows_arm64.zip |
| armel.deb | armv6hl.rpm | freebsd_amd64.tar.gz | windows_i386.zip |
| armhf.deb | i386.rpm | freebsd_armv7.tar.gz | |
| i386.deb | ppc64le.rpm | freebsd_i386.tar.gz | |
| mips.deb | riscv64.rpm | linux_amd64.tar.gz | |
| mipsel.deb | s390x.rpm | linux_arm64.tar.gz | |
| ppc64el.deb | x86_64.rpm | linux_armel.tar.gz | |
| riscv64.deb | | linux_armhf.tar.gz | |
| s390x.deb | | linux_i386.tar.gz | |
| | | linux_mips.tar.gz | |
| | | linux_mipsel.tar.gz | |
| | | linux_ppc64le.tar.gz | |
| | | linux_riscv64.tar.gz | |
| | | linux_s390x.tar.gz | |

@powersj
Contributor

powersj commented Sep 26, 2023

Thanks again for another issue + PR, I'll ask @p-zak and the other intel folks for a review as well, but two questions:

> However, with GPUs and other future peripherals, an additional tag is needed to uniquely identify samples.

Is there an example we can add as a test that includes a non-CPU device?

Is the PCI Bus:Device.Function coarse enough for future devices as well?

@bensallen
Contributor Author

> Is there an example we can add as a test that includes a non-CPU device?

There doesn't appear to be anything unique about PMT devices for non-CPUs compared to CPUs, other than different GUIDs and thus references to other XML entries. Generally, this is the whole idea behind Intel PMT: providing a common interface to telemetry. The Intel folks might have a better idea whether there are additional test cases that would provide better coverage.

> Is the PCI Bus:Device.Function coarse enough for future devices as well?

I believe so, yes. I've also not found other discoverable attributes of an intel_pmt or intel_vsec device that we could use as a tag to help uniquely identify components.

@bensallen
Contributor Author

Tagging @jakubsikorski for review as well, in case he hasn't seen this. Thanks Kuba!

@jakubsikorski
Contributor

> > Is there an example we can add as a test that includes a non-CPU device?
>
> There doesn't appear to be anything unique about PMT devices for non-CPUs compared to CPUs, other than different GUIDs and thus references to other XML entries. Generally, this is the whole idea behind Intel PMT: providing a common interface to telemetry. The Intel folks might have a better idea whether there are additional test cases that would provide better coverage.

There is no difference whether the device is a CPU or not. Intel PMT telemetry will be visible for that device as a PCI device that has a BDF. Different devices, whether two CPUs in a two-socket server or two GPUs, will have different PCI BDFs. The only difference between devices will be the amount of intel_pmt telemetry they can expose, but that is a difference at the GUID and XML spec level (and the plugin reads all that it can).

> > Is the PCI Bus:Device.Function coarse enough for future devices as well?
>
> I believe so, yes. I've also not found other discoverable attributes of an intel_pmt or intel_vsec device that we could use as a tag to help uniquely identify components.

Yes, this should be enough. We will now have `guid` identifying the unique Intel PMT telemetry semantic space, `numa_node` identifying the socket, and `pci_bdf` identifying the device exposing the telemetry.
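To illustrate why the third tag matters, here is a small sketch of how two devices could end up with identical tag sets without `pci_bdf`. It is only an illustration under stated assumptions: the `seriesKey` helper is hypothetical (it is not how Telegraf builds series keys internally), the GUID is the one from the sample above, and the second BDF `0000:e8:03.1` is a made-up value.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey joins tags in sorted key order, so equal tag sets yield equal keys.
// Hypothetical helper for illustration only.
func seriesKey(tags map[string]string) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+tags[k])
	}
	return strings.Join(parts, ",")
}

func main() {
	// Assumed scenario: two devices with the same GUID attached to the same NUMA node.
	devA := map[string]string{"guid": "0x9956f43f", "numa_node": "0"}
	devB := map[string]string{"guid": "0x9956f43f", "numa_node": "0"}
	fmt.Println(seriesKey(devA) == seriesKey(devB)) // true: the samples are indistinguishable

	// Adding pci_bdf separates them (the second BDF is a made-up value).
	devA["pci_bdf"] = "0000:e7:03.1"
	devB["pci_bdf"] = "0000:e8:03.1"
	fmt.Println(seriesKey(devA) == seriesKey(devB)) // false
}
```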

Collaborator

@p-zak p-zak left a comment


LGTM!

@bensallen Thanks for the PR!

@powersj powersj added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Oct 2, 2023
@powersj powersj assigned srebhan and unassigned powersj Oct 2, 2023
Member

@srebhan srebhan left a comment


Thanks @bensallen for your contribution!

@srebhan srebhan merged commit dd74499 into influxdata:master Oct 2, 2023
4 checks passed
@github-actions github-actions bot added this to the v1.29.0 milestone Oct 2, 2023
Development

Successfully merging this pull request may close these issues.

In input.intel_pmt add pci_bdf tag to uniquely identify GPUs and other peripherals
5 participants