Wrong number of GPUs returned for num_gpus grain #15547

Closed
chrish42 opened this issue Sep 5, 2014 · 2 comments
Labels: Bug (broken, incorrect, or confusing behavior); fixed-pls-verify (fix is linked, bug author to confirm fix)

@chrish42

chrish42 commented Sep 5, 2014

I'm using Salt 2014.1.10 to manage a RHEL 6.5 cluster. All the nodes on the cluster have 2 GPUs installed, but the num_gpus grain contains the value 1.

Here is the output of lspci (notice the two lines that say "3D controller: NVIDIA"):

00:00.0 Host bridge: Intel Corporation Xeon E5 v2/Core i7 DMI2 (rev 04)
00:01.0 PCI bridge: Intel Corporation Xeon E5 v2/Core i7 PCI Express Root Port 1a (rev 04)
00:02.0 PCI bridge: Intel Corporation Xeon E5 v2/Core i7 PCI Express Root Port 2a (rev 04)
00:02.2 PCI bridge: Intel Corporation Xeon E5 v2/Core i7 PCI Express Root Port 2c (rev 04)
00:03.0 PCI bridge: Intel Corporation Xeon E5 v2/Core i7 PCI Express Root Port 3a (rev 04)
00:05.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 VTd/Memory Map/Misc (rev 04)
00:05.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 IIO RAS (rev 04)
00:11.0 PCI bridge: Intel Corporation C600/X79 series chipset PCI Express Virtual Root Port (rev 05)
00:16.0 Communication controller: Intel Corporation C600/X79 series chipset MEI Controller #1 (rev 05)
00:16.1 Communication controller: Intel Corporation C600/X79 series chipset MEI Controller #2 (rev 05)
00:1a.0 USB controller: Intel Corporation C600/X79 series chipset USB2 Enhanced Host Controller #2 (rev 05)
00:1c.0 PCI bridge: Intel Corporation C600/X79 series chipset PCI Express Root Port 1 (rev b5)
00:1c.4 PCI bridge: Intel Corporation C600/X79 series chipset PCI Express Root Port 5 (rev b5)
00:1c.7 PCI bridge: Intel Corporation C600/X79 series chipset PCI Express Root Port 8 (rev b5)
00:1d.0 USB controller: Intel Corporation C600/X79 series chipset USB2 Enhanced Host Controller #1 (rev 05)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a5)
00:1f.0 ISA bridge: Intel Corporation C600/X79 series chipset LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation C600/X79 series chipset 6-Port SATA AHCI Controller (rev 05)
01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2208 [Thunderbolt] (rev 05)
04:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20Xm] (rev a1)
07:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
07:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
08:00.0 PCI bridge: Renesas Technology Corp. SH7757 PCIe Switch [PS]
09:00.0 PCI bridge: Renesas Technology Corp. SH7757 PCIe Switch [PS]
09:01.0 PCI bridge: Renesas Technology Corp. SH7757 PCIe Switch [PS]
0a:00.0 PCI bridge: Renesas Technology Corp. SH7757 PCIe-PCI Bridge [PPB]
0b:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. G200eR2
3f:08.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 QPI Link 0 (rev 04)
3f:09.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 QPI Link 1 (rev 04)
3f:0a.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 0 (rev 04)
3f:0a.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 1 (rev 04)
3f:0a.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 2 (rev 04)
3f:0a.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 3 (rev 04)
3f:0b.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 UBOX Registers (rev 04)
3f:0b.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 UBOX Registers (rev 04)
3f:0c.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
3f:0c.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
3f:0c.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
3f:0c.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
3f:0c.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
3f:0d.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
3f:0d.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
3f:0d.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
3f:0d.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
3f:0d.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
3f:0e.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Home Agent 0 (rev 04)
3f:0e.1 Performance counters: Intel Corporation Xeon E5 v2/Core i7 Home Agent 0 (rev 04)
3f:0f.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Target Address/Thermal Registers (rev 04)
3f:0f.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 RAS Registers (rev 04)
3f:0f.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
3f:0f.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
3f:0f.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
3f:0f.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
3f:10.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 0 (rev 04)
3f:10.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 1 (rev 04)
3f:10.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 0 (rev 04)
3f:10.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 1 (rev 04)
3f:10.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 2 (rev 04)
3f:10.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 3 (rev 04)
3f:10.7 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 3 (rev 04)
3f:13.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 R2PCIe (rev 04)
3f:13.1 Performance counters: Intel Corporation Xeon E5 v2/Core i7 R2PCIe (rev 04)
3f:13.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 QPI Ring Registers (rev 04)
3f:13.5 Performance counters: Intel Corporation Xeon E5 v2/Core i7 QPI Ring Performance Ring Monitoring (rev 04)
3f:16.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 System Address Decoder (rev 04)
3f:16.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Broadcast Registers (rev 04)
3f:16.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Broadcast Registers (rev 04)
40:01.0 PCI bridge: Intel Corporation Xeon E5 v2/Core i7 PCI Express Root Port 1a (rev 04)
40:02.0 PCI bridge: Intel Corporation Xeon E5 v2/Core i7 PCI Express Root Port 2a (rev 04)
40:03.0 PCI bridge: Intel Corporation Xeon E5 v2/Core i7 PCI Express Root Port 3a (rev 04)
40:03.2 PCI bridge: Intel Corporation Xeon E5 v2/Core i7 PCI Express Root Port 3c (rev 04)
40:05.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 VTd/Memory Map/Misc (rev 04)
40:05.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 IIO RAS (rev 04)
42:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20Xm] (rev a1)
7f:08.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 QPI Link 0 (rev 04)
7f:09.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 QPI Link 1 (rev 04)
7f:0a.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 0 (rev 04)
7f:0a.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 1 (rev 04)
7f:0a.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 2 (rev 04)
7f:0a.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Power Control Unit 3 (rev 04)
7f:0b.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 UBOX Registers (rev 04)
7f:0b.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 UBOX Registers (rev 04)
7f:0c.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
7f:0c.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
7f:0c.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
7f:0c.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
7f:0c.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
7f:0d.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
7f:0d.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
7f:0d.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
7f:0d.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
7f:0d.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Unicast Registers (rev 04)
7f:0e.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Home Agent 0 (rev 04)
7f:0e.1 Performance counters: Intel Corporation Xeon E5 v2/Core i7 Home Agent 0 (rev 04)
7f:0f.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Target Address/Thermal Registers (rev 04)
7f:0f.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 RAS Registers (rev 04)
7f:0f.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
7f:0f.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
7f:0f.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
7f:0f.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder Registers (rev 04)
7f:10.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 0 (rev 04)
7f:10.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 1 (rev 04)
7f:10.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 0 (rev 04)
7f:10.3 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 1 (rev 04)
7f:10.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 2 (rev 04)
7f:10.5 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 Thermal Control 3 (rev 04)
7f:10.7 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Integrated Memory Controller 1 Channel 0-3 ERROR Registers 3 (rev 04)
7f:13.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 R2PCIe (rev 04)
7f:13.1 Performance counters: Intel Corporation Xeon E5 v2/Core i7 R2PCIe (rev 04)
7f:13.4 System peripheral: Intel Corporation Xeon E5 v2/Core i7 QPI Ring Registers (rev 04)
7f:13.5 Performance counters: Intel Corporation Xeon E5 v2/Core i7 QPI Ring Performance Ring Monitoring (rev 04)
7f:16.0 System peripheral: Intel Corporation Xeon E5 v2/Core i7 System Address Decoder (rev 04)
7f:16.1 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Broadcast Registers (rev 04)
7f:16.2 System peripheral: Intel Corporation Xeon E5 v2/Core i7 Broadcast Registers (rev 04)

@basepi
Contributor

basepi commented Sep 5, 2014

Thanks for the report, we'll investigate this issue.

@terminalmage
Contributor

You actually have 3 GPUs in that lspci output. One is VGA, and two are 3D. The function that looks for GPU devices was not looking for 3D GPUs, hence num_gpus showing up as 1. If you look at the gpus grain, you should find just the one VGA adapter there.

I've fixed this in #15639.
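
For readers following along, here is a minimal sketch of the behavior described above. It is not Salt's actual grain code and not the change made in the linked fix; the function name `count_gpus`, the `GPU_CLASSES` tuple, and the line-matching regex are all assumptions made for illustration. It counts lines of plain `lspci` output by device class, so matching only "VGA compatible controller" gives 1 for the three relevant lines from the report, while also matching "3D controller" gives 3.

```python
import re

# Device classes treated as GPUs in this sketch (an assumption for
# illustration; not taken from Salt's source).
GPU_CLASSES = ("VGA compatible controller", "3D controller")


def count_gpus(lspci_output, gpu_classes=GPU_CLASSES):
    """Count lines of plain `lspci` output whose device class is in gpu_classes."""
    count = 0
    for line in lspci_output.splitlines():
        # Plain lspci lines look like "04:00.0 3D controller: NVIDIA ..."
        match = re.match(r"^\S+\s+([^:]+):", line)
        if match and match.group(1) in gpu_classes:
            count += 1
    return count


# Four lines taken from the lspci output in the report.
SAMPLE = """\
04:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20Xm] (rev a1)
0b:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. G200eR2
42:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20Xm] (rev a1)
01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
"""

# Matching only the VGA class reproduces the reported count.
print(count_gpus(SAMPLE, ("VGA compatible controller",)))  # prints 1
# Matching both classes counts the two Tesla cards as well.
print(count_gpus(SAMPLE))  # prints 3
```

Whether the onboard Matrox VGA adapter should count as a GPU for scheduling purposes is a separate question; the sketch only shows why the class filter changes the number.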

@terminalmage terminalmage added the fixed-pls-verify fix is linked, bug author to confirm fix label Sep 9, 2014
@terminalmage terminalmage self-assigned this Sep 9, 2014