"0 devices found" after trying to upgrade xrt #1618

Open
ngdxzy opened this issue Jul 12, 2024 · 4 comments

Comments

@ngdxzy

ngdxzy commented Jul 12, 2024

I was trying to use the new xdna-driver, so I followed the xdna-driver installation instructions for Linux. However, at the end, the NPU device is not found when running xbutil examine. The output is shown below:

System Configuration
  OS Name              : Linux
  Release              : 6.8.8+
  Version              : #2 SMP PREEMPT_DYNAMIC Fri Jul 12 16:44:38 EDT 2024
  Machine              : x86_64
  CPU Cores            : 16
  Memory               : 28884 MB
  Distribution         : Ubuntu 22.04.4 LTS
  GLIBC                : 2.35
  Model                : NucBox K8
  BIOS vendor          : American Megatrends International, LLC.
  BIOS version         : NucBox K8 1.07

XRT
  Version              : 2.18.0
  Branch               : HEAD
  Hash                 : 73fe5440974fc51ccaba6366094e4bfa8151f79a
  Hash Date            : 2024-07-12 18:42:09
  XOCL                 : unknown, unknown
  XCLMGMT              : unknown, unknown
WARNING: xclmgmt version is unknown. Is xclmgmt driver loaded? Or is MSD/MPD running?
  AMDXDNA              : 2.18.0_20240712, b6db49f792a48123a016ba052d0c2103862547ee

Devices present
  0 devices found

I checked dmesg and found that the amdxdna driver failed during probe because the firmware could not be loaded:

[    1.982433] kernel: amdxdna: loading out-of-tree module taints kernel.
[    1.982439] kernel: amdxdna: module verification failed: signature and/or required key missing - tainting kernel
[    1.986184] kernel: amdxdna 0000:67:00.1: loading /lib/firmware/amdnpu/1502_00/npu.sbin failed with error -22
[    1.986188] kernel: amdxdna 0000:67:00.1: Direct firmware load for amdnpu/1502_00/npu.sbin failed with error -22
[    1.986190] kernel: amdxdna 0000:67:00.1: aie2_init: failed to request_firmware amdnpu/1502_00/npu.sbin, ret -22
[    1.986223] kernel: amdxdna 0000:67:00.1: amdxdna_probe: Hardware init failed, ret -22
[    1.986245] kernel: amdxdna: probe of 0000:67:00.1 failed with error -22
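Since the log shows the firmware request itself failing with -22 (EINVAL), one quick diagnostic is to confirm that the firmware file exists and is not empty. The following is a minimal sketch; `check_firmware` is a hypothetical helper written for illustration and is not part of XRT or xdna-driver.

```shell
# Hypothetical helper (not part of XRT or xdna-driver): report whether a
# firmware file is missing, empty, or present with some content.
check_firmware() {
    local fw="$1"
    if [ ! -e "$fw" ]; then
        echo "missing: $fw" >&2
        return 1
    fi
    # -s is true only when the file exists and has a size greater than zero
    if [ ! -s "$fw" ]; then
        echo "empty: $fw" >&2
        return 2
    fi
    echo "ok: $fw ($(wc -c < "$fw") bytes)"
}
```

For the device in this report, the path to check would be the one named in the log: `check_firmware /lib/firmware/amdnpu/1502_00/npu.sbin`. A non-empty file does not prove the firmware is valid, only that it is not obviously truncated to zero bytes.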

I also checked the PCI information:

67:00.1 Signal processing controller: Advanced Micro Devices, Inc. [AMD] Device 1502
	Subsystem: Advanced Micro Devices, Inc. [AMD] Device 1502
	Flags: fast devsel, IRQ 255, IOMMU group 27
	Memory at dc900000 (32-bit, non-prefetchable) [disabled] [size=512K]
	Memory at dc9c0000 (32-bit, non-prefetchable) [disabled] [size=8K]
	Memory at 7c10000000 (64-bit, prefetchable) [disabled] [size=256K]
	Memory at dc980000 (32-bit, non-prefetchable) [disabled] [size=256K]
	Capabilities: [48] Vendor Specific Information: Len=08 <?>
	Capabilities: [50] Power Management version 3
	Capabilities: [64] Express Endpoint, MSI 00
	Capabilities: [a0] MSI: Enable- Count=1/16 Maskable- 64bit+
	Capabilities: [c0] MSI-X: Enable- Count=16 Masked-
	Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150] Advanced Error Reporting
	Capabilities: [2a0] Access Control Services
	Capabilities: [2d0] Process Address Space ID (PASID)
	Kernel modules: amdxdna

May I ask how to solve this?

@stephenneuendorffer
Collaborator

I'm not sure if this is your problem, but I found I had to explicitly remove and reinstall the xrt plugin when upgrading.

@ngdxzy
Author

ngdxzy commented Jul 13, 2024

> I'm not sure if this is your problem, but I found I had to explicitly remove and reinstall the xrt plugin when upgrading.

I did remove the old XRT. The only thing I am unsure about is that I compiled a new Linux kernel based on the old one. Does that mean I need to reinstall the operating system?

@stephenneuendorffer
Collaborator

That shouldn't be necessary. Are you sure there were no errors when compiling/installing the xrt_plugin module? The fact that it can't find the firmware is very suspicious.

@ngdxzy
Author

ngdxzy commented Jul 13, 2024

I found the problem, and I think it is a bug that needs to be fixed. The tutorial (https://github.com/Xilinx/mlir-aie/blob/main/docs/buildHostLin.md) asks us to switch to a specific commit with:

git reset --hard b6db49f792a48123a016ba052d0c2103862547ee

At this commit, when running:

cd $XDNA_SRC_DIR/build
./build.sh -release
./build.sh -package

I found that ./build.sh -package tries to download npu.sbin from a URL defined in tools/info.json, and at this commit the URL is: https://gitlab.freedesktop.org/drm/firmware/-/raw/amd-ipu-staging/amdnpu/1502_00/npu.sbin.1.4.1.309

However, this URL no longer works and returns a 404 error. An empty file is still generated, so the build and packaging steps complete without reporting any error. To work around this, I had to manually change the URL in info.json to: https://gitlab.freedesktop.org/drm/firmware/-/raw/amd-ipu-staging/amdnpu/1502_00/npu.sbin.1.4.2.323

Now it works. I believe either the mlir-aie repository or the xdna-driver repository needs to be changed to fix this problem.
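The silent failure described above (a 404 that still leaves an empty npu.sbin behind) can be guarded against in a download step. The sketch below is not how build.sh actually fetches the firmware; `fetch_firmware` is a hypothetical helper, and it assumes `curl` is installed. It relies on curl's `--fail` option, which makes HTTP errors such as 404 produce a non-zero exit status, and then checks that the downloaded file is non-empty before moving it into place.

```shell
# Hypothetical helper (not taken from xdna-driver's build.sh): download a
# firmware file and refuse to install it if the transfer failed or the
# result is empty.
fetch_firmware() {
    local url="$1" dest="$2" tmp
    tmp="$(mktemp)" || return 1
    # --fail: exit non-zero on HTTP errors (e.g. 404)
    # --location: follow redirects
    if curl --fail --silent --show-error --location --output "$tmp" "$url" \
        && [ -s "$tmp" ]; then
        mv "$tmp" "$dest"
    else
        rm -f "$tmp"
        echo "download failed or produced an empty file: $url" >&2
        return 1
    fi
}
```

Downloading to a temporary file first means a failed transfer never leaves a zero-byte file at the destination, so a later packaging step cannot pick it up by accident.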
