../tools/scripts/run_cuda_tests Fails on WSL2 #1533

Open
CLRafaelR opened this issue Aug 13, 2024 · 3 comments
CLRafaelR commented Aug 13, 2024

I am running Ubuntu 24.04 LTS on Windows Subsystem for Linux 2 (WSL2) and trying to perform programming tasks using an NVIDIA GeForce RTX 3060 GPU via OpenCL. Initially, running clinfo -l did not display any platforms or devices due to the issue described in No OpenCL platforms reported · Issue #6951 · microsoft/WSL. However, following the solution provided in this comment, I installed PoCL, and now clinfo and clinfo -l recognise my GPU.

Questions

Despite successfully building and installing PoCL, and having clinfo and clinfo -l functioning correctly, I encounter four failed tests when running the PoCL verification script ../tools/scripts/run_cuda_tests for NVIDIA GPUs, as shown below and as described in NVIDIA GPU support — Portable Computing Language (PoCL) 6.0 documentation:

cd ~/pocl-6.0/build # move to my `build` directory
../tools/scripts/run_cuda_tests

# For rerunning the failed tests:
../tools/scripts/run_cuda_tests --rerun-failed --output-on-failure

Failed tests were:

The following tests FAILED:
          4 - kernel/test_as_type_loopvec (Failed)
        166 - regression/clSetKernelArg_overwriting_the_previous_kernel's_args_loopvec (Failed)
        208 - runtime/test_device_address (SEGFAULT)
        209 - runtime/test_svm (SEGFAULT)
Errors while running CTest

Regarding these four failed tests, I have two questions:

  1. What do these tests assess?
  2. What can I do to ensure these tests pass? Any guidance on checking path settings or installing additional packages would be greatly appreciated.

Error log

Start testing: Aug 13 21:29 JST
----------------------------------------------------------
4/264 Testing: kernel/test_as_type_loopvec
4/264 Test: kernel/test_as_type_loopvec
Command: "/usr/bin/cmake" "-Dtest_cmd=/home/a6m1/pocl-6.0/build/tests/kernel/kernel####test_as_type" "-P" "/home/a6m1/pocl-6.0/cmake/run_test.cmake"
Directory: /home/a6m1/pocl-6.0/build/tests/kernel
"kernel/test_as_type_loopvec" start time: Aug 13 21:29 JST
Output:
----------------------------------------------------------
Running test test_as_type...
FAIL: as_char3((char4)) - byte #: <format error> expected: <format error>.2x actual: <format error>.2x
OK


<end of output>
Test time =   5.30 sec
----------------------------------------------------------
Test Fail Reason:
Error regular expression found in output. Regex=[FAIL]
"kernel/test_as_type_loopvec" end time: Aug 13 21:29 JST
"kernel/test_as_type_loopvec" time elapsed: 00:00:05
----------------------------------------------------------

208/264 Testing: runtime/test_device_address
208/264 Test: runtime/test_device_address
Command: "/home/a6m1/pocl-6.0/build/tests/runtime/test_device_address"
Directory: /home/a6m1/pocl-6.0/build/tests/runtime
"runtime/test_device_address" start time: Aug 13 21:29 JST
Output:
----------------------------------------------------------
NVIDIA GeForce RTX 3060 OpenCL 3.0 PoCL HSTR: CUDA-sm_75: suitable
<end of output>
Test time =   0.98 sec
----------------------------------------------------------
Test Failed.
"runtime/test_device_address" end time: Aug 13 21:29 JST
"runtime/test_device_address" time elapsed: 00:00:00
----------------------------------------------------------

209/264 Testing: runtime/test_svm
209/264 Test: runtime/test_svm
Command: "/home/a6m1/pocl-6.0/build/tests/runtime/test_svm"
Directory: /home/a6m1/pocl-6.0/build/tests/runtime
"runtime/test_svm" start time: Aug 13 21:29 JST
Output:
----------------------------------------------------------
<end of output>
Test time =   0.97 sec
----------------------------------------------------------
Test Failed.
"runtime/test_svm" end time: Aug 13 21:29 JST
"runtime/test_svm" time elapsed: 00:00:00
----------------------------------------------------------

166/264 Testing: regression/clSetKernelArg_overwriting_the_previous_kernel's_args_loopvec
166/264 Test: regression/clSetKernelArg_overwriting_the_previous_kernel's_args_loopvec
Command: "/usr/bin/cmake" "-Dtest_cmd=/home/a6m1/pocl-6.0/build/tests/regression/test_setargs" "-P" "/home/a6m1/pocl-6.0/cmake/run_test.cmake"
Directory: /home/a6m1/pocl-6.0/build/tests/regression
"regression/clSetKernelArg_overwriting_the_previous_kernel's_args_loopvec" start time: Aug 13 21:29 JST
Output:
----------------------------------------------------------
CMake Error at /home/a6m1/pocl-6.0/cmake/run_test.cmake:34 (message):
  FAIL: Test exited with nonzero code (1):
  /home/a6m1/pocl-6.0/build/tests/regression/test_setargs

  STDOUT:

  FAIL



  STDERR:

  3876879362



<end of output>
Test time =   1.05 sec
----------------------------------------------------------
Test Fail Reason:
Error regular expression found in output. Regex=[FAIL]
"regression/clSetKernelArg_overwriting_the_previous_kernel's_args_loopvec" end time: Aug 13 21:30 JST
"regression/clSetKernelArg_overwriting_the_previous_kernel's_args_loopvec" time elapsed: 00:00:01
----------------------------------------------------------

End testing: Aug 13 21:30 JST

cuda =   8.31 sec*proc

hsa-native =   5.30 sec*proc

internal =   8.31 sec*proc

kernel =   5.30 sec*proc

level0 =   8.31 sec*proc

regression =   1.05 sec*proc

runtime =   1.95 sec*proc

vulkan =   1.05 sec*proc

Steps I took to set up the NVIDIA Driver and CUDA Toolkit

  1. Installed the NVIDIA Driver from NVIDIA's website.
  2. Verified that libcuda.so exists only in /usr/lib/wsl/lib/libcuda.so using find /usr/ -name libcuda.so.
  3. Followed the CUDA on WSL guide:
    1. Removed the existing key: sudo apt-key del 7fa2af80.
    2. Executed the following commands as per the CUDA 12.1.1 installation guide:
      wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
      sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
      wget https://developer.download.nvidia.com/compute/cuda/12.1.1/local_installers/cuda-repo-wsl-ubuntu-12-1-local_12.1.1-1_amd64.deb
      sudo dpkg -i cuda-repo-wsl-ubuntu-12-1-local_12.1.1-1_amd64.deb
      sudo cp /var/cuda-repo-wsl-ubuntu-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
      sudo apt-get update
      sudo apt-get -y install cuda
  4. Set the PATH:
    echo 'export PATH=/usr/local/cuda-12.1/bin${PATH:+:${PATH}}' >> ~/.bashrc
    echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}'  >> ~/.bashrc
    
    source ~/.bashrc
  5. Prevented WSL from inheriting the Windows PATH:
    1. In Ubuntu 24.04, execute sudo nano /etc/wsl.conf and add the following to wsl.conf:
      # Prevent inheriting Windows PATH
      [interop]
      appendWindowsPath = false
    2. In Windows PowerShell, execute:
      wsl.exe --shutdown
  6. Verified that libcuda.so exists in both /usr/lib/wsl/lib/libcuda.so and /usr/local/cuda-12.1/targets/x86_64-linux/lib/stubs/libcuda.so using find /usr/ -name libcuda.so.
  7. Installed essential build tools:
    sudo apt -y install build-essential gcc g++ make libtool texinfo dpkg-dev pkg-config gfortran
  8. Installed OpenBLAS following the OpenBLAS Wiki:
    sudo apt update
    apt search openblas
    sudo apt install libopenblas-dev

PoCL Installation

I ultimately installed PoCL using the following method. I repeated the installation several times with different settings, making sure to run xargs rm < install_manifest.txt in the pocl-6.0/build directory and to delete the pocl-6.0/build directory before each reinstallation.

  1. Executed the following commands to install PoCL as per the official PoCL installation guide:
    export LLVM_VERSION=18
    apt install -y python3-dev libpython3-dev build-essential ocl-icd-libopencl1 \
        cmake git pkg-config libclang-${LLVM_VERSION}-dev clang-${LLVM_VERSION} \
        llvm-${LLVM_VERSION} make ninja-build ocl-icd-libopencl1 ocl-icd-dev \
        ocl-icd-opencl-dev libhwloc-dev zlib1g zlib1g-dev clinfo dialog apt-utils \
        libxml2-dev libclang-cpp${LLVM_VERSION}-dev libclang-cpp${LLVM_VERSION} \
        llvm-${LLVM_VERSION}-dev
  2. Downloaded PoCL:
    wget https://github.com/pocl/pocl/archive/refs/tags/v6.0.tar.gz
  3. Extracted the tarball:
    tar -xzvf v6.0.tar.gz
  4. Changed to the PoCL directory:
    cd pocl-6.0
  5. Created a build directory:
    mkdir build
  6. Built PoCL following the instructions from the GitHub issue comment:
    # ENABLE_HOST_CPU_DEVICES=OFF: having both CPU and GPU devices enabled
    #   simultaneously can cause issues: https://github.com/pocl/pocl/issues/853#issuecomment-696367623
    # WITH_LLVM_CONFIG: https://forums.developer.nvidia.com/t/need-support-to-run-opencl-application-on-tx2-board/264420/4
    # ENABLE_EXAMPLES=ON installs the CUDA tests: NVIDIA GPU support — Portable
    #   Computing Language (PoCL) 6.0 documentation https://portablecl.org/docs/html/cuda.html#run-tests
    # (Note: comments must not follow the `\` line continuations, or the command breaks.)
    cmake -B build \
        -DCMAKE_C_FLAGS=-L/usr/lib/wsl/lib \
        -DCMAKE_CXX_FLAGS=-L/usr/lib/wsl/lib \
        -DENABLE_HOST_CPU_DEVICES=OFF \
        -DENABLE_CUDA=ON \
        -DWITH_LLVM_CONFIG=/usr/bin/llvm-config-${LLVM_VERSION} \
        -DENABLE_EXAMPLES=ON
  7. Compiled PoCL:
    cmake --build build -j34
  8. Added environment variables to .bashrc:
    echo 'export POCL_BUILDING=1' >> ~/.bashrc
    echo 'export OCL_ICD_VENDORS=<FULL_PATH_OF_MY_HOME_DIR>/pocl-6.0/build/ocl-vendors/' >> ~/.bashrc
    
    sudo nano /etc/OpenCL/vendors/nvidia.icd
    # The default entry is `/libnvidia-opencl.so`, but there is no such file at that path.
    # Therefore, replace the default line with the full library path (just the bare path;
    # an .icd file contains only the name of the vendor library, no `export` keyword):
    # /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.550.90.07
    
    source ~/.bashrc
  9. Installed PoCL:
    cmake --install build
  10. Verified GPU recognition with clinfo --list:
    $ clinfo --list
    
    Platform #0: Portable Computing Language
        `-- Device #0: NVIDIA GeForce RTX 3060
  11. Ran PoCL tests:
    cd ~/pocl-6.0/build
    ../tools/scripts/run_cuda_tests
    
    # For rerunning the failed tests:
    ../tools/scripts/run_cuda_tests --rerun-failed --output-on-failure
@pjaaskel pjaaskel added the CUDA label Aug 13, 2024
@pjaaskel (Member)
Thanks for your extensive report. CUDA is still an experimental driver in a rather early state, without very active progress, as you can see. @isuruf do you have insights? Is this WSL2-specific, that is, do those tests pass on "bare-metal" Linux?

@CLRafaelR (Author)
@pjaaskel

Thank you very much for your assistance.

I have also left a comment on the aforementioned issue, asking whether others who attempted PoCL installation on WSL2 using the same method experienced success with ../tools/scripts/run_cuda_tests. I hope this will help determine if the issue is unique to me or specific to WSL2...

@pjaaskel (Member)
Note that we do have a CUDA CI bot: https://github.com/pocl/pocl/actions/runs/10354582711/job/28664704111 and it has passed the simple smoke test suite, but the card we use in that box is quite old.
