No OpenCL platforms reported #6951

Open
perrymacmurray opened this issue May 17, 2021 · 57 comments

@perrymacmurray

perrymacmurray commented May 17, 2021

Windows Build Number

21382.1

WSL Version

  • WSL 2
  • WSL 1

Kernel Version

5.10.16.3

Distro Version

Ubuntu 20.04

Other Software

Inside WSL:
clinfo (for checking OpenCL platforms)
CUDA 11.3 (the Docker container runs with NVIDIA_DISABLE_REQUIRE=1, as it otherwise thinks it's running CUDA 11.0)
Docker 20.10.6, build 370c289 (with custom container)
nvidia-docker2 2.5.0-1

On Windows:
NVIDIA Graphics Driver for CUDA on WSL 470.14

Repro Steps

I installed the NVIDIA drivers and Docker following NVIDIA's user guide.
I am, however, running an older version of nvidia-docker2 (and its dependencies), as recommended in a forum post here.

I have also installed the CUDA on WSL driver here.

Steps:
Run clinfo (both in and outside of the Docker container)

Expected Behavior

clinfo should report an OpenCL platform with the graphics card (in my case, a GTX 970) as a device

Actual Behavior

clinfo reports 0 platforms available, both inside the container and directly in WSL
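
For anyone reproducing this without clinfo: the platform count clinfo prints comes from the ICD loader's clGetPlatformIDs call, and devices are then enumerated per platform with clGetDeviceIDs. Below is a minimal Python sketch of those two queries via ctypes. It assumes the loader is installed as libOpenCL.so.1 (the usual soname from Ubuntu's ocl-icd packages), and the function name opencl_device_counts is hypothetical. An empty list corresponds to the "0 platforms" result above.

```python
import ctypes

# Constants from the OpenCL headers (cl.h / cl_ext.h).
CL_SUCCESS = 0
CL_DEVICE_NOT_FOUND = -1
CL_PLATFORM_NOT_FOUND_KHR = -1001  # ICD loader: no platform registered
CL_DEVICE_TYPE_ALL = 0xFFFFFFFF


def opencl_device_counts(libname="libOpenCL.so.1"):
    """Return one device count per OpenCL platform ([] if the loader reports
    no platforms), or None if the loader library cannot be loaded.
    Assumption: libname is an OpenCL ICD loader such as ocl-icd."""
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return None

    # cl_int clGetPlatformIDs(cl_uint, cl_platform_id *, cl_uint *)
    get_platforms = lib.clGetPlatformIDs
    get_platforms.argtypes = [ctypes.c_uint32, ctypes.POINTER(ctypes.c_void_p),
                              ctypes.POINTER(ctypes.c_uint32)]
    get_platforms.restype = ctypes.c_int32
    # cl_int clGetDeviceIDs(cl_platform_id, cl_device_type, cl_uint, cl_device_id *, cl_uint *)
    get_devices = lib.clGetDeviceIDs
    get_devices.argtypes = [ctypes.c_void_p, ctypes.c_uint64, ctypes.c_uint32,
                            ctypes.c_void_p, ctypes.POINTER(ctypes.c_uint32)]
    get_devices.restype = ctypes.c_int32

    # First call only asks how many platforms exist (the number clinfo prints).
    n = ctypes.c_uint32(0)
    err = get_platforms(0, None, ctypes.byref(n))
    if err == CL_PLATFORM_NOT_FOUND_KHR or (err == CL_SUCCESS and n.value == 0):
        return []
    if err != CL_SUCCESS:
        raise RuntimeError(f"clGetPlatformIDs failed with error {err}")

    # Second call fills in the platform handles.
    platforms = (ctypes.c_void_p * n.value)()
    err = get_platforms(n.value, platforms, None)
    if err != CL_SUCCESS:
        raise RuntimeError(f"clGetPlatformIDs failed with error {err}")

    counts = []
    for platform in platforms:
        d = ctypes.c_uint32(0)
        err = get_devices(platform, CL_DEVICE_TYPE_ALL, 0, None, ctypes.byref(d))
        if err == CL_DEVICE_NOT_FOUND:
            counts.append(0)  # platform registered, but it exposes no devices
        elif err != CL_SUCCESS:
            raise RuntimeError(f"clGetDeviceIDs failed with error {err}")
        else:
            counts.append(d.value)
    return counts
```

Usage: print(opencl_device_counts()). A None result means the loader itself is missing, which is a different problem from the loader finding no platforms.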

Diagnostic Logs

cuda
nvidia-container-cli
glxinfo (from inside of container)
glxinfo (from WSL, outside of container)
wsl.etl

@perrymacmurray changed the title from "No OpenCL platforms reported in Docker container" to "No OpenCL platforms reported" May 17, 2021
@adrastogi added the GPU label May 17, 2021
@chrisfranko

One day <3

@TGM

TGM commented Dec 6, 2021

Any update here?

@wuweijia1994

Checking for any update

@richgel999

Yes, OpenCL is a crucial feature. We're putting together a native Linux box for testing next week due to this.

@bridgerrholt

This would be wonderful for my team. We have considered rewriting everything in CUDA, but that has major downsides. Until OpenCL support is released, we are stuck dual-booting.

@lmeyerov

YEP

@jincheng-ai

Hoping for any update

@lmeyerov

In theory OpenCL/WSL2 may now work for Intel Integrated Graphics GPUs: https://devblogs.microsoft.com/commandline/oneapi-l0-openvino-and-opencl-coming-to-the-windows-subsystem-for-linux-for-intel-gpus/

When I tried a few days ago, I didn't see any CPU platform get registered (my CPU is AMD), nor any GPU platform (my GPU is NVIDIA).

@clausagerskov

Any info on whether OpenCL support for NVIDIA GPU computing is planned?

@Foosec

Foosec commented May 21, 2022

Any new info a year later?

@richgel999

> Any new info a year later?

Better late than never, right?

@HO-COOH

HO-COOH commented Jun 15, 2022

Still waiting

@liubola

liubola commented Sep 23, 2022

Still waiting

@Eboubaker

Same issue when trying to run a Boost.Compute example program:

terminate called after throwing an instance of 'boost::wrapexcept<boost::compute::no_device_found>'
  what():  No OpenCL device found

@husmen

husmen commented Feb 5, 2023

I should have checked this before wasting a whole day trying to get it to work ... still waiting

@73ad

73ad commented Feb 13, 2023

> I should have checked this before wasting a whole day trying to get it to work ... still waiting

same

@jorgevazquezperez

I found the solution by asking ChatGPT, as will probably be usual from now on. To set up OpenCL on WSL, you can follow these general steps:

  1. Install a Linux distribution in WSL, such as Ubuntu, and make sure it is up to date.
  2. Install the OpenCL driver for your GPU. You can download the appropriate driver from the GPU vendor's website and follow its installation instructions.
  3. Install the OpenCL development package, which contains the necessary libraries, headers, and tools to develop OpenCL applications. You can do this by running the following command in a terminal:

sudo apt-get install ocl-icd-opencl-dev

  4. Install an OpenCL implementation, such as PoCL (Portable Computing Language), an open-source OpenCL runtime from the pocl project. You can install PoCL by running the following command:

sudo apt-get install pocl-opencl-icd

  5. Set the LD_LIBRARY_PATH environment variable to point to the OpenCL libraries. You can do this by adding the following line to your ~/.bashrc file:

export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/:$LD_LIBRARY_PATH

After completing these steps, you should be able to use OpenCL on WSL. Note that the specific steps and packages required may vary depending on the Linux distribution, GPU hardware, and OpenCL implementation you are using.
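
On Linux, the ICD loader (pulled in by ocl-icd-opencl-dev) finds platforms by reading the *.icd files in /etc/OpenCL/vendors, each of which names a vendor library to load; packages such as pocl-opencl-icd drop a file there. When clinfo reports 0 platforms, a quick check is whether that directory has any entries and whether each named library can actually be loaded. A minimal Python sketch, assuming the default directory and treating OCL_ICD_VENDORS (if set) as an override directory; scan_icd_vendors is a hypothetical helper, and loading the library with ctypes only approximates what the loader does.

```python
import ctypes
import os


def scan_icd_vendors(vendor_dir=None):
    """Return (icd_file, library, loadable) for every *.icd file in vendor_dir."""
    if vendor_dir is None:
        # Assumption: OCL_ICD_VENDORS, when set, names the directory to scan instead.
        vendor_dir = os.environ.get("OCL_ICD_VENDORS", "/etc/OpenCL/vendors")
    results = []
    if not os.path.isdir(vendor_dir):
        return results  # no vendor directory at all: the loader sees no platforms
    for name in sorted(os.listdir(vendor_dir)):
        if not name.endswith(".icd"):
            continue  # only *.icd files are considered
        with open(os.path.join(vendor_dir, name)) as f:
            library = f.readline().strip()  # the first line holds the library path or name
        loadable = False
        if library:
            try:
                ctypes.CDLL(library)  # roughly what the loader's dlopen() attempts
                loadable = True
            except OSError:
                pass
        results.append((name, library, loadable))
    return results
```

Usage: for entry in scan_icd_vendors(): print(entry). An empty list, or entries marked False, would both be consistent with clinfo reporting 0 platforms.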

@lmeyerov

@jorgevazquezperez is that proven working or a hallucination?

@jorgevazquezperez

Proven working. Note, though, that I have only gotten it working with the CPU; I am still working on getting it to run on the GPU. I've attached a picture of the results, and I will keep you updated on the GPU version (as I imagine that's the one you're all looking forward to). If you need more info, just tell me!

P.S.: I am using Python with the pyopencl library.

@lmeyerov

Yes, AFAICT the CPU and integrated Intel GPUs should work, but it's unclear if/how NVIDIA will.

@gyferlim

gyferlim commented Jul 30, 2023

It's about getting Nvidia OpenCL to work in WSL2, not CUDA Runtime or AMD HIP/ROCM, is that correct? Please accept my apology if I misunderstood earlier.

I remember having both NVIDIA (CUDA) and AMD running as OpenCL platforms in WSL2 in the past. However, a driver change forced me to reinstall the NVIDIA driver, which broke it. Below is my clinfo output with the AMD platform running. Unfortunately, both my RX470 and RX580 died, and I don't have another AMD card to try and show the result.

Number of platforms                               2
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP (3581.0)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback 
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             AMD

  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 3.0 PoCL 4.1-pre main-0-ga3e43d58  Linux, Debug+Asserts, RELOC, SPIR, LLVM 12.0.0, SLEEF, POCL_DEBUG
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_pocl_content_size
  Platform Host timer resolution                  0ns
  Platform Extensions function suffix             POCL

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 0

  Platform Name                                   Portable Computing Language
Number of devices                                 1
  Device Name                                     cpu-Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
  Device Vendor                                   GenuineIntel
  Device Vendor ID                                0x6c636f70
  Device Version                                  OpenCL 1.2 PoCL HSTR: cpu-x86_64-pc-linux-gnu-haswell
  Driver Version                                  4.1-pre main-0-ga3e43d58
  Device OpenCL C Version                         OpenCL C 1.2 PoCL
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               8
  Max clock frequency                             3392MHz
  Device Partition                                (core)
    Max number of sub-devices                     8
    Supported partition types                     equally, by counts
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             4096x4096x4096
  Max work group size                             4096
  Preferred work group size multiple              8
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                               16 / 16      
    int                                                  8 / 8       
    long                                                 4 / 4       
    half                                                 0 / 0        (n/a)
    float                                                8 / 8       
    double                                               4 / 4        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              5802754048 (5.404GiB)
  Error Correction support                        No
  Max memory allocation                           2147483648 (2GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        8388608 (8MiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             8192x8192 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
  Local memory type                               Global
  Local memory size                               262144 (256KiB)
  Max number of constant args                     8
  Max constant buffer size                        262144 (256KiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    SPIR versions                                 (n/a)
  printf() buffer size                            16777216 (16MiB)
  Built-in kernels                                pocl.add.i8;org.khronos.openvx.scale_image.nn.u8;org.khronos.openvx.scale_image.bl.u8;org.khronos.openvx.tensor_convert_depth.wrap.u8.f32
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_command_buffer cl_khr_spir cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics


NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [POCL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No devices found in platform
	NOTE:	your OpenCL library only supports OpenCL 2.2,
		but some installed platforms support OpenCL 3.0.
		Programs using 3.0 features may crash
		or behave unexpectedly
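
Worth noting in the output above: clinfo lists the AMD Accelerated Parallel Processing platform, but under it reports "Number of devices 0", so the only usable device is PoCL's CPU device. A registered platform is not the same as an available device. To check this quickly across clinfo dumps, here is a small hypothetical Python helper (not part of clinfo) that maps each platform name to its device count, assuming the default clinfo text layout shown above, where each label and value are separated by two or more spaces.

```python
import re


def devices_per_platform(clinfo_text):
    """Map platform name -> 'Number of devices' from default clinfo text output."""
    counts = {}
    current = None
    for line in clinfo_text.splitlines():
        # Split "Label<2+ spaces>Value"; lines without that shape are skipped.
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
        if len(parts) != 2:
            continue
        key, value = parts
        if key == "Platform Name":
            current = value
        elif key == "Number of devices" and current is not None:
            counts[current] = int(value)
    return counts
```

Platforms appear twice in that layout (once in the platform summary, once before the device list); the helper simply records each device count against the most recent "Platform Name" line.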

@lmeyerov

Yes, that is still not showing Nvidia <> opencl <> wsl2

@emmanuelattia

Since the NVIDIA OpenCL ICD is built on top of CUDA, it's hard to understand why OpenCL on NVIDIA is not supported under WSL2 when CUDA works. Clearly it's not a technical issue but a commercial one. Please correct me if I'm wrong.

@gyferlim

I managed to get Intel OpenCL working in WSL2, I think.

  1. Follow the instructions given here: https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2023-0/configure-wsl-2-for-gpu-workflows.html#UBUNTU-22-04-JAMMY

  2. Create a file named "intel.icd" in /etc/OpenCL/vendors containing:

/usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
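
Step 2 can also be scripted. A minimal Python sketch, assuming root access to /etc/OpenCL/vendors and the libigdrcl.so path from step 2 (register_icd is a hypothetical helper, not part of Intel's instructions). It refuses to write an entry for a library that does not exist, since a stale .icd entry typically just results in a missing platform rather than a clear error.

```python
import os


def register_icd(library_path, icd_name="intel.icd", vendor_dir="/etc/OpenCL/vendors"):
    """Write a one-line .icd file so the OpenCL ICD loader can find library_path."""
    if not icd_name.endswith(".icd"):
        # The loader only considers files with the .icd suffix.
        raise ValueError(f"ICD file name must end in .icd: {icd_name}")
    if not os.path.isfile(library_path):
        raise FileNotFoundError(library_path)
    os.makedirs(vendor_dir, exist_ok=True)
    icd_path = os.path.join(vendor_dir, icd_name)
    with open(icd_path, "w") as f:
        f.write(library_path + "\n")  # the file's only content is the library path
    return icd_path
```

Usage (as root): register_icd("/usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so"), then rerun clinfo.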

Resulting clinfo output:

Number of platforms                               2
  Platform Name                                   Intel(R) OpenCL Graphics
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 3.0
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_byte_addressable_store cl_khr_device_uuid cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_khr_fp64 cl_khr_subgroups cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_ext_float_atomics cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_spirv_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info
  Platform Extensions with Version                cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_fp16                                                      0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_intel_command_queue_families                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups                                               0x400000 (1.0.0)
                                                  cl_intel_required_subgroup_size                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups_short                                         0x400000 (1.0.0)
                                                  cl_khr_spir                                                      0x400000 (1.0.0)
                                                  cl_intel_accelerator                                             0x400000 (1.0.0)
                                                  cl_intel_driver_diagnostics                                      0x400000 (1.0.0)
                                                  cl_khr_priority_hints                                            0x400000 (1.0.0)
                                                  cl_khr_throttle_hints                                            0x400000 (1.0.0)
                                                  cl_khr_create_command_queue                                      0x400000 (1.0.0)
                                                  cl_intel_subgroups_char                                          0x400000 (1.0.0)
                                                  cl_intel_subgroups_long                                          0x400000 (1.0.0)
                                                  cl_khr_il_program                                                0x400000 (1.0.0)
                                                  cl_intel_mem_force_host_memory                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_extended_types                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_vote                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_ballot                                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_arithmetic                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle                                          0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle_relative                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_clustered_reduce                                 0x400000 (1.0.0)
                                                  cl_intel_device_attribute_query                                  0x400000 (1.0.0)
                                                  cl_khr_suggested_local_work_size                                 0x400000 (1.0.0)
                                                  cl_intel_split_work_group_barrier                                0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_subgroups                                                 0x400000 (1.0.0)
                                                  cl_intel_spirv_media_block_io                                    0x400000 (1.0.0)
                                                  cl_intel_spirv_subgroups                                         0x400000 (1.0.0)
                                                  cl_khr_spirv_no_integer_wrap_decoration                          0x400000 (1.0.0)
                                                  cl_intel_unified_shared_memory                                   0x400000 (1.0.0)
                                                  cl_khr_mipmap_image                                              0x400000 (1.0.0)
                                                  cl_khr_mipmap_image_writes                                       0x400000 (1.0.0)
                                                  cl_ext_float_atomics                                             0x400000 (1.0.0)
                                                  cl_intel_planar_yuv                                              0x400000 (1.0.0)
                                                  cl_intel_packed_yuv                                              0x400000 (1.0.0)
                                                  cl_intel_motion_estimation                                       0x400000 (1.0.0)
                                                  cl_intel_device_side_avc_motion_estimation                       0x400000 (1.0.0)
                                                  cl_intel_spirv_device_side_avc_motion_estimation                 0x400000 (1.0.0)
                                                  cl_intel_advanced_motion_estimation                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_image2d_from_buffer                                       0x400000 (1.0.0)
                                                  cl_khr_depth_images                                              0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_intel_media_block_io                                          0x400000 (1.0.0)
                                                  cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                  cl_khr_gl_depth_images                                           0x400000 (1.0.0)
                                                  cl_khr_gl_event                                                  0x400000 (1.0.0)
                                                  cl_khr_gl_msaa_sharing                                           0x400000 (1.0.0)
                                                  cl_intel_va_api_media_sharing                                    0x400000 (1.0.0)
                                                  cl_intel_sharing_format_query                                    0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             INTEL
  Platform Host timer resolution                  1ns

  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 3.0 PoCL 4.1-pre main-0-ga3e43d58  Linux, Debug+Asserts, RELOC, SPIR, LLVM 14.0.0, SLEEF, POCL_DEBUG
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_pocl_content_size
  Platform Extensions with Version                cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_pocl_content_size                                             0x400000 (1.0.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             POCL
  Platform Host timer resolution                  0ns

  Platform Name                                   Intel(R) OpenCL Graphics
Number of devices                                 1
  Device Name                                     Intel(R) Graphics [0x5917]
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 3.0 NEO
  Device UUID                                     86801759-0700-0000-0002-000000000000
  Driver UUID                                     32332e32-322e-3236-3531-362e31380000
  Valid Device LUID                               No
  Device LUID                                     5017-c9c1fd7f0000
  Device Node Mask                                0
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  23.22.26516.18
  Device OpenCL C Version                         OpenCL C 1.2
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_int64                                                 0xc00000 (3.0.0)
                                                  __opencl_c_3d_image_writes                                       0xc00000 (3.0.0)
                                                  __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_read_write_images                                     0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_acq_rel                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_seq_cst                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_all_devices                              0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_device                                   0xc00000 (3.0.0)
                                                  __opencl_c_generic_address_space                                 0xc00000 (3.0.0)
                                                  __opencl_c_program_scope_global_variables                        0xc00000 (3.0.0)
                                                  __opencl_c_work_group_collective_functions                       0xc00000 (3.0.0)
                                                  __opencl_c_subgroups                                             0xc00000 (3.0.0)
                                                  __opencl_c_pipes                                                 0xc00000 (3.0.0)
                                                  __opencl_c_fp64                                                  0xc00000 (3.0.0)
  Latest comfornace test passed                   v2022-04-22-00
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               24
  Max clock frequency                             1150MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple (device)     32
  Preferred work group size multiple (kernel)     32
  Max sub-groups per work group                   32
  Sub-group sizes (Intel)                         8, 16, 32
  Preferred / native vector sizes
    char                                                16 / 16
    short                                                8 / 8
    int                                                  4 / 4
    long                                                 1 / 1
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                1 / 1
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              5101215744 (4.751GiB)
  Error Correction support                        No
  Max memory allocation                           1073741824 (1024MiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   No
    Atomics                                       Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         64 bytes
  Atomic memory capabilities                      relaxed, acquire/release, sequentially-consistent, work-group scope, device scope, all-devices scope
  Atomic fence capabilities                       relaxed, acquire/release, sequentially-consistent, work-item scope, work-group scope, device scope, all-devices scope
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             1073741824 (1024MiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        786432 (768KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            67108864 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   4 bytes
    Pitch alignment for 2D image buffers          4 pixels
    Max 2D image size                             16384x16384 pixels
    Max planar YUV image size                     16384x16352 pixels
    Max 3D image size                             16384x16384x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
    Max number of read/write image args           128
  Pipe support                                    Yes
  Max number of pipe args                         16
  Max active pipe reservations                    1
  Max pipe packet size                            1024
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Max number of constant args                     8
  Max constant buffer size                        1073741824 (1024MiB)
  Generic address space support                   Yes
  Max size of kernel argument                     2048 (2KiB)
  Queue properties (on host)
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      83ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       Yes
    Work-group collective functions               Yes
    Sub-group independent forward progress        Yes
    IL version                                    SPIR-V_1.2
    ILs with version                              SPIR-V                                                           0x402000 (1.2.0)
    SPIR versions                                 1.2
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel;
  Built-in kernels with version                   block_motion_estimate_intel                                      0x400000 (1.0.0)
                                                  block_advanced_motion_estimate_check_intel                       0x400000 (1.0.0)
                                                  block_advanced_motion_estimate_bidirectional_check_intel         0x400000 (1.0.0)
  Motion Estimation accelerator version (Intel)   2
    Device-side AVC Motion Estimation version     1
      Supports texture sampler use                Yes
      Supports preemption                         No
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_device_uuid cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_khr_fp64 cl_khr_subgroups cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_ext_float_atomics cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_spirv_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info
  Device Extensions with Version                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_fp16                                                      0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_intel_command_queue_families                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups                                               0x400000 (1.0.0)
                                                  cl_intel_required_subgroup_size                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups_short                                         0x400000 (1.0.0)
                                                  cl_khr_spir                                                      0x400000 (1.0.0)
                                                  cl_intel_accelerator                                             0x400000 (1.0.0)
                                                  cl_intel_driver_diagnostics                                      0x400000 (1.0.0)
                                                  cl_khr_priority_hints                                            0x400000 (1.0.0)
                                                  cl_khr_throttle_hints                                            0x400000 (1.0.0)
                                                  cl_khr_create_command_queue                                      0x400000 (1.0.0)
                                                  cl_intel_subgroups_char                                          0x400000 (1.0.0)
                                                  cl_intel_subgroups_long                                          0x400000 (1.0.0)
                                                  cl_khr_il_program                                                0x400000 (1.0.0)
                                                  cl_intel_mem_force_host_memory                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_extended_types                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_vote                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_ballot                                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_arithmetic                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle                                          0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle_relative                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_clustered_reduce                                 0x400000 (1.0.0)
                                                  cl_intel_device_attribute_query                                  0x400000 (1.0.0)
                                                  cl_khr_suggested_local_work_size                                 0x400000 (1.0.0)
                                                  cl_intel_split_work_group_barrier                                0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_subgroups                                                 0x400000 (1.0.0)
                                                  cl_intel_spirv_media_block_io                                    0x400000 (1.0.0)
                                                  cl_intel_spirv_subgroups                                         0x400000 (1.0.0)
                                                  cl_khr_spirv_no_integer_wrap_decoration                          0x400000 (1.0.0)
                                                  cl_intel_unified_shared_memory                                   0x400000 (1.0.0)
                                                  cl_khr_mipmap_image                                              0x400000 (1.0.0)
                                                  cl_khr_mipmap_image_writes                                       0x400000 (1.0.0)
                                                  cl_ext_float_atomics                                             0x400000 (1.0.0)
                                                  cl_intel_planar_yuv                                              0x400000 (1.0.0)
                                                  cl_intel_packed_yuv                                              0x400000 (1.0.0)
                                                  cl_intel_motion_estimation                                       0x400000 (1.0.0)
                                                  cl_intel_device_side_avc_motion_estimation                       0x400000 (1.0.0)
                                                  cl_intel_spirv_device_side_avc_motion_estimation                 0x400000 (1.0.0)
                                                  cl_intel_advanced_motion_estimation                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_image2d_from_buffer                                       0x400000 (1.0.0)
                                                  cl_khr_depth_images                                              0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_intel_media_block_io                                          0x400000 (1.0.0)
                                                  cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                  cl_khr_gl_depth_images                                           0x400000 (1.0.0)
                                                  cl_khr_gl_event                                                  0x400000 (1.0.0)
                                                  cl_khr_gl_msaa_sharing                                           0x400000 (1.0.0)
                                                  cl_intel_va_api_media_sharing                                    0x400000 (1.0.0)
                                                  cl_intel_sharing_format_query                                    0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)

  Platform Name                                   Portable Computing Language
Number of devices                                 1
  Device Name                                     cpu-Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
  Device Vendor                                   GenuineIntel
  Device Vendor ID                                0x6c636f70
  Device Version                                  OpenCL 3.0 PoCL HSTR: cpu-x86_64-pc-linux-gnu-skylake
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  4.1-pre main-0-ga3e43d58
  Device OpenCL C Version                         OpenCL C 1.2 PoCL
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_3d_image_writes                                       0xc00000 (3.0.0)
                                                  __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_acq_rel                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_seq_cst                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_device                                   0xc00000 (3.0.0)
                                                  __opencl_c_program_scope_global_variables                        0xc00000 (3.0.0)
                                                  __opencl_c_generic_address_space                                 0xc00000 (3.0.0)
                                                  __opencl_c_subgroups                                             0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_all_devices                              0xc00000 (3.0.0)
                                                  __opencl_c_read_write_images                                     0xc00000 (3.0.0)
                                                  __opencl_c_fp64                                                  0xc00000 (3.0.0)
                                                  __opencl_c_int64                                                 0xc00000 (3.0.0)
  Latest comfornace test passed                   v2022-04-19-01
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               8
  Max clock frequency                             2111MHz
  Device Partition                                (core)
    Max number of sub-devices                     8
    Supported partition types                     equally, by counts
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             4096x4096x4096
  Max work group size                             4096
  Preferred work group size multiple (device)     8
  Preferred work group size multiple (kernel)     8
  Max sub-groups per work group                   128
  Sub-group sizes (Intel)                         1, 2, 4, 8, 16, 32, 64, 128, 256, 512
  Preferred / native vector sizes
    char                                                16 / 16
    short                                               16 / 16
    int                                                  8 / 8
    long                                                 4 / 4
    half                                                 0 / 0        (n/a)
    float                                                8 / 8
    double                                               4 / 4        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              4613160960 (4.296GiB)
  Error Correction support                        No
  Max memory allocation                           2147483648 (2GiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   No
    Atomics                                       Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         64 bytes
  Atomic memory capabilities                      relaxed, acquire/release, sequentially-consistent, work-group scope, device scope, all-devices scope
  Atomic fence capabilities                       relaxed, acquire/release, sequentially-consistent, work-item scope, work-group scope, device scope
  Max size for global variable                    64000 (62.5KiB)
  Preferred total size of global vars             262144 (256KiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        8388608 (8MiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             8192x8192 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
    Max number of read/write image args           128
  Pipe support                                    No
  Max number of pipe args                         0
  Max active pipe reservations                    0
  Max pipe packet size                            0
  Local memory type                               Global
  Local memory size                               262144 (256KiB)
  Max number of constant args                     8
  Max constant buffer size                        262144 (256KiB)
  Generic address space support                   Yes
  Max size of kernel argument                     1024
  Queue properties (on host)
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    Non-uniform work-groups                       No
    Work-group collective functions               No
    Sub-group independent forward progress        Yes
    IL version                                    (n/a)
    ILs with version                              (n/a)
    SPIR versions                                 (n/a)
  printf() buffer size                            16777216 (16MiB)
  Built-in kernels                                pocl.add.i8;org.khronos.openvx.scale_image.nn.u8;org.khronos.openvx.scale_image.bl.u8;org.khronos.openvx.tensor_convert_depth.wrap.u8.f32
  Built-in kernels with version                   pocl.add.i8                                                      0x402000 (1.2.0)
                                                  org.khronos.openvx.scale_image.nn.u8                             0x402000 (1.2.0)
                                                  org.khronos.openvx.scale_image.bl.u8                             0x402000 (1.2.0)
                                                  org.khronos.openvx.tensor_convert_depth.wrap.u8.f32              0x402000 (1.2.0)
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_command_buffer cl_khr_subgroups cl_intel_unified_shared_memory cl_khr_subgroup_ballot cl_khr_subgroup_shuffle cl_intel_subgroups cl_intel_required_subgroup_size cl_khr_spir cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics
  Device Extensions with Version                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_khr_command_buffer                                              0x9000 (0.9.0)
                                                  cl_khr_subgroups                                                 0x400000 (1.0.0)
                                                  cl_intel_unified_shared_memory                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_ballot                                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle                                          0x400000 (1.0.0)
                                                  cl_intel_subgroups                                               0x400000 (1.0.0)
                                                  cl_intel_required_subgroup_size                                  0x400000 (1.0.0)
                                                  cl_khr_spir                                                      0x801000 (2.1.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)


NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Intel(R) OpenCL Graphics
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [INTEL]
  clCreateContext(NULL, ...) [default]            Success [INTEL]
  clCreateContext(NULL, ...) [other]              Success [POCL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Intel(R) OpenCL Graphics
    Device Name                                   Intel(R) Graphics [0x5917]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Intel(R) OpenCL Graphics
    Device Name                                   Intel(R) Graphics [0x5917]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Intel(R) OpenCL Graphics
    Device Name                                   Intel(R) Graphics [0x5917]

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.3.1
  ICD loader Profile                              OpenCL 3.0

@lmeyerov

This issue is about Nvidia cards not being shown, not intel/amd

@Entropy512

> This issue is about Nvidia cards not being shown, not intel/amd

The title of the issue is simply "No OpenCL platforms reported" - not "No NVidia OpenCL platforms reported"

@perrymacmurray
Author

> > This issue is about Nvidia cards not being shown, not intel/amd
>
> The title of the issue is simply "No OpenCL platforms reported" - not "No NVidia OpenCL platforms reported"

The issue is about Nvidia cards not being shown.

@edmondium

Date: Nov 4, 2021 https://devblogs.microsoft.com/commandline/oneapi-l0-openvino-and-opencl-coming-to-the-windows-subsystem-for-linux-for-intel-gpus/

Extra helpful info:

  1. https://www.intel.com/content/www/us/en/artificial-intelligence/harness-the-power-of-intel-igpu-on-your-machine.html
  2. https://github.com/intel/compute-runtime/releases/tag/21.35.20826
user@WSL2:~$ sudo clinfo
Number of platforms                               3
  Platform Name                                   Intel(R) OpenCL HD Graphics
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 3.0
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_fp64 cl_khr_subgroups cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory_preview cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info
  Platform Extensions with Version                cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_fp16                                                      0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_intel_command_queue_families                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups                                               0x400000 (1.0.0)
                                                  cl_intel_required_subgroup_size                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups_short                                         0x400000 (1.0.0)
                                                  cl_khr_spir                                                      0x400000 (1.0.0)
                                                  cl_intel_accelerator                                             0x400000 (1.0.0)
                                                  cl_intel_driver_diagnostics                                      0x400000 (1.0.0)
                                                  cl_khr_priority_hints                                            0x400000 (1.0.0)
                                                  cl_khr_throttle_hints                                            0x400000 (1.0.0)
                                                  cl_khr_create_command_queue                                      0x400000 (1.0.0)
                                                  cl_intel_subgroups_char                                          0x400000 (1.0.0)
                                                  cl_intel_subgroups_long                                          0x400000 (1.0.0)
                                                  cl_khr_il_program                                                0x400000 (1.0.0)
                                                  cl_intel_mem_force_host_memory                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_extended_types                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_vote                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_ballot                                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_arithmetic                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle                                          0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle_relative                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_clustered_reduce                                 0x400000 (1.0.0)
                                                  cl_intel_device_attribute_query                                  0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_subgroups                                                 0x400000 (1.0.0)
                                                  cl_intel_spirv_device_side_avc_motion_estimation                 0x400000 (1.0.0)
                                                  cl_intel_spirv_media_block_io                                    0x400000 (1.0.0)
                                                  cl_intel_spirv_subgroups                                         0x400000 (1.0.0)
                                                  cl_khr_spirv_no_integer_wrap_decoration                          0x400000 (1.0.0)
                                                  cl_intel_unified_shared_memory_preview                           0x400000 (1.0.0)
                                                  cl_khr_mipmap_image                                              0x400000 (1.0.0)
                                                  cl_khr_mipmap_image_writes                                       0x400000 (1.0.0)
                                                  cl_intel_planar_yuv                                              0x400000 (1.0.0)
                                                  cl_intel_packed_yuv                                              0x400000 (1.0.0)
                                                  cl_intel_motion_estimation                                       0x400000 (1.0.0)
                                                  cl_intel_device_side_avc_motion_estimation                       0x400000 (1.0.0)
                                                  cl_intel_advanced_motion_estimation                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_image2d_from_buffer                                       0x400000 (1.0.0)
                                                  cl_khr_depth_images                                              0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_intel_media_block_io                                          0x400000 (1.0.0)
                                                  cl_intel_va_api_media_sharing                                    0x400000 (1.0.0)
                                                  cl_intel_sharing_format_query                                    0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             INTEL
  Platform Host timer resolution                  1ns

  Platform Name                                   Intel(R) OpenCL HD Graphics
Number of devices                                 1
  Device Name                                     Intel(R) Graphics [0x5917]
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 3.0 NEO
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  21.35.20826
  Device OpenCL C Version                         OpenCL C 3.0
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0x800000 (2.0.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_int64                                                 0xc00000 (3.0.0)
                                                  __opencl_c_3d_image_writes                                       0xc00000 (3.0.0)
                                                  __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_read_write_images                                     0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_acq_rel                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_seq_cst                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_all_devices                              0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_device                                   0xc00000 (3.0.0)
                                                  __opencl_c_generic_address_space                                 0xc00000 (3.0.0)
                                                  __opencl_c_program_scope_global_variables                        0xc00000 (3.0.0)
                                                  __opencl_c_work_group_collective_functions                       0xc00000 (3.0.0)
                                                  __opencl_c_subgroups                                             0xc00000 (3.0.0)
                                                  __opencl_c_device_enqueue                                        0xc00000 (3.0.0)
                                                  __opencl_c_pipes                                                 0xc00000 (3.0.0)
                                                  __opencl_c_fp64                                                  0xc00000 (3.0.0)
  Latest comfornace test passed                   v2021-06-16-00
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               24
  Max clock frequency                             1150MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple (device)     32
  Preferred work group size multiple (kernel)     32
  Max sub-groups per work group                   32
  Sub-group sizes (Intel)                         8, 16, 32
  Preferred / native vector sizes
    char                                                16 / 16
    short                                                8 / 8
    int                                                  4 / 4
    long                                                 1 / 1
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                1 / 1
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              5101215744 (4.751GiB)
  Error Correction support                        No
  Max memory allocation                           1073741824 (1024MiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   No
    Atomics                                       Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         64 bytes
  Atomic memory capabilities                      relaxed, acquire/release, sequentially-consistent, work-group scope, device scope, all-devices scope
  Atomic fence capabilities                       relaxed, acquire/release, sequentially-consistent, work-item scope, work-group scope, device scope, all-devices scope
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             1073741824 (1024MiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        524288 (512KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            67108864 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   4 bytes
    Pitch alignment for 2D image buffers          4 pixels
    Max 2D image size                             16384x16384 pixels
    Max planar YUV image size                     16384x16352 pixels
    Max 3D image size                             16384x16384x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
    Max number of read/write image args           128
  Pipe support                                    Yes
  Max number of pipe args                         16
  Max active pipe reservations                    1
  Max pipe packet size                            1024
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Max number of constant args                     8
  Max constant buffer size                        1073741824 (1024MiB)
  Generic address space support                   Yes
  Max size of kernel argument                     2048 (2KiB)
  Queue properties (on host)
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Device enqueue capabilities                     supported, replaceable default queue
  Queue properties (on device)
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                131072 (128KiB)
    Max size                                      67108864 (64MiB)
  Max queues on device                            1
  Max events on device                            1024
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      83ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       Yes
    Work-group collective functions               Yes
    Sub-group independent forward progress        Yes
    IL version                                    SPIR-V_1.2
    ILs with version                              SPIR-V                                                           0x402000 (1.2.0)
    SPIR versions                                 1.2
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel;
  Built-in kernels with version                   block_motion_estimate_intel                                      0x400000 (1.0.0)
                                                  block_advanced_motion_estimate_check_intel                       0x400000 (1.0.0)
                                                  block_advanced_motion_estimate_bidirectional_check_intel         0x400000 (1.0.0)
  Motion Estimation accelerator version (Intel)   2
    Device-side AVC Motion Estimation version     1
      Supports texture sampler use                Yes
      Supports preemption                         No
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_fp64 cl_khr_subgroups cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory_preview cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info
  Device Extensions with Version                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_fp16                                                      0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_intel_command_queue_families                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups                                               0x400000 (1.0.0)
                                                  cl_intel_required_subgroup_size                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups_short                                         0x400000 (1.0.0)
                                                  cl_khr_spir                                                      0x400000 (1.0.0)
                                                  cl_intel_accelerator                                             0x400000 (1.0.0)
                                                  cl_intel_driver_diagnostics                                      0x400000 (1.0.0)
                                                  cl_khr_priority_hints                                            0x400000 (1.0.0)
                                                  cl_khr_throttle_hints                                            0x400000 (1.0.0)
                                                  cl_khr_create_command_queue                                      0x400000 (1.0.0)
                                                  cl_intel_subgroups_char                                          0x400000 (1.0.0)
                                                  cl_intel_subgroups_long                                          0x400000 (1.0.0)
                                                  cl_khr_il_program                                                0x400000 (1.0.0)
                                                  cl_intel_mem_force_host_memory                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_extended_types                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_vote                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_ballot                                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_arithmetic                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle                                          0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle_relative                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_clustered_reduce                                 0x400000 (1.0.0)
                                                  cl_intel_device_attribute_query                                  0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_subgroups                                                 0x400000 (1.0.0)
                                                  cl_intel_spirv_device_side_avc_motion_estimation                 0x400000 (1.0.0)
                                                  cl_intel_spirv_media_block_io                                    0x400000 (1.0.0)
                                                  cl_intel_spirv_subgroups                                         0x400000 (1.0.0)
                                                  cl_khr_spirv_no_integer_wrap_decoration                          0x400000 (1.0.0)
                                                  cl_intel_unified_shared_memory_preview                           0x400000 (1.0.0)
                                                  cl_khr_mipmap_image                                              0x400000 (1.0.0)
                                                  cl_khr_mipmap_image_writes                                       0x400000 (1.0.0)
                                                  cl_intel_planar_yuv                                              0x400000 (1.0.0)
                                                  cl_intel_packed_yuv                                              0x400000 (1.0.0)
                                                  cl_intel_motion_estimation                                       0x400000 (1.0.0)
                                                  cl_intel_device_side_avc_motion_estimation                       0x400000 (1.0.0)
                                                  cl_intel_advanced_motion_estimation                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_image2d_from_buffer                                       0x400000 (1.0.0)
                                                  cl_khr_depth_images                                              0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_intel_media_block_io                                          0x400000 (1.0.0)
                                                  cl_intel_va_api_media_sharing                                    0x400000 (1.0.0)
                                                  cl_intel_sharing_format_query                                    0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)


  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Intel(R) OpenCL HD Graphics
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [INTEL]
  clCreateContext(NULL, ...) [default]            Success [INTEL]
  clCreateContext(NULL, ...) [other]              Success [POCL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Intel(R) OpenCL HD Graphics
    Device Name                                   Intel(R) Graphics [0x5917]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Intel(R) OpenCL HD Graphics
    Device Name                                   Intel(R) Graphics [0x5917]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Intel(R) OpenCL HD Graphics
    Device Name                                   Intel(R) Graphics [0x5917]

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.3.1
  ICD loader Profile                              OpenCL 3.0
  (clinfo output truncated to show only the Intel HD Graphics platform.)
user@WSL2:~$ sudo hashcat -I
hashcat (v6.2.5) starting in backend information mode

clGetDeviceIDs(): CL_DEVICE_NOT_FOUND

clGetDeviceIDs(): CL_DEVICE_NOT_FOUND

OpenCL Info:
============

OpenCL Platform ID #1
  Vendor..: Intel(R) Corporation
  Name....: Intel(R) OpenCL HD Graphics
  Version.: OpenCL 3.0

  Backend Device ID #1
    Type...........: GPU
    Vendor.ID......: 8
    Vendor.........: Intel(R) Corporation
    Name...........: Intel(R) Graphics [0x5917]
    Version........: OpenCL 3.0 NEO
    Processor(s)...: 24
    Clock..........: 1150
    Memory.Total...: 4864 MB (limited to 512 MB allocatable in one block)
    Memory.Free....: 2400 MB
    OpenCL.Version.: OpenCL C 3.0
    Driver.Version.: 21.35.20826

OpenCL Platform ID #2
  Vendor..: The pocl project
  Name....: Portable Computing Language
  Version.: OpenCL 2.0 pocl 1.8  Linux, None+Asserts, RELOC, LLVM 11.1.0, SLEEF, DISTRO, POCL_DEBUG

  Backend Device ID #2
    Type...........: CPU
    Vendor.ID......: 128
    Vendor.........: GenuineIntel
    Name...........: pthread-Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
    Version........: OpenCL 1.2 pocl HSTR: pthread-x86_64-pc-linux-gnu-skylake
    Processor(s)...: 8
    Clock..........: 2111
    Memory.Total...: 4399 MB (limited to 1024 MB allocatable in one block)
    Memory.Free....: 2167 MB
    OpenCL.Version.: OpenCL C 1.2 pocl
    Driver.Version.: 1.8

OpenCL Platform ID #3
  Vendor..: Mesa
  Name....: Clover
  Version.: OpenCL 1.1 Mesa 22.2.5

It's not a myth. Hope this clears things up for everyone.

edmondium@LAPTOP-1Q9H40K6:~$ clinfo
Abort was called at 54 line in file:
./shared/source/os_interface/windows/wddm/create_um_km_data_translator.cpp
Aborted

@lmeyerov

lmeyerov commented Nov 5, 2023

Looks like an Intel CPU/GPU again (see above), so same status.

@alex-ong

alex-ong commented Jan 7, 2024

Has anyone gotten OpenCL working with AMD CPUs (e.g. 2700X, 5600X, 5800X)? If that works, I could at least do some development work. Production is a Linux machine running Linux Docker containers, which passes NVIDIA GPUs through perfectly fine. I read a few posts saying you could "just install the Intel CPU OpenCL driver"; I installed it, but I still get 0 platforms in clinfo.
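When clinfo reports 0 platforms, a first thing worth checking is whether any vendor ICD is registered at all. The ocl-icd loader (listed as "OCL Icd free software" in the clinfo output above) looks for *.icd files, by default in /etc/OpenCL/vendors, and each file names one vendor library to load. Below is a minimal bash sketch that lists them; the function name list_opencl_icds is hypothetical, and an empty listing is only a likely cause of "0 platforms", not the only one.

```shell
# Hypothetical helper (not part of any OpenCL tool): list the ICD files the
# ocl-icd loader would see and the vendor library each one names.
list_opencl_icds() {
  # Default to the standard vendor directory; pass another directory to override.
  local dir="${1:-/etc/OpenCL/vendors}"
  local found=0 f lib

  if [ ! -d "$dir" ]; then
    echo "no ICD directory: $dir" >&2
    return 1
  fi

  for f in "$dir"/*.icd; do
    # With no matches the glob stays literal, so skip anything that doesn't exist.
    [ -e "$f" ] || continue
    # An .icd file holds the name or path of a single vendor library.
    lib=$(head -n 1 "$f")
    printf '%s -> %s\n' "$(basename "$f")" "$lib"
    found=$((found + 1))
  done

  if [ "$found" -eq 0 ]; then
    echo "no .icd files in $dir; clinfo will likely report 0 platforms" >&2
    return 1
  fi
}
```

Paste it into a shell (or source it) and run list_opencl_icds; a non-zero exit status means no ICD was found.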

Edit: If you use a recent enough version of Ubuntu (I used 24.04, which is bleeding edge), you can just apt install pocl-opencl-icd. I was using miniconda3, so I manually built my own image based on ubuntu:24.04, copying the miniconda3 Docker commands exactly and then adding apt install pocl-opencl-icd at the end. This successfully showed my 5600X as an OpenCL device in clinfo. This does not work for me on 22.04 because its pocl is too old, and recompiling it requires way too many dependencies. You could probably still get it working on 22.04 with enough effort.

(base) root@cfbb31c89f97:/# clinfo
Number of platforms                               1
  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 3.0 PoCL 4.0+debian  Linux, None+Asserts, RELOC, SPIR, LLVM 15.0.7, SLEEF, DISTRO, POCL_DEBUG
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_pocl_content_size
  Platform Extensions with Version                cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_pocl_content_size                                             0x400000 (1.0.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             POCL
  Platform Host timer resolution                  0ns

  Platform Name                                   Portable Computing Language
Number of devices                                 1
  Device Name                                     cpu-haswell-AMD Ryzen 5 5600X 6-Core Processor

Note that this is still not a solution for NVIDIA/AMD GPU OpenCL passthrough, but it's good enough for my development needs.
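For a container build like the one described above, it can help to fail fast when clinfo still finds no platform. Below is a minimal bash sketch, assuming clinfo's default output, where a line starting with "Number of platforms" is followed by the count (as in the listing above); the function names opencl_platform_count and require_opencl_platform are hypothetical.

```shell
# Hypothetical helper: read clinfo-style output on stdin and print the
# platform count, or 0 if no "Number of platforms" line is present.
opencl_platform_count() {
  awk '/^Number of platforms/ { print $NF; found = 1; exit }
       END { if (!found) print 0 }'
}

# Hypothetical check: succeed only if clinfo reports at least one platform.
# If clinfo is missing, its output is empty and the count is treated as 0.
require_opencl_platform() {
  local n
  n=$(clinfo 2>/dev/null | opencl_platform_count)
  if [ "$n" -lt 1 ]; then
    echo "clinfo reports $n OpenCL platforms" >&2
    return 1
  fi
  echo "clinfo reports $n OpenCL platform(s)"
}
```

It could, for instance, run as the last build step after installing pocl-opencl-icd and clinfo, so an image without a working ICD fails at build time instead of at run time.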

@Seralpa

Seralpa commented Jan 18, 2024

@alex-ong I have it working in WSL2 Ubuntu 22.04 without PoCL on a laptop with a 5800HS, but for the Intel platforms to be detected I have to source the setvars.sh script from the oneAPI installation: source /opt/intel/oneapi/setvars.sh

I installed it a long time ago, so I don't recall the details, but I don't remember having much trouble with it.
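Following that suggestion, here is a small bash sketch that sources setvars.sh only when it is present, so a shell profile does not break on machines without oneAPI. The function name load_oneapi_env is hypothetical; the default path is the one from the comment above, and exactly which variables setvars.sh sets depends on the installed oneAPI components.

```shell
# Hypothetical helper: source Intel oneAPI's setvars.sh into the current
# shell if it exists (default path taken from the comment above).
load_oneapi_env() {
  local setvars="${1:-/opt/intel/oneapi/setvars.sh}"
  if [ ! -f "$setvars" ]; then
    echo "oneAPI setvars script not found: $setvars" >&2
    return 1
  fi
  # Clear this function's arguments so setvars.sh does not receive them.
  set --
  # Source (not execute) the script so its exported variables persist.
  . "$setvars" >/dev/null
}
```

For example, load_oneapi_env && clinfo -l shows whether the Intel platforms are detected afterwards.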

@Bossach

Bossach commented Jan 29, 2024

I was able to run OpenCL on an NVIDIA GPU in WSL2 via PoCL.
The "NVIDIA GeForce RTX 3060 Ti" device appears in the clinfo output (listing below), and OpenCL apps work.
Windows Task Manager also shows GPU CUDA utilization when OpenCL programs run.
(I can't say much about performance, but there is a benchmark below.)

I took the following steps:

  1. Install the latest Windows NVIDIA drivers (I don't know since which version, but recent ones do some clever trick to expose the GPU inside WSL)

1.1. Now you can run nvidia-smi in WSL to check that it works.
$ nvidia-smi output:

Mon Jan 29 04:05:38 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.40.06              Driver Version: 551.23         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060 Ti     On  |   00000000:01:00.0  On |                  N/A |
|  0%   37C    P8             11W /  225W |     607MiB /   8192MiB |      7%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        20      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        42      G   /code                                       N/A      |
|    0   N/A  N/A        79      G   /code                                       N/A      |
+-----------------------------------------------------------------------------------------+
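The nvidia-smi check can also be scripted. This is a minimal sketch, assuming the Windows driver places libcuda.so* in /usr/lib/wsl/lib (the same directory the build flags in this comment pass to the linker); `check_wsl_cuda` is a hypothetical helper name. It only confirms that the WSL-side plumbing exists, not that OpenCL will work.

```shell
#!/usr/bin/env bash
# Minimal sketch of a pre-build sanity check for WSL2.
# Assumption: the Windows driver exposes libcuda.so* in /usr/lib/wsl/lib;
# pass another directory as $1 to check elsewhere.
check_wsl_cuda() {
    local lib_dir="${1:-/usr/lib/wsl/lib}"

    if ls "$lib_dir"/libcuda.so* >/dev/null 2>&1; then
        echo "ok: libcuda found in $lib_dir"
    else
        echo "missing: no libcuda.so* in $lib_dir" >&2
        return 1
    fi

    # nvidia-smi -L prints one line per visible GPU
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi -L || echo "warning: nvidia-smi is installed but failed" >&2
    else
        echo "warning: nvidia-smi not found on PATH" >&2
    fi
    return 0
}
```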
  2. DO NOT install any GPU/CUDA drivers inside WSL.

  3. Install the CUDA Toolkit "WSL-Ubuntu" version from here (NVIDIA page).
    (I have Debian on my WSL, but there weren't any problems installing it, apart from the shock of having to download ~9GB of packages...)
    Again, make sure you don't install the Linux drivers (the "WSL-Ubuntu" version is supposed not to contain them).

  4. AFAIK you need llvm/clang installed in order to compile kernels via PoCL,
    so $ sudo apt install llvm clang, I think.
    (There is also the possibility of an "LLVM-less build", but never mind.)

  5. Install some packages required to build PoCL
    (I'm almost sure I forgot something):
    $ sudo apt install ...
    libclang-dev (maybe also libclang-{version}-dev)
    libclang-common-{version}-dev
    libclang-cpp (maybe also libclang-cpp{version})
    libclang-cpp-dev (libclang-cpp{ver}-dev)
    ocl-icd-libopencl1 (maybe also ocl-icd-opencl-dev) - ICD loader
    opencl-headers (opencl-c-headers opencl-clhpp-headers)
    valgrind (because some CUDA-related PoCL sources require it)

  6. Download and build PoCL (GitHub).
    I built it with these variables (from the pocl directory):

# -L/usr/lib/wsl/lib is for ld to find libcuda.so
# (I don't know which of the two FLAGS variables is necessary, but this works)
# ENABLE_HOST_CPU_DEVICES can be left ON if you also want your CPU as an OpenCL device
$ cmake -B {your-build-dir} \
    -DCMAKE_C_FLAGS=-L/usr/lib/wsl/lib \
    -DCMAKE_CXX_FLAGS=-L/usr/lib/wsl/lib \
    -DENABLE_HOST_CPU_DEVICES=OFF \
    -DENABLE_CUDA=ON
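For reference, the package and configure steps could be condensed into one script. This is a minimal sketch, assuming Debian/Ubuntu package names for a versioned LLVM (15 here, matching the LLVM 15.0.6 in the clinfo listing), PoCL's WITH_LLVM_CONFIG CMake option for selecting that LLVM, and a hypothetical DRY_RUN=1 switch that prints the commands instead of running them. Package names vary between distro releases, so treat the list as a starting point.

```shell
#!/usr/bin/env bash
# Minimal sketch of the package + configure steps as one script.
# Assumptions (not from the comment): versioned Debian/Ubuntu LLVM package
# names, PoCL's WITH_LLVM_CONFIG option, and a DRY_RUN=1 switch.
LLVM_VERSION="${LLVM_VERSION:-15}"

run() {
    echo "+ $*"                       # always show the command
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"                          # execute it unless this is a dry run
    fi
}

configure_pocl() {
    local src_dir="${1:-pocl}" build_dir="${2:-build}"

    run sudo apt-get install -y build-essential cmake git pkg-config \
        "llvm-${LLVM_VERSION}-dev" "clang-${LLVM_VERSION}" \
        "libclang-${LLVM_VERSION}-dev" "libclang-cpp${LLVM_VERSION}-dev" \
        ocl-icd-libopencl1 ocl-icd-opencl-dev opencl-headers clinfo valgrind || return 1

    if [ ! -d "$src_dir" ]; then
        run git clone https://github.com/pocl/pocl.git "$src_dir" || return 1
    fi

    # Same flags as the cmake call above, plus an explicit llvm-config
    run cmake -S "$src_dir" -B "$build_dir" \
        -DWITH_LLVM_CONFIG="/usr/bin/llvm-config-${LLVM_VERSION}" \
        -DCMAKE_C_FLAGS=-L/usr/lib/wsl/lib \
        -DCMAKE_CXX_FLAGS=-L/usr/lib/wsl/lib \
        -DENABLE_HOST_CPU_DEVICES=OFF \
        -DENABLE_CUDA=ON
}
```

`DRY_RUN=1 configure_pocl ~/src/pocl build` prints the three commands; drop DRY_RUN to run them.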

Then run $ cmake --build {your-build-dir} -j{num of threads}, pray, and fix any problems that arise.

After a successful build you can check whether it works without installing:
$ export POCL_BUILDING=1 - tells PoCL that it can run from the build directory
$ export OCL_ICD_VENDORS={full-path-to-your-build-dir}/ocl-vendors/ - tells the ocl-icd loader where to find PoCL
Voilà!
Now you can run 'clinfo' and other OpenCL apps.
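The two exports can be wrapped so that a relative build path is turned into the full path OCL_ICD_VENDORS needs. This is a minimal sketch; `use_pocl_build_tree` is a hypothetical name, while the variables are the ones from the comment. Source the file rather than executing it, otherwise the exports are lost when the script exits.

```shell
#!/usr/bin/env bash
# Minimal sketch wrapping the two exports in a function (hypothetical name).
use_pocl_build_tree() {
    local build_dir="$1" abs

    # The build tree described in the comment contains an ocl-vendors/ directory
    if [ ! -d "$build_dir/ocl-vendors" ]; then
        echo "no ocl-vendors/ directory in $build_dir (did the build finish?)" >&2
        return 1
    fi

    # OCL_ICD_VENDORS needs a full path, so resolve relative build dirs
    abs="$(cd "$build_dir" && pwd)" || return 1

    export POCL_BUILDING=1
    export OCL_ICD_VENDORS="$abs/ocl-vendors/"
}
```

Usage: `use_pocl_build_tree build && clinfo -l`, where `clinfo -l` prints a short platform/device tree.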

You can also run $ cmake --install {your-build-dir} to install it system-wide if you need to (I don't, so I haven't tested it).

My clinfo listing:

Number of platforms                               1
  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 3.0 PoCL 5.1-pre main-0-g8053faf0  Linux, Debug+Asserts, RELOC, SPIR, LLVM 15.0.6, SLEEF, CUDA, POCL_DEBUG
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_pocl_content_size
  Platform Extensions with Version                cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_pocl_content_size                                             0x400000 (1.0.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             POCL
  Platform Host timer resolution                  0ns

  Platform Name                                   Portable Computing Language
Number of devices                                 1
  Device Name                                     NVIDIA GeForce RTX 3060 Ti
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 3.0 PoCL HSTR: CUDA-sm_86
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  5.1-pre main-0-g8053faf0
  Device OpenCL C Version                         OpenCL C 1.2 PoCL
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_acq_rel                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_seq_cst                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_device                                   0xc00000 (3.0.0)
                                                  __opencl_c_program_scope_global_variables                        0xc00000 (3.0.0)
                                                  __opencl_c_generic_address_space                                 0xc00000 (3.0.0)
                                                  __opencl_c_fp16                                                  0xc00000 (3.0.0)
                                                  __opencl_c_fp64                                                  0xc00000 (3.0.0)
  Latest conformance test passed                  (n/a)
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 0000:01:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               38
  Max clock frequency                             1695MHz
  Compute Capability (NV)                         8.6
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple (device)     32
  Preferred work group size multiple (kernel)     32
  Warp size (NV)                                  32
  Max sub-groups per work group                   32
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (cl_khr_fp16)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              8589279232 (7.999GiB)
  Error Correction support                        No
  Max memory allocation                           2147319808 (2GiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Preferred alignment for atomics                 
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         64 bytes
  Atomic memory capabilities                      relaxed, work-group scope
  Atomic fence capabilities                       relaxed, acquire/release, work-group scope
  Max size for global variable                    0
  Preferred total size of global vars             0
  Global Memory cache type                        None
  Image support                                   No
  Pipe support                                    No
  Max number of pipe args                         0
  Max active pipe reservations                    0
  Max pipe packet size                            0
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     8
  Max constant buffer size                        65536 (64KiB)
  Generic address space support                   Yes
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties (on host)                      
    Out-of-order execution                        No
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)                    
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       No
    Work-group collective functions               No
    Sub-group independent forward progress        Yes
    Kernel execution timeout (NV)                 Yes
    Concurrent copy and kernel execution (NV)     Yes
      Number of async copy engines                5
    IL version                                    (n/a)
    ILs with version                              (n/a)
    SPIR versions                                 (n/a)
  printf() buffer size                            16777216 (16MiB)
  Built-in kernels                                pocl.mul.i32;pocl.add.i32;pocl.dnn.conv2d_int8_relu;pocl.sgemm.local.f32;pocl.sgemm.tensor.f16f16f32;pocl.sgemm_ab.tensor.f16f16f32;pocl.abs.f32;pocl.add.i8;org.khronos.openvx.scale_image.nn.u8;org.khronos.openvx.scale_image.bl.u8;org.khronos.openvx.tensor_convert_depth.wrap.u8.f32
  Built-in kernels with version                   pocl.mul.i32                                                     0x402000 (1.2.0)
                                                  pocl.add.i32                                                     0x402000 (1.2.0)
                                                  pocl.dnn.conv2d_int8_relu                                        0x402000 (1.2.0)
                                                  pocl.sgemm.local.f32                                             0x402000 (1.2.0)
                                                  pocl.sgemm.tensor.f16f16f32                                      0x402000 (1.2.0)
                                                  pocl.sgemm_ab.tensor.f16f16f32                                   0x402000 (1.2.0)
                                                  pocl.abs.f32                                                     0x402000 (1.2.0)
                                                  pocl.add.i8                                                      0x402000 (1.2.0)
                                                  org.khronos.openvx.scale_image.nn.u8                             0x402000 (1.2.0)
                                                  org.khronos.openvx.scale_image.bl.u8                             0x402000 (1.2.0)
                                                  org.khronos.openvx.tensor_convert_depth.wrap.u8.f32              0x402000 (1.2.0)
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics     cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics     cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics     cl_khr_int64_extended_atomics cl_nv_device_attribute_query cl_khr_spir cl_khr_fp16 cl_khr_fp64
  Device Extensions with Version                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                  cl_khr_spir                                                      0x801000 (2.1.0)
                                                  cl_khr_fp16                                                      0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [POCL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 3060 Ti
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 3060 Ti
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 3060 Ti
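Since the problem in this issue is clinfo reporting 0 platforms, a scripted check can be handy, e.g. in a container entrypoint. This is a minimal sketch, assuming clinfo's default output starts with a "Number of platforms" line as in the listings in this thread; `opencl_platform_count` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Minimal sketch: read the platform count from clinfo's default output.
# Prints 0 if the "Number of platforms" line is missing (e.g. no output at all).
opencl_platform_count() {
    awk '/^Number of platforms/ { print $NF; found = 1; exit }
         END { if (!found) print 0 }'
}
```

Usage: `[ "$(clinfo | opencl_platform_count)" -gt 0 ] || echo "no OpenCL platforms" >&2`.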

And a benchmark (I don't know what these numbers mean, or whether they're good or bad):

.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /               FluidX3D Version 2.12 |
|                                      '     Copyright (c) Dr. Moritz Lehmann |
|-----------------------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA GeForce RTX 3060 Ti                                 |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA GeForce RTX 3060 Ti                                 |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 5.1-pre main-0-g8053faf0 (Linux)                           |
| OpenCL Version | OpenCL C 1.2 PoCL                                          |
| Compute Units  | 38 at 1695 MHz (4864 cores, 16.489 TFLOPs/s)               |
| Memory, Cache  | 8191 MB, 0 KB global / 48 KB local                         |
| Buffer Limits  | 2047 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| Info: Allocating memory. This may take a few seconds.                       |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 272 MB, GPU 1x 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    3307 |    506 GB/s |       197 |         9990   0% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 3466                                                   |
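Several derived numbers in this output follow directly from the raw ones. Here is a minimal sketch of the arithmetic. It assumes that FluidX3D's D3Q19 SRT FP32/FP32 mode moves 153 bytes per cell update (19 distributions × 4 bytes read and written, plus a 1-byte flag) and stores 93 bytes per cell (76 bytes of distributions, 4 for density, 12 for velocity, 1 for flags). It also assumes 128 FP32 lanes per compute unit (4864 / 38) doing one FMA (2 FLOPs) per clock. These assumptions reproduce the 506 GB/s, 1488 MB and 16.489 TFLOPs/s printed above, but on their own they don't say whether the result is good. `fluidx3d_check` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Minimal sketch: recompute FluidX3D's derived figures from its raw ones.
# Assumptions: 153 B of traffic and 93 B of storage per cell (D3Q19 FP32/FP32),
# 2 FLOPs per FP32 lane per clock. "MB" here is 2^20 bytes, as FluidX3D prints it.
fluidx3d_check() {
    local mlups="$1" cells="$2" compute_units="$3" lanes_per_cu="$4" mhz="$5"
    awk -v mlups="$mlups" -v cells="$cells" -v cu="$compute_units" \
        -v lanes="$lanes_per_cu" -v mhz="$mhz" 'BEGIN {
        printf "bandwidth: %.0f GB/s\n", mlups * 1e6 * 153 / 1e9
        printf "memory: %.0f MB\n", cells * 93 / 1048576
        printf "fp32 peak: %.3f TFLOPs/s\n", cu * lanes * 2 * mhz * 1e6 / 1e12
    }'
}

# Values from the run above: 3307 MLUPs, 256^3 cells, 38 CUs x 128 lanes at 1695 MHz
fluidx3d_check 3307 16777216 38 128 1695
# bandwidth: 506 GB/s
# memory: 1488 MB
# fp32 peak: 16.489 TFLOPs/s
```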

@husmen

husmen commented Feb 3, 2024

@Bossach Now this looks promising! PoCL too seems to have come a long way since I last checked. I might give it a try at some point.

@joaomamede

I was able to run OpenCL on NVIDIA on WSL2 via PoCL. There is an "NVIDIA GeForce RTX 3060 Ti" device in the clinfo output (listing below), and OpenCL apps work. Windows Task Manager also shows GPU CUDA utilization when CL programs run. (I can't say anything about performance, but I ran a benchmark, below.)

I took the following steps:

  1. Install the latest Windows NVIDIA drivers (I don't know since which version, but recent ones do some clever trick to expose the GPU inside WSL)

1.1. Now you can run nvidia-smi in WSL to check that it works. $ nvidia-smi output:

Mon Jan 29 04:05:38 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.40.06              Driver Version: 551.23         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060 Ti     On  |   00000000:01:00.0  On |                  N/A |
|  0%   37C    P8             11W /  225W |     607MiB /   8192MiB |      7%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        20      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        42      G   /code                                       N/A      |
|    0   N/A  N/A        79      G   /code                                       N/A      |
+-----------------------------------------------------------------------------------------+
  2. DO NOT install any GPU/CUDA drivers inside WSL.
  3. Install the CUDA Toolkit "WSL-Ubuntu" version from here (NVIDIA page).
    (I have Debian on my WSL, but there weren't any problems installing it, apart from the shock of having to download ~9GB of packages...)
    Again, make sure you don't install the Linux drivers (the "WSL-Ubuntu" version is supposed not to contain them).
  4. AFAIK you need llvm/clang installed in order to compile kernels via PoCL,
    so $ sudo apt install llvm clang, I think.
    (There is also the possibility of an "LLVM-less build", but never mind.)
  5. Install some packages required to build PoCL
    (I'm almost sure I forgot something):
    $ sudo apt install ...
    libclang-dev (maybe also libclang-{version}-dev)
    libclang-common-{version}-dev
    libclang-cpp (maybe also libclang-cpp{version})
    libclang-cpp-dev (libclang-cpp{ver}-dev)
    ocl-icd-libopencl1 (maybe also ocl-icd-opencl-dev) - ICD loader
    opencl-headers (opencl-c-headers opencl-clhpp-headers)
    valgrind (because some CUDA-related PoCL sources require it)
  6. Download and build PoCL (GitHub).
    I built it with these variables (from the pocl directory):
# -L/usr/lib/wsl/lib is for ld to find libcuda.so
# (I don't know which of the two FLAGS variables is necessary, but this works)
# ENABLE_HOST_CPU_DEVICES can be left ON if you also want your CPU as an OpenCL device
$ cmake -B {your-build-dir} \
    -DCMAKE_C_FLAGS=-L/usr/lib/wsl/lib \
    -DCMAKE_CXX_FLAGS=-L/usr/lib/wsl/lib \
    -DENABLE_HOST_CPU_DEVICES=OFF \
    -DENABLE_CUDA=ON

Then run $ cmake --build {your-build-dir} -j{num of threads}, pray, and fix any problems that arise.

After a successful build you can check whether it works without installing:
$ export POCL_BUILDING=1 - tells PoCL that it can run from the build directory
$ export OCL_ICD_VENDORS={full-path-to-your-build-dir}/ocl-vendors/ - tells the ocl-icd loader where to find PoCL
Voilà! Now you can run 'clinfo' and other OpenCL apps.

You can also run $ cmake --install {your-build-dir} to install it system-wide if you need to (I don't, so I haven't tested it).

My clinfo listing:

Number of platforms                               1
  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 3.0 PoCL 5.1-pre main-0-g8053faf0  Linux, Debug+Asserts, RELOC, SPIR, LLVM 15.0.6, SLEEF, CUDA, POCL_DEBUG
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_pocl_content_size
  Platform Extensions with Version                cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_pocl_content_size                                             0x400000 (1.0.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             POCL
  Platform Host timer resolution                  0ns

  Platform Name                                   Portable Computing Language
Number of devices                                 1
  Device Name                                     NVIDIA GeForce RTX 3060 Ti
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 3.0 PoCL HSTR: CUDA-sm_86
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  5.1-pre main-0-g8053faf0
  Device OpenCL C Version                         OpenCL C 1.2 PoCL
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_acq_rel                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_seq_cst                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_device                                   0xc00000 (3.0.0)
                                                  __opencl_c_program_scope_global_variables                        0xc00000 (3.0.0)
                                                  __opencl_c_generic_address_space                                 0xc00000 (3.0.0)
                                                  __opencl_c_fp16                                                  0xc00000 (3.0.0)
                                                  __opencl_c_fp64                                                  0xc00000 (3.0.0)
  Latest conformance test passed                  (n/a)
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 0000:01:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               38
  Max clock frequency                             1695MHz
  Compute Capability (NV)                         8.6
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple (device)     32
  Preferred work group size multiple (kernel)     32
  Warp size (NV)                                  32
  Max sub-groups per work group                   32
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (cl_khr_fp16)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              8589279232 (7.999GiB)
  Error Correction support                        No
  Max memory allocation                           2147319808 (2GiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Preferred alignment for atomics                 
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         64 bytes
  Atomic memory capabilities                      relaxed, work-group scope
  Atomic fence capabilities                       relaxed, acquire/release, work-group scope
  Max size for global variable                    0
  Preferred total size of global vars             0
  Global Memory cache type                        None
  Image support                                   No
  Pipe support                                    No
  Max number of pipe args                         0
  Max active pipe reservations                    0
  Max pipe packet size                            0
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     8
  Max constant buffer size                        65536 (64KiB)
  Generic address space support                   Yes
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties (on host)                      
    Out-of-order execution                        No
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)                    
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       No
    Work-group collective functions               No
    Sub-group independent forward progress        Yes
    Kernel execution timeout (NV)                 Yes
    Concurrent copy and kernel execution (NV)     Yes
      Number of async copy engines                5
    IL version                                    (n/a)
    ILs with version                              (n/a)
    SPIR versions                                 (n/a)
  printf() buffer size                            16777216 (16MiB)
  Built-in kernels                                pocl.mul.i32;pocl.add.i32;pocl.dnn.conv2d_int8_relu;pocl.sgemm.local.f32;pocl.sgemm.tensor.f16f16f32;pocl.sgemm_ab.tensor.f16f16f32;pocl.abs.f32;pocl.add.i8;org.khronos.openvx.scale_image.nn.u8;org.khronos.openvx.scale_image.bl.u8;org.khronos.openvx.tensor_convert_depth.wrap.u8.f32
  Built-in kernels with version                   pocl.mul.i32                                                     0x402000 (1.2.0)
                                                  pocl.add.i32                                                     0x402000 (1.2.0)
                                                  pocl.dnn.conv2d_int8_relu                                        0x402000 (1.2.0)
                                                  pocl.sgemm.local.f32                                             0x402000 (1.2.0)
                                                  pocl.sgemm.tensor.f16f16f32                                      0x402000 (1.2.0)
                                                  pocl.sgemm_ab.tensor.f16f16f32                                   0x402000 (1.2.0)
                                                  pocl.abs.f32                                                     0x402000 (1.2.0)
                                                  pocl.add.i8                                                      0x402000 (1.2.0)
                                                  org.khronos.openvx.scale_image.nn.u8                             0x402000 (1.2.0)
                                                  org.khronos.openvx.scale_image.bl.u8                             0x402000 (1.2.0)
                                                  org.khronos.openvx.tensor_convert_depth.wrap.u8.f32              0x402000 (1.2.0)
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics     cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics     cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics     cl_khr_int64_extended_atomics cl_nv_device_attribute_query cl_khr_spir cl_khr_fp16 cl_khr_fp64
  Device Extensions with Version                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                  cl_khr_spir                                                      0x801000 (2.1.0)
                                                  cl_khr_fp16                                                      0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [POCL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 3060 Ti
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 3060 Ti
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 3060 Ti

And here is a benchmark (I don't know what these numbers mean, whether they're good or bad):

.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /               FluidX3D Version 2.12 |
|                                      '     Copyright (c) Dr. Moritz Lehmann |
|-----------------------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA GeForce RTX 3060 Ti                                 |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA GeForce RTX 3060 Ti                                 |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 5.1-pre main-0-g8053faf0 (Linux)                           |
| OpenCL Version | OpenCL C 1.2 PoCL                                          |
| Compute Units  | 38 at 1695 MHz (4864 cores, 16.489 TFLOPs/s)               |
| Memory, Cache  | 8191 MB, 0 KB global / 48 KB local                         |
| Buffer Limits  | 2047 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| Info: Allocating memory. This may take a few seconds.                       |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 272 MB, GPU 1x 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    3307 |    506 GB/s |       197 |         9990   0% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 3466                                                   |
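For anyone wondering what these numbers mean: MLUPs is million lattice-cell updates per second, and the other two derived figures can be cross-checked with simple arithmetic. This is a minimal bash/awk sketch resting on two assumptions. First, FluidX3D counts 153 bytes of memory traffic per cell per time step for D3Q19 with FP32 storage (19 distribution functions loaded and stored at 4 bytes each, plus a 1-byte flag). Second, the TFLOPs/s figure is cores × 2 FLOPs (one fused multiply-add) × clock. Both reproduce the printed values. The function names are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: reproduce FluidX3D's derived figures from its raw numbers.
# Assumption: 153 bytes of memory traffic per cell per time step for
# D3Q19 with FP32 storage (19 DDFs x 4 B loaded + 19 x 4 B stored + 1 B flags).

# Effective bandwidth in GB/s = MLUPs/s * bytes per cell / 1000
lbm_bandwidth_gbs() {
  local mlups="$1" bytes_per_cell="${2:-153}"
  awk -v m="$mlups" -v b="$bytes_per_cell" 'BEGIN { printf "%.0f\n", m * b / 1000 }'
}

# Peak FP32 TFLOPs/s = cores * 2 FLOPs per clock (one FMA) * clock in MHz / 1e6
peak_fp32_tflops() {
  local cores="$1" mhz="$2"
  awk -v c="$cores" -v f="$mhz" 'BEGIN { printf "%.3f\n", c * 2 * f / 1e6 }'
}

lbm_bandwidth_gbs 3307        # RTX 3060 Ti run above; prints 506
peak_fp32_tflops 4864 1695    # 38 compute units x 128 FP32 cores; prints 16.489
```

Since lattice Boltzmann codes are usually limited by memory bandwidth, higher MLUPs/s (and therefore GB/s) is better.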

This worked for me. I installed it, but the variables aren't set by default: clinfo only works if I first run
$ export POCL_BUILDING=1
$ export OCL_ICD_VENDORS={full-path-to-your-build-dir}/ocl-vendors/
But it doesn't "stick". Should I put this into my .bashrc or rc.local or something like that, or is there a cleaner way?

@Bossach

Bossach commented Feb 21, 2024

@joaomamede
The cleaner way is
$ sudo cmake --install {your-build-dir}
It should install PoCL and its ICD file system-wide, and then it should just work.
If not, I would first check that $ ls /etc/OpenCL/vendors lists pocl.icd, that $ cat pocl.icd contains a valid path to /.../libpocl.so..., and that libpocl actually exists there. If not, something is wrong with the installation.
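The checks above can be scripted. This is a minimal sketch, assuming the standard ICD vendor directory /etc/OpenCL/vendors and a file named pocl.icd; the function name check_pocl_icd is hypothetical. An .icd file holds one line: the path (or bare file name) of the vendor library that the OpenCL ICD loader opens.

```bash
#!/usr/bin/env bash
# Sketch: verify that the OpenCL ICD loader can find PoCL.
# Assumption: vendors dir defaults to /etc/OpenCL/vendors; pass another
# directory as $1 to check a different location.
check_pocl_icd() {
  local vendors_dir="${1:-/etc/OpenCL/vendors}"
  local icd="$vendors_dir/pocl.icd"
  local lib

  if [ ! -f "$icd" ]; then
    echo "missing: $icd" >&2
    return 1
  fi

  # The .icd file holds a single line: the library the loader will open.
  lib="$(head -n 1 "$icd" | sed 's/[[:space:]]*$//')"

  case "$lib" in
    /*)
      # Absolute path: the library must exist at exactly that location.
      if [ ! -e "$lib" ]; then
        echo "pocl.icd points to a missing library: $lib" >&2
        return 2
      fi
      ;;
    "")
      echo "empty: $icd" >&2
      return 3
      ;;
    *)
      # Bare name: resolved through the dynamic linker's search path.
      echo "note: $lib is resolved via the library search path" >&2
      ;;
  esac

  echo "ok: $lib"
}
```

Run it as check_pocl_icd after sudo cmake --install; a nonzero exit status tells you which of the checks failed.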

Alternatively, you can put the exports in your .bashrc and it should work for all apps you launch from bash under your user (until you accidentally remove the PoCL build directory, because it runs from there).
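If you go the .bashrc route, here is a small sketch that appends the two exports only once, so running it again doesn't duplicate them. The function name persist_pocl_env and the marker comment are hypothetical. The build directory must be an absolute path (the same {full-path-to-your-build-dir} as above).

```bash
#!/usr/bin/env bash
# Sketch: make the PoCL build-tree environment persist for future bash sessions.
# Assumptions: bash reads ~/.bashrc for interactive shells; the marker line is
# a hypothetical tag used only to avoid appending the block twice.
persist_pocl_env() {
  local build_dir="$1"
  local rc="${2:-$HOME/.bashrc}"
  local marker="# pocl build-tree environment"

  case "$build_dir" in
    /*) ;;
    *)
      echo "usage: persist_pocl_env /absolute/path/to/build [rcfile]" >&2
      return 1
      ;;
  esac

  # Already added on a previous run: nothing to do.
  if [ -f "$rc" ] && grep -qxF "$marker" "$rc"; then
    return 0
  fi

  {
    echo "$marker"
    echo "export POCL_BUILDING=1"
    # %q shell-quotes the path in case it contains spaces or special characters.
    printf 'export OCL_ICD_VENDORS=%q\n' "$build_dir/ocl-vendors/"
  } >> "$rc"
}
```

After running persist_pocl_env {full-path-to-your-build-dir}, open a new terminal (or source ~/.bashrc) for the variables to take effect.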

@Tongzhao9417

@Bossach

Thanks for sharing! I followed your steps and it almost worked. However, clinfo told me "unknown target CPU 'sm_89'". Here is my full clinfo output and benchmark.

clinfo:

Number of platforms                               1
  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 3.0 PoCL 5.0  Linux, RelWithDebInfo, RELOC, SPIR, LLVM 14.0.0, SLEEF, CUDA, POCL_DEBUG
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_pocl_content_size
  Platform Extensions with Version                cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_pocl_content_size                                             0x400000 (1.0.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             POCL
  Platform Host timer resolution                  0ns

  Platform Name                                   Portable Computing Language
Number of devices                                 1
  Device Name                                     NVIDIA GeForce RTX 4090
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 3.0 PoCL HSTR: CUDA-sm_89
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  5.0
  Device OpenCL C Version                         OpenCL C 1.2 PoCL
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_acq_rel                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_seq_cst                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_device                                   0xc00000 (3.0.0)
                                                  __opencl_c_program_scope_global_variables                        0xc00000 (3.0.0)
                                                  __opencl_c_generic_address_space                                 0xc00000 (3.0.0)
                                                  __opencl_c_fp16                                                  0xc00000 (3.0.0)
                                                  __opencl_c_fp64                                                  0xc00000 (3.0.0)
  Latest conformance test passed                  (n/a)
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 0000:01:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               128
  Max clock frequency                             2595MHz
  Compute Capability (NV)                         8.9
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple (device)     32
=== CL_PROGRAM_BUILD_LOG ===
error: unknown target CPU 'sm_89'
Device NVIDIA GeForce RTX 4090 failed to build the program
  Preferred work group size multiple (kernel)     <getWGsizes:1504: create kernel : error -45>
  Warp size (NV)                                  32
  Max sub-groups per work group                   32
  Preferred / native vector sizes
    char                                                 1 / 1
    short                                                1 / 1
    int                                                  1 / 1
    long                                                 1 / 1
    half                                                 0 / 0        (cl_khr_fp16)
    float                                                1 / 1
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              25756696576 (23.99GiB)
  Error Correction support                        No
  Max memory allocation                           6439174144 (5.997GiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Preferred alignment for atomics
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         64 bytes
  Atomic memory capabilities                      relaxed, work-group scope
  Atomic fence capabilities                       relaxed, acquire/release, work-group scope
  Max size for global variable                    0
  Preferred total size of global vars             0
  Global Memory cache type                        None
  Image support                                   No
  Pipe support                                    No
  Max number of pipe args                         0
  Max active pipe reservations                    0
  Max pipe packet size                            0
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     8
  Max constant buffer size                        65536 (64KiB)
  Generic address space support                   Yes
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties (on host)
    Out-of-order execution                        No
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       No
    Work-group collective functions               No
    Sub-group independent forward progress        Yes
    Kernel execution timeout (NV)                 Yes
    Concurrent copy and kernel execution (NV)     Yes
      Number of async copy engines                1
    IL version                                    (n/a)
    ILs with version                              (n/a)
    SPIR versions                                 (n/a)
  printf() buffer size                            16777216 (16MiB)
  Built-in kernels                                pocl.mul.i32;pocl.add.i32;pocl.dnn.conv2d_int8_relu;pocl.sgemm.local.f32;pocl.sgemm.tensor.f16f16f32;pocl.sgemm_ab.tensor.f16f16f32;pocl.abs.f32;pocl.add.i8;org.khronos.openvx.scale_image.nn.u8;org.khronos.openvx.scale_image.bl.u8;org.khronos.openvx.tensor_convert_depth.wrap.u8.f32
  Built-in kernels with version                   pocl.mul.i32                                                     0x402000 (1.2.0)
                                                  pocl.add.i32                                                     0x402000 (1.2.0)
                                                  pocl.dnn.conv2d_int8_relu                                        0x402000 (1.2.0)
                                                  pocl.sgemm.local.f32                                             0x402000 (1.2.0)
                                                  pocl.sgemm.tensor.f16f16f32                                      0x402000 (1.2.0)
                                                  pocl.sgemm_ab.tensor.f16f16f32                                   0x402000 (1.2.0)
                                                  pocl.abs.f32                                                     0x402000 (1.2.0)
                                                  pocl.add.i8                                                      0x402000 (1.2.0)
                                                  org.khronos.openvx.scale_image.nn.u8                             0x402000 (1.2.0)
                                                  org.khronos.openvx.scale_image.bl.u8                             0x402000 (1.2.0)
                                                  org.khronos.openvx.tensor_convert_depth.wrap.u8.f32              0x402000 (1.2.0)
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics     cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics     cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics     cl_khr_int64_extended_atomics cl_nv_device_attribute_query cl_khr_spir cl_khr_fp16 cl_khr_fp64
  Device Extensions with Version                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                  cl_khr_spir                                                      0x801000 (2.1.0)
                                                  cl_khr_fp16                                                      0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [POCL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 4090
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 4090
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   NVIDIA GeForce RTX 4090

benchmark:

.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /               FluidX3D Version 2.13 |
|                                      '     Copyright (c) Dr. Moritz Lehmann |
|-----------------------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA GeForce RTX 4090                                    |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA GeForce RTX 4090                                    |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 5.0 (Linux)                                                |
| OpenCL Version | OpenCL C 1.2 PoCL                                          |
| Compute Units  | 128 at 2595 MHz (16384 cores, 85.033 TFLOPs/s)             |
| Memory, Cache  | 24563 MB, 0 KB global / 48 KB local                        |
| Buffer Limits  | 6140 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Warning: error: unknown target CPU 'sm_89' Device NVIDIA GeForce RTX 4090   |
|          failed to build the program                                        |
| Error: OpenCL C code compilation failed with error code -11. Make sure      |
|        there are no errors in kernel.cpp.                                   |
'-----------------------------------------------------------------------------'

@Bossach

Bossach commented Mar 2, 2024

@Tongzhao9417
Your LLVM doesn't know how to compile for your GPU.
You can check which GPUs it supports with
$ clang --target=nvptx -print-supported-cpus
where --target=nvptx (or nvptx64) selects the NVIDIA PTX target, and the listed "CPUs" are specific GPU architectures.
Output:

Debian clang version 14.0.6
Target: nvptx
Thread model: posix
InstalledDir: /usr/bin
Available CPUs for this target:

        sm_20
        sm_21
        sm_30
        sm_32
        sm_35
        sm_37
        sm_50
        sm_52
        sm_53
        sm_60
        sm_61
        sm_62
        sm_70
        sm_72
        sm_75
        sm_80
        sm_86

Use -mcpu or -mtune to specify the target's processor.
For example, clang --target=aarch64-unknown-linux-gui -mcpu=cortex-a35
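A minimal sketch of that check as a reusable shell function, assuming a POSIX shell. The function name `llvm_supports_gpu` is hypothetical. It uses the 64-bit `nvptx64` target and greps the list printed by `-print-supported-cpus`, merging stderr into stdout because some clang versions print the list there.

```shell
# Hypothetical helper: succeeds (exit status 0) if the given clang lists the
# requested NVPTX GPU architecture (e.g. sm_89) among its supported CPUs.
# Usage: llvm_supports_gpu sm_89 [clang-binary]
llvm_supports_gpu() {
    arch="$1"
    clang_bin="${2:-clang}"
    # -print-supported-cpus may write to stderr, so merge both streams,
    # then look for the architecture as a whole word.
    "$clang_bin" --target=nvptx64 -print-supported-cpus 2>&1 |
        grep -qw -- "$arch"
}
```

For example, `llvm_supports_gpu sm_89 clang-16 && echo "clang-16 can target sm_89"` checks a specific clang version before you rebuild anything against it.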

You need a newer version of LLVM/clang. (I just checked: llvm-16 from the Debian repo has "sm_89".)
So $ sudo apt install llvm-16 clang-16 should fix your problem, or install the most recent version available from the llvm.org repo.
You also have to do a clean rebuild of PoCL with the option -DWITH_LLVM_CONFIG=/usr/bin/llvm-config-16 (or your actual llvm-config path) so that PoCL is bound to the correct LLVM version.
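A sketch of that clean rebuild, assuming the PoCL sources live in `~/pocl` and that the CUDA backend is enabled with PoCL's `ENABLE_CUDA` CMake option. The helper name `pocl_rebuild` and the default paths are hypothetical. By default it only prints the configure command; set `RUN=1` to delete the old build tree and actually configure and build.

```shell
# Hypothetical helper: print (default) or run (RUN=1) a clean PoCL rebuild
# bound to a specific llvm-config. Default paths are assumptions.
# Usage: pocl_rebuild [llvm-config] [source-dir] [build-dir]
pocl_rebuild() {
    llvm_config="${1:-/usr/bin/llvm-config-16}"
    src_dir="${2:-$HOME/pocl}"
    build_dir="${3:-$src_dir/build}"
    # Collect the configure command in the positional parameters.
    set -- \
        cmake -S "$src_dir" -B "$build_dir" \
        -DENABLE_CUDA=ON \
        -DWITH_LLVM_CONFIG="$llvm_config"
    if [ "${RUN:-0}" = 1 ]; then
        # Clean rebuild: remove the old build tree first so no cached
        # settings from the previous LLVM survive, then configure and build.
        rm -rf "$build_dir" &&
            "$@" &&
            cmake --build "$build_dir" -j"$(nproc)"
    else
        echo "$@"
    fi
}
```

Deleting the build directory is the point of a "clean" rebuild: `CMakeCache.txt` remembers what was detected last time, so reconfiguring in place may keep using the old LLVM.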

@Tongzhao9417

Sorry for the late reply. I followed your steps and it worked for me.

Cheers!

@olympichek

I compiled PoCL as described above and now clinfo works. But when I try to run an OpenCL application, I get an error:

 Build option -cl-std specified OpenCL C version 2.0,but device NVIDIA GeForce GTX 1080 Ti doesn't support that OpenCL C version

Does PoCL not support OpenCL 2.0?
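The FluidX3D log earlier in this thread shows the PoCL CUDA device reporting `OpenCL C 1.2 PoCL`, which would be consistent with a build option requesting OpenCL C 2.0 being rejected. A small shell sketch to see what each device reports before choosing a `-cl-std` value, assuming clinfo's usual `Device OpenCL C Version` line; the helper name `opencl_c_versions` is hypothetical.

```shell
# Hypothetical helper: read `clinfo` output on stdin and print the
# OpenCL C version number each device reports (e.g. "1.2"), one per line.
opencl_c_versions() {
    sed -n 's/^[[:space:]]*Device OpenCL C Version[[:space:]]*OpenCL C \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}
```

Run it as `clinfo | opencl_c_versions`. If it prints `1.2`, an application that lets you choose its build options would likely need `-cl-std=CL1.2` rather than 2.0 on this device.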

@monkeyden

Absolute king. pocl-opencl-icd was the missing link for me. Ty, sir.

@CLRafaelR

@Bossach

I really appreciate your brilliant solution!

I want to ask you, and everyone who reacted to Bossach's comment and/or tried the solution (@husmen @joaomamede @Tongzhao9417 @olympichek @htao7 @kirse @kon332k), one question: have you run the PoCL verification tests for NVIDIA GPUs, ../tools/scripts/run_cuda_tests, as documented on the "NVIDIA GPU support" page of the Portable Computing Language (PoCL) 6.0 documentation, and did all of the tests pass?

I basically followed Bossach's steps to install PoCL, and now clinfo and clinfo -l work like a charm. However, four tests failed when I ran the PoCL verification tests as shown below:

cd ~/pocl-6.0/build # move to my `build` directory
../tools/scripts/run_cuda_tests

# For rerunning the failed tests:
../tools/scripts/run_cuda_tests --rerun-failed --output-on-failure

Failed tests were:

The following tests FAILED:
          4 - kernel/test_as_type_loopvec (Failed)
        166 - regression/clSetKernelArg_overwriting_the_previous_kernel's_args_loopvec (Failed)
        208 - runtime/test_device_address (SEGFAULT)
        209 - runtime/test_svm (SEGFAULT)
Errors while running CTest
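When reporting results like these, a small sketch can pull only the failing test names out of the CTest summary. It assumes the `N - name (Status)` layout shown above; the helper name `failed_ctest_names` is hypothetical.

```shell
# Hypothetical helper: read CTest output on stdin and print the name of each
# test listed under "The following tests FAILED:", one per line.
failed_ctest_names() {
    awk '
        /The following tests FAILED:/ { in_block = 1; next }
        in_block && /^[ \t]*[0-9]+ - / {
            line = $0
            sub(/^[ \t]*[0-9]+ - /, "", line)   # drop the leading test number
            sub(/ \([^)]*\)$/, "", line)        # drop the trailing status, e.g. (Failed)
            print line
            next
        }
        in_block { in_block = 0 }              # first other line ends the block
    '
}
```

For example: `../tools/scripts/run_cuda_tests 2>&1 | tee cuda_tests.log | failed_ctest_names` keeps the full log for the issue and prints just the failures.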

If anybody has run the verification tests, could you please tell us whether all of them passed or which ones failed? It would also be very helpful if you could describe your runtime environment, settings, and PoCL build configuration.

I opened an issue on PoCL's repo: ../tools/scripts/run_cuda_tests Fails on WSL2 · Issue #1533 · pocl/pocl. Comments there are also appreciated; they would help the PoCL developers learn whether the test failures on WSL2 are reproducible and improve PoCL.

@Shazway

Shazway commented Oct 22, 2024

Hi,
I saw the PoCL solution and jumped at the chance to try fixing this issue, but it didn't work for me.
The step cmake --build <build_dir> -j16 worked fine, but when it came to exporting the variable OCL_ICD_VENDORS, there is no ocl-vendors folder.
Output of ls in the build directory:
CMakeCache.txt CTestCustom.cmake cl_offline_compiler.sh config.h kernellib_hash.h pocl_opencl.h CMakeFiles CTestTestfile.cmake cmake_install.cmake config2.h lib pocl_version.h CPackConfig.cmake Makefile compile_commands.json examples pocl.pc poclu CPackSourceConfig.cmake bin compile_test_. include pocl_build_timestamp.h tests

Result of clinfo:
Number of platforms 2
The full output is too long to paste here, but it lists two Intel graphics platforms instead of one Intel and one NVIDIA platform.

Result of nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.51.01              Driver Version: 565.90         CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050 ...    On  |   00000000:01:00.0 Off |                  N/A |
| N/A   73C    P0             52W /   75W |    1453MiB /   4096MiB |     59%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
Any clues why?
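If the build tree has no `ocl-vendors` folder, one workaround is to create a vendors directory by hand. An ICD file is just a text file containing the path of the vendor library, and `OCL_ICD_VENDORS` can point the ICD loader at that directory instead of `/etc/OpenCL/vendors`. A minimal sketch, assuming the built library sits at `<build_dir>/lib/CL/libpocl.so`; the helper name `make_pocl_vendors_dir` and the `my-ocl-vendors`/`pocl-local.icd` names are hypothetical.

```shell
# Hypothetical helper: create a private ICD vendors directory whose single
# .icd file points at a locally built PoCL library, and print its path.
# Assumption: the library is at <build_dir>/lib/CL/libpocl.so.
# Usage: make_pocl_vendors_dir <build_dir> [vendors_dir]
make_pocl_vendors_dir() {
    build_dir="$1"
    vendors_dir="${2:-$build_dir/my-ocl-vendors}"
    if [ ! -d "$build_dir" ]; then
        echo "build directory not found: $build_dir" >&2
        return 1
    fi
    # Use an absolute path so the loader can open the library from anywhere.
    lib="$(cd "$build_dir" && pwd)/lib/CL/libpocl.so"
    if [ ! -f "$lib" ]; then
        echo "libpocl.so not found at $lib" >&2
        return 1
    fi
    mkdir -p "$vendors_dir" || return 1
    # The .icd file contains nothing but the path of the vendor library.
    printf '%s\n' "$lib" > "$vendors_dir/pocl-local.icd"
    echo "$vendors_dir"
}
```

Usage: `export OCL_ICD_VENDORS="$(make_pocl_vendors_dir ~/pocl/build)"`, then `clinfo -l`. Since only that directory should be scanned, clinfo would then list just the PoCL platform, which makes it easier to see whether the NVIDIA device shows up at all.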
