Assert with Xe KMD when using -DNEO_ENABLE_XE_DRM_DETECTION=TRUE #696

Closed
eero-t opened this issue Jan 9, 2024 · 13 comments
Labels
L0 Sysman Issue related to L0 Sysman

Comments

@eero-t

eero-t commented Jan 9, 2024

Problem

Compute-runtime's Xe KMD support does not actually work with the Xe KMD; it aborts on an assert.

Details

Kernel built from the Xe repo's default "drm-xe-next" branch (yesterday's HEAD commit): https://gitlab.freedesktop.org/drm/xe/kernel

With Xe driver enabled:

# grep _XE[^A-Z] /boot/drm_xe.config 
CONFIG_DRM_XE=m
CONFIG_DRM_XE_FORCE_PROBE=""
CONFIG_DRM_XE_JOB_TIMEOUT_MAX=10000
CONFIG_DRM_XE_JOB_TIMEOUT_MIN=1
CONFIG_DRM_XE_TIMESLICE_MAX=10000000
CONFIG_DRM_XE_TIMESLICE_MIN=1
CONFIG_DRM_XE_PREEMPT_TIMEOUT=640000
CONFIG_DRM_XE_PREEMPT_TIMEOUT_MAX=10000000
CONFIG_DRM_XE_PREEMPT_TIMEOUT_MIN=1
CONFIG_DRM_XE_ENABLE_SCHEDTIMEOUT_LIMIT=y

Booting a TGL device with Xe enabled for it:

# dmesg | grep xe[^a-z]
[    0.000000] Command line: BOOT_IMAGE=/boot/drm_xe rootwait fsck.repair=yes i915.force_probe=!9a60 xe.force_probe=9a60 ro
[    0.038111] Kernel command line: BOOT_IMAGE=/boot/drm_xe rootwait fsck.repair=yes i915.force_probe=!9a60 xe.force_probe=9a60 ro
[    3.068875] xe 0000:00:02.0: vgaarb: deactivate vga console
[    3.198711] xe 0000:00:02.0: [drm] Using GuC firmware from i915/tgl_guc_70.bin version 70.13.1
[    3.202558] xe 0000:00:02.0: [drm] Using HuC firmware from i915/tgl_huc.bin version 7.9.3
[    3.204943] xe REG[0x2340-0x235f]: allow read access
[    3.204946] xe REG[0x7010-0x7017]: allow rw access
[    3.204947] xe REG[0x7018-0x701f]: allow rw access
[    3.204974] xe REG[0x223a8-0x223af]: allow read access
[    3.204993] xe REG[0x1c03a8-0x1c03af]: allow read access
[    3.205011] xe REG[0x1d03a8-0x1d03af]: allow read access
[    3.205030] xe REG[0x1c83a8-0x1c83af]: allow read access
[    3.212040] [drm] Initialized xe 1.1.0 20201103 for 0000:00:02.0 on minor 0
[    4.462524] xe 0000:00:02.0: [drm] GT0: suspended

And using a compute stack built from the following versions:

  • GMMlib: intel-gmmlib-22.3.16
  • SPIRV-SDK: vulkan-sdk-1.3.268.0/vulkan-sdk-1.3.268.0 (headers/tools)
  • SPIRV-LLVM: libllvmspirvlib-14-dev:amd64:14.0.0-3ubuntu1 (Ubuntu package)
  • OpenCL-Clang: libopencl-clang-14-dev:amd64:14.0.0-4 (Ubuntu package)
  • VC-intrinsics: v0.16.0
  • Graphics Compiler: igc-1.0.15985.0 (IGC)
  • Level-Zero API: v1.15.8
  • compute-runtime: 23.48.27912.9

Using options enabling Xe KMD support:

ARG ZELLO_LOC=../level_zero/tools/test/black_box_tests/zello_sysman.cpp
RUN cd compute-runtime  &&  mkdir build  &&  cd build  &&  \
    cmake -LH -Wno-dev -G Ninja \
      -DCMAKE_INSTALL_PREFIX=${INSTALL_DIR} -DCMAKE_BUILD_TYPE=Release \
      -DSUPPORT_GEN8=0 -DSUPPORT_GEN9=1 -DSUPPORT_GEN11=0 \
      -DSUPPORT_TGLLP=1 -DSUPPORT_DG1=1 -DSUPPORT_XE_HP_SDV=1 \
      -DSUPPORT_DG2=1 -DSUPPORT_PVC=1 \
      -DNEO_ENABLE_i915_PRELIM_DETECTION=TRUE \
      -DNEO_ENABLE_XE_DRM_DETECTION=TRUE \
      -DNEO_DISABLE_LD_GOLD=1 \
      -DDO_NOT_RUN_AUB_TESTS=1 -DDONT_CARE_OF_VIRTUALS=1 \
      ../  && \
    ninja  &&  ninja install  && \
    g++ -O2 -Wall -o ${INSTALL_DIR}/bin/zello_sysman $ZELLO_LOC -lze_loader -locloc

Compute-runtime and its zello_sysman tool just abort with an assert:

# docker run -it --rm --user root --network none --cap-drop ALL  --device /dev/dri:/dev/dri:rw registry/compute-tester:latest zello_sysman
ZES_ENABLE_SYSMAN environment variable Not Set
Setting the environment variable ZES_ENABLE_SYSMAN 
ZES_ENABLE_SYSMAN environment variable Set
Abort was called at 311 line in file:
/source/compute-runtime/shared/source/os_interface/linux/xe/ioctl_helper_xe.cpp
@eero-t
Author

eero-t commented Jan 9, 2024

OpenCL programs also hit the same assert, which is here in the repo code:
https://github.com/intel/compute-runtime/blob/23.48.27912.9/shared/source/os_interface/linux/xe/ioctl_helper_xe.cpp#L311
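
For anyone wanting to jump from an abort message to the matching source line, here is a minimal Python sketch. It assumes the printed path contains a `compute-runtime/` directory component followed by the in-repo path, which holds for the two build locations shown in this thread; `abort_to_url` is a hypothetical helper name, not something from the driver.

```python
import re

REPO_URL = "https://github.com/intel/compute-runtime"


def abort_to_url(message: str, tag: str) -> str:
    """Map an 'Abort was called at N line in file:' message to a blob URL.

    Assumption: the path printed after the message contains a
    'compute-runtime/' component, and everything after it is the in-repo path.
    """
    match = re.search(r"Abort was called at (\d+) line in file:\s*(\S+)", message)
    if match is None:
        raise ValueError("not an abort message")
    line, path = match.group(1), match.group(2)
    marker = "compute-runtime/"
    idx = path.find(marker)
    if idx < 0:
        raise ValueError("path does not contain the assumed repo directory")
    relative = path[idx + len(marker):]
    return f"{REPO_URL}/blob/{tag}/{relative}#L{line}"


if __name__ == "__main__":
    msg = (
        "Abort was called at 311 line in file:\n"
        "/source/compute-runtime/shared/source/os_interface/linux/xe/ioctl_helper_xe.cpp"
    )
    print(abort_to_url(msg, "23.48.27912.9"))
```

The tag has to be the one the driver was actually built from, since line numbers shift between releases.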

Strace shows this memory region check issue happening at driver init time:

# ... strace -f -k zello_sysman
...
write(1, "Abort was called at 311 line in "..., 38Abort was called at 311 line in file:
) = 38
 > /usr/lib/x86_64-linux-gnu/libc.so.6(__write+0x14) [0x10bf34]
...
 > /usr/lib/x86_64-linux-gnu/libc.so.6(__printf_chk+0xab) [0x12d63b]
 > /usr/local/lib/libze_intel_gpu.so.1.3.27912(zeKernelSuggestGroupSizeTracing+0x10e822) [0x3463b2]
...
 > /usr/local/lib/libze_intel_gpu.so.1.3.27912(zeKernelSuggestGroupSizeTracing+0x36949d) [0x5a102d]
 > /usr/local/lib/libze_intel_gpu.so.1.3.27912(zetGetMetricGroupExpProcAddrTable+0x22e86) [0x11b506]
 > /usr/local/lib/libze_intel_gpu.so.1.3.27912(zetGetMetricGroupExpProcAddrTable+0x227af) [0x11ae2f]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_mutexattr_settype+0x107) [0x94817]
 > /usr/local/lib/libze_intel_gpu.so.1.3.27912(zetGetMetricGroupExpProcAddrTable+0x22a38) [0x11b0b8]
 > /usr/local/lib/libze_tracing_layer.so.1.15.8(zeGetFabricVertexExpProcAddrTable+0xdc5) [0xe835]
 > /usr/local/lib/libze_loader.so.1.15.8(loader::context_t::init_driver(loader::driver_t, unsigned int)+0x61d) [0x1f9bd]
 > /usr/local/lib/libze_loader.so.1.15.8(loader::context_t::check_drivers(unsigned int)+0x126) [0x219e6]
 > /usr/local/lib/libze_loader.so.1.15.8(ze_lib::context_t::~context_t()+0xc0) [0x1a170]
 > /usr/local/lib/libze_loader.so.1.15.8(loader::createLoaderContext()+0x174) [0x117a4]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_mutexattr_settype+0x107) [0x94817]
 > /usr/local/lib/libze_loader.so.1.15.8(zeInit+0x73) [0x11853]
 > /usr/local/bin/zello_sysman() [0xa658]

@eero-t eero-t changed the title Assert fail with Xe KMD Assert / segfault with Xe KMD with -DNEO_ENABLE_XE_DRM_DETECTION=TRUE Jan 11, 2024
@eero-t eero-t changed the title Assert / segfault with Xe KMD with -DNEO_ENABLE_XE_DRM_DETECTION=TRUE Assert / segfault with Xe KMD when using -DNEO_ENABLE_XE_DRM_DETECTION=TRUE Jan 11, 2024
@eero-t eero-t changed the title Assert / segfault with Xe KMD when using -DNEO_ENABLE_XE_DRM_DETECTION=TRUE Assert with Xe KMD when using -DNEO_ENABLE_XE_DRM_DETECTION=TRUE Jan 11, 2024
@eero-t
Author

eero-t commented Jan 11, 2024

On Arc, I've also seen a segfault instead of the assert, but it was not reproducible. Strace showed it happening with the same backtrace as the assert.

With OpenCL, strace shows the line 311 assert being reached through a different route than in the zello_sysman L0 backend backtrace above:

ioctl(4, _IOC(_IOC_READ|_IOC_WRITE, 0x64, 0x40, 0x28), 0x7ffe75cd81f0) = 0
 > /usr/lib/x86_64-linux-gnu/libc.so.6(ioctl+0x3f) [0x111f3f]
 > /usr/local/lib/intel-opencl/libigdrcl.so(GTPin_Init+0x505bba) [0x5ccb0a]
 > /usr/local/lib/intel-opencl/libigdrcl.so(GTPin_Init+0x51d7b7) [0x5e4707]
 > /usr/local/lib/intel-opencl/libigdrcl.so(GTPin_Init+0x51a22f) [0x5e117f]
 > /usr/local/lib/intel-opencl/libigdrcl.so(GTPin_Init+0x4fc8f5) [0x5c3845]
 > /usr/local/lib/intel-opencl/libigdrcl.so(GTPin_Init+0x1be9b) [0xe2deb]
 > /usr/local/lib/intel-opencl/libigdrcl.so(GTPin_Init+0x5105b0) [0x5d7500]
 > /usr/local/lib/intel-opencl/libigdrcl.so(GTPin_Init+0x464df7) [0x52bd47]
 > /usr/local/lib/intel-opencl/libigdrcl.so() [0x9b121]
 > /usr/local/lib/intel-opencl/libigdrcl.so() [0x9b2ae]
 > /usr/local/lib/intel-opencl/libigdrcl.so(GTPin_Init+0x46504d) [0x52bf9d]
 > /usr/local/lib/intel-opencl/libigdrcl.so(clGetExtensionFunctionAddress+0x5a6b) [0xbf7fb]
 > /usr/local/lib/intel-opencl/libigdrcl.so(clIcdGetPlatformIDsKHR+0x27) [0xbfe27]
 > /usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0() [0x7f64]
 > /usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0(clGetPlatformIDs+0xbb) [0x8f6b]
 > /usr/bin/clinfo() [0x97cc]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_init_first+0x90) [0x23a90]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x89) [0x23b49]
 > /usr/bin/clinfo() [0xc645]
newfstatat(1, "", {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0), ...}, AT_EMPTY_PATH) = 0
 > /usr/lib/x86_64-linux-gnu/libc.so.6(fstatat+0xe) [0x10b42e]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(_IO_file_doallocate+0x63) [0x78603]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(_IO_doallocbuf+0x50) [0x885b0]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(_IO_file_overflow+0x180) [0x87510]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(_IO_file_xsputn+0x105) [0x85ce5]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(parse_printf_format+0x969) [0x56929]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(parse_printf_format+0x605) [0x565c5]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(parse_printf_format+0xbcc) [0x56b8c]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(_IO_vfprintf+0x24e5) [0x5ece5]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(_IO_vfprintf+0x4341) [0x60b41]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(__printf_chk+0xab) [0x12d63b]
 > /usr/local/lib/intel-opencl/libigdrcl.so() [0x9b582]
 > /usr/local/lib/intel-opencl/libigdrcl.so(GTPin_Init+0x51a555) [0x5e14a5]
 > /usr/local/lib/intel-opencl/libigdrcl.so(GTPin_Init+0x4fc8f5) [0x5c3845]
 > /usr/local/lib/intel-opencl/libigdrcl.so(GTPin_Init+0x1be9b) [0xe2deb]
 > /usr/local/lib/intel-opencl/libigdrcl.so(GTPin_Init+0x5105b0) [0x5d7500]
 > /usr/local/lib/intel-opencl/libigdrcl.so(GTPin_Init+0x464df7) [0x52bd47]
 > /usr/local/lib/intel-opencl/libigdrcl.so() [0x9b121]
 > /usr/local/lib/intel-opencl/libigdrcl.so() [0x9b2ae]
 > /usr/local/lib/intel-opencl/libigdrcl.so(GTPin_Init+0x46504d) [0x52bf9d]
 > /usr/local/lib/intel-opencl/libigdrcl.so(clGetExtensionFunctionAddress+0x5a6b) [0xbf7fb]
 > /usr/local/lib/intel-opencl/libigdrcl.so(clIcdGetPlatformIDsKHR+0x27) [0xbfe27]
 > /usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0() [0x7f64]
 > /usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0(clGetPlatformIDs+0xbb) [0x8f6b]
 > /usr/bin/clinfo() [0x97cc]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_init_first+0x90) [0x23a90]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x89) [0x23b49]
 > /usr/bin/clinfo() [0xc645]
write(1, "Abort was called at 311 line in "..., 38Abort was called at 311 line in file:
) = 38
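
The raw `_IOC(...)` tuple that strace prints for the last ioctl before the abort can be unpacked by hand. Below is a minimal Python sketch, assuming the generic Linux ioctl bit layout used on x86 (nr in bits 0-7, type in bits 8-15, size in bits 16-29, direction in bits 30-31) and the DRM convention that driver-specific commands start at 0x40. Type 0x64 is the character `'d'` used by DRM. My reading that driver command 0 with a 0x28-byte argument is the Xe device query ioctl is an assumption, not something the trace itself proves.

```python
# Direction bits in the generic Linux ioctl encoding (x86 layout).
IOC_WRITE = 1
IOC_READ = 2

DRM_IOCTL_TYPE = 0x64      # ord('d'), the ioctl "type" used by DRM
DRM_COMMAND_BASE = 0x40    # first driver-specific DRM command number
DRM_COMMAND_END = 0xA0     # one past the last driver-specific command


def ioc(direction: int, ioc_type: int, nr: int, size: int) -> int:
    """Pack the four _IOC() fields into a 32-bit request number."""
    return (direction << 30) | (size << 16) | (ioc_type << 8) | nr


def decode(request: int) -> dict:
    """Unpack a 32-bit request number into its _IOC() fields."""
    return {
        "direction": (request >> 30) & 0x3,
        "size": (request >> 16) & 0x3FFF,
        "type": (request >> 8) & 0xFF,
        "nr": request & 0xFF,
    }


def drm_driver_command(nr: int):
    """Return the driver-specific command index, or None for core DRM ioctls."""
    if DRM_COMMAND_BASE <= nr < DRM_COMMAND_END:
        return nr - DRM_COMMAND_BASE
    return None


if __name__ == "__main__":
    request = ioc(IOC_READ | IOC_WRITE, 0x64, 0x40, 0x28)
    print(hex(request), decode(request), drm_driver_command(0x40))
```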

The Mesa driver works fine with this (last night's) Xe KMD git version.

@eero-t
Author

eero-t commented Jan 12, 2024

I also tried the older (Dec 21st) Xe KMD version recommended for media-driver in intel/media-driver#1761

But compute-runtime tag 23.48.27912.9, and the earlier-series tag 23.43.27642.21 (using an older Xe uAPI, I think), still fail at init with it:

$ NEOReadDebugKeys=1 PrintDebugSettings=1 PrintDebugMessages=1 zello_sysman
ZES_ENABLE_SYSMAN environment variable Not Set
Setting the environment variable ZES_ENABLE_SYSMAN 
ZES_ENABLE_SYSMAN environment variable Set
Non-default value of debug variable: PrintDebugSettings = 1
Non-default value of debug variable: PrintDebugMessages = 1
IoctlHelperXe::IoctlHelperXe
IoctlHelperXe::initialize
 -> IoctlHelperXe::getIoctlRequestValue 0xe
 -> IoctlHelperXe::getIoctlRequestValue 0xe
DRM_XE_QUERY_CONFIG_REV_AND_DEVICE_ID	0x19a60
  REV_ID				0x1
  DEVICE_ID				0x9a60
DRM_XE_QUERY_CONFIG_FLAGS			0
  DRM_XE_QUERY_CONFIG_FLAG_HAS_VRAM	OFF
DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT		0x1000
DRM_XE_QUERY_CONFIG_VA_BITS		0x30
 -> IoctlHelperXe::getIoctlRequestValue 0xe
 -> IoctlHelperXe::getIoctlRequestValue 0xe
 -> IoctlHelperXe::getIoctlRequestValue 0xe
 -> IoctlHelperXe::getIoctlRequestValue 0xe
 -> IoctlHelperXe::getIoctlRequestValue 0xe
 -> IoctlHelperXe::getIoctlRequestValue 0xe
 -> IoctlHelperXe::getDrmParamValue 0x26 QueryHwconfigTable
 => IoctlHelperXe::ioctl 0xe
 -> IoctlHelperXe::ioctl Query id=0x26 f=0x0 len=0 r=0
INFO: System Info query failed!
 -> IoctlHelperXe::getDrmParamValue 0x1b ParamHasExecSoftpin
 => IoctlHelperXe::ioctl 0x3
 -> IoctlHelperXe::ioctl Getparam 0x1b/0x1 r=0
 => IoctlHelperXe::ioctl 0xd
 -> IoctlHelperXe::ioctl GemContextSetparam r=0
 -> IoctlHelperXe::getIoctlRequestValue 0xe
 -> IoctlHelperXe::getIoctlRequestValue 0xe
 -> IoctlHelperXe::getIoctlRequestValue 0xe
 -> IoctlHelperXe::getIoctlRequestValue 0xe
Abort was called at 311 line in file:
/home/nobody/source/compute-runtime/shared/source/os_interface/linux/xe/ioctl_helper_xe.cpp
Aborted (core dumped)
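
As a sanity check on the debug output above, the combined `DRM_XE_QUERY_CONFIG_REV_AND_DEVICE_ID` value can be split by hand. This Python sketch assumes the device ID sits in the low 16 bits and the revision in the next 8 bits, which is consistent with the `REV_ID` and `DEVICE_ID` lines the driver printed; `split_rev_and_device_id` is a hypothetical helper name.

```python
def split_rev_and_device_id(value: int) -> tuple:
    """Split the combined config value into (rev_id, device_id).

    Assumed layout: bits 0-15 hold the PCI device ID, bits 16-23 the revision.
    """
    device_id = value & 0xFFFF
    rev_id = (value >> 16) & 0xFF
    return rev_id, device_id


if __name__ == "__main__":
    rev_id, device_id = split_rev_and_device_id(0x19A60)
    print(f"REV_ID {rev_id:#x}  DEVICE_ID {device_id:#x}")
```

The resulting device ID is the same 9a60 that was passed to `xe.force_probe` on the kernel command line, so the query itself returns sane data before the abort.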

The latest Mesa tag works with Xe KMD HEAD, and the linked media-driver bug gives the working combo for media.

So, which Xe KMD version does compute-runtime need?

@eero-t
Author

eero-t commented Feb 1, 2024

As the latest "compute-runtime" tag (23.52.28202.14) included some Xe KMD uAPI support updates (08f7e7b), I built the latest of everything, and tried it with the latest Xe KMD drm-xe-next upstreaming tag, drm-xe-next-fixes-2024-01-16.

Although the latest Mesa (release) and media-driver (master) now both work with that Xe KMD tag (without any additional patches), "compute-runtime" still aborts:

# strace -f -k clinfo
...
write(1, "Abort was called at 509 line in "..., 38Abort was called at 509 line in file:
) = 38
 > /usr/lib/x86_64-linux-gnu/libc.so.6(__write+0x14) [0x10bf34]
...
 > /usr/lib/x86_64-linux-gnu/libc.so.6(__printf_chk+0xab) [0x12d63b]
 > /usr/local/lib/intel-opencl/libigdrcl.so() [0x9e552]
 > /usr/local/lib/intel-opencl/libigdrcl.so(GTPin_Init+0x52805f) [0x5f24cf]
 > /usr/local/lib/intel-opencl/libigdrcl.so(GTPin_Init+0x52c71c) [0x5f6b8c]
 > /usr/local/lib/intel-opencl/libigdrcl.so(GTPin_Init+0x50ccce) [0x5d713e]
 > /usr/local/lib/intel-opencl/libigdrcl.so(GTPin_Init+0x1f1c6) [0xe9636]
 > /usr/local/lib/intel-opencl/libigdrcl.so(GTPin_Init+0x520290) [0x5ea700]
 > /usr/local/lib/intel-opencl/libigdrcl.so(GTPin_Init+0x4743c7) [0x53e837]
 > /usr/local/lib/intel-opencl/libigdrcl.so() [0x9e0f1]
 > /usr/local/lib/intel-opencl/libigdrcl.so() [0x9e27e]
 > /usr/local/lib/intel-opencl/libigdrcl.so(GTPin_Init+0x47461d) [0x53ea8d]
 > /usr/local/lib/intel-opencl/libigdrcl.so(clGetExtensionFunctionAddress+0x5a7b) [0xc2d2b]
 > /usr/local/lib/intel-opencl/libigdrcl.so(clIcdGetPlatformIDsKHR+0x27) [0xc3357]
 > /usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0() [0x7f64]
 > /usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0(clGetPlatformIDs+0xbb) [0x8f6b]
 > /usr/bin/clinfo() [0x97cc]

Which Xe KMD version, patches etc. is compute-runtime supposed to work with? And which compute-runtime version, patches etc. should I use?

@JablonskiMateusz
Contributor

Hi @eero-t
Could you try to build NEO as of 278ced3 ?

@eero-t
Author

eero-t commented Feb 13, 2024

@JablonskiMateusz That commit seems to be only in master branch, not yet in any of the tagged versions:

$ git branch --contains 278ced3
* master

Similarly to media-driver, master build of compute-runtime does work with Xe KMD!

Actually, both of the drivers work with both of the KMD versions from f.d.o:

  • drm-tip (drm integration) repo HEAD, and
  • drm/xe/kernel (Xe devel) repo drm-xe-next-fixes-2024-01-16 tag

However, while basic CL stuff seems to work, all Sysman metric queries return ZE_RESULT_ERROR_UNINITIALIZED (according to zello_sysman), at least on TGL iGPU.

Is there something I need to use to get at least some Sysman metrics to work, or is Xe KMD still lacking all metric support?

PS. I think this ticket should be open until:

  • some tagged commit includes all the necessary Xe KMD support commit(s), and
  • there's a README stating the Xe KMD commit/tag needed by that support [1]

[1] corresponding media-driver README: https://github.com/intel/media-driver/blob/master/media_softlet/linux/common/os/xe/include/README.md

@eero-t
Author

eero-t commented Feb 22, 2024

However, while basic CL stuff seems to work, all Sysman metric queries return ZE_RESULT_ERROR_UNINITIALIZED (according to zello_sysman), at least on TGL iGPU.

Is there something I need to use to get at least some Sysman metrics to work, or is Xe KMD still lacking all metric support?

With ZELLO_SYSMAN_USE_ZESINIT=1 env var, zello_sysman reports frequency metrics for TGL iGPU with xe KMD.

(I.e. Sysman supports xe KMD only when zesInit() is used for initializing it instead of zeInit().)

However, when querying engine metrics, there's a segfault:

# ZELLO_SYSMAN_USE_ZESINIT=1 strace -f zello_sysman -e
...
write(1, " ----  Engine tests ---- \n", 26 ----  Engine tests ---- 
) = 26
futex(0x5650fe3c3eb8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
openat(AT_FDCWD, "/sys/class/drm/card0/device/vendor", O_RDONLY) = 3
read(3, "0x8086\n", 8191)               = 7
close(3)                                = 0
openat(AT_FDCWD, "/sys/module/i915/agama_version", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/module/i915/srcversion", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/class/drm/card0/device/subsystem_vendor", O_RDONLY) = 3
read(3, "0x8086\n", 8191)               = 7
close(3)                                = 0
write(1, "Device UUID: 0 0 0 0 0 0 0 0 0 0"..., 46Device UUID: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
) = 46
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
+++ killed by SIGSEGV (core dumped) +++
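
The trace above shows i915 module files being probed under /sys/module/i915 even though the device is driven by xe. Here is a minimal Python sketch of checking which kernel driver is actually bound to a DRM card, by resolving the `device/driver` symlink in sysfs. `bound_kmd` is a hypothetical helper and is not how compute-runtime does its detection; the `sysfs_root` parameter exists only so the function can be exercised against a fake tree.

```python
import os


def bound_kmd(card: str = "card0", sysfs_root: str = "/sys"):
    """Return the name of the kernel driver bound to a DRM card, or None.

    Reads the target of <sysfs_root>/class/drm/<card>/device/driver, which on
    Linux is a symlink to the driver directory (for example .../drivers/xe).
    """
    link = os.path.join(sysfs_root, "class", "drm", card, "device", "driver")
    try:
        target = os.readlink(link)
    except OSError:
        # Card missing, no driver bound, or not a symlink.
        return None
    return os.path.basename(target)


if __name__ == "__main__":
    print(bound_kmd())
```

On the TGL setup described in this issue, one would expect this to report "xe" rather than "i915", but I have not run it there.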

Those 2 metric types are the only ones compute-runtime supports for iGPUs, but once that segfault is fixed, I'll also try the other xe-provided Sysman metrics on some dGPU.

@saik-intel
Contributor

@eero-t we are looking into this and will update you when a fix is ready

@eero-t
Author

eero-t commented Feb 29, 2024

However, when querying engine metrics, there's a segfault:

The segfault on engine metrics query is specific to "zello_sysman" (built from the same 2024-02-09 master branch sources as the driver itself).

There's no crash with my own zesInit()-using program on Xe KMD; engine metrics just do not work: #707

@JablonskiMateusz JablonskiMateusz added the L0 Sysman Issue related to L0 Sysman label Mar 4, 2024
@eero-t
Author

eero-t commented Mar 4, 2024

Tried the latest Xe KMD (6.8.0-rc3) tags:

Because the latest "24.05.28454.10" release is still missing the required 278ced3 commit, I again built the latest compute-runtime master.

In quick testing, the driver build seemed to work OK with the "drm-xe-next-2024-02-25" tag, except for the missing engine metrics regression, which also happens with i915, and the zello_sysman crash discussed above.

As to the "drm-xe-fixes-2024-02-29" Xe KMD, the OpenCL read/write/copy tester hung on both TGL iGPU and Arc. When stracing the tester, it was either using 100% CPU by constantly calling sched_yield() (TGL), or nanosleeping (Arc). For now, I'm assuming the driver is not even supposed to work with that Xe KMD version...

@saik-intel
Contributor

with new release it is fixed, please close

@eero-t
Author

eero-t commented Apr 24, 2024

with new release it is fixed, please close

@saik-intel I haven't yet had time to verify the latest release's functionality. I'll try to do it before the end of the week.

@eero-t
Author

eero-t commented Apr 25, 2024

Closing. In quick testing (zello_sysman + cl-mem), the latest release works with both the Xe KMD repo "drm-xe-next-2024-02-25" tag and last night's "drm-tip" HEAD kernels.

@eero-t eero-t closed this as completed Apr 25, 2024