Performance: Host overhead: Severe host overhead in sycl::get_kernel_bundle. #1016

fengyuan14 · 2024-10-23T08:13:55Z

🐛 Describe the bug

We query the kernel-specific max work-group size to avoid platform compatibility issues. The routine is:

  auto kid = ::sycl::get_kernel_id<KernelClass>();
  auto kbundle = ::sycl::get_kernel_bundle<::sycl::bundle_state::executable>(
      ctx, {dev}, {kid});
  ::sycl::kernel k = kbundle.get_kernel(kid);
  int max_work_group_size =
      k.get_info<::sycl::info::kernel_device_specific::work_group_size>(dev);

sycl::get_kernel_bundle incurs severe host overhead. The data is below:

Impact: All kernels in torch-xpu-ops that are launched with the kernel-specific max work-group size are affected.

A 40us overhead is not acceptable for some single-batch inference cases, where kernel latency can be less than 10us.
For comparison, the CUDA runtime usually spends ~6us on a kernel launch.
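
One possible mitigation, sketched here as an assumption and not something proposed in this issue: if the max work-group size for a given kernel on a given device does not change between launches, the result could be queried once and cached, so the get_kernel_bundle cost is paid only on the first launch. The sketch below is plain C++ without SYCL so it stays self-contained. `MaxWorkGroupSizeCache`, `DeviceId`, and `slow_query` are hypothetical names; `slow_query` stands in for the routine above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Hypothetical stand-in for a device handle; real code would key on a
// sycl::device (or something derived from it) instead.
using DeviceId = int;

// Caches one value per (kernel type, device) pair so that the expensive
// query runs only the first time a given pair is seen.
class MaxWorkGroupSizeCache {
 public:
  using Query = std::function<std::size_t(DeviceId)>;

  // KernelClass must be a complete type because typeid is applied to it.
  template <class KernelClass>
  std::size_t get(DeviceId dev, const Query& slow_query) {
    assert(slow_query && "slow_query must be callable");
    const Key key{std::type_index(typeid(KernelClass)), dev};
    // The lock is held across the slow query: simple, but it serializes
    // concurrent first-time lookups.
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = cache_.find(key);
    if (it == cache_.end()) {
      it = cache_.emplace(key, slow_query(dev)).first;
    }
    return it->second;
  }

 private:
  using Key = std::pair<std::type_index, DeviceId>;
  std::mutex mutex_;
  std::map<Key, std::size_t> cache_;
};
```

In real code, `slow_query` would wrap the get_kernel_id / get_kernel_bundle / get_info sequence shown earlier. Whether a map lookup under a mutex is cheap enough for a sub-10us launch path would need to be measured.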

intel/llvm#15824

Versions

torch-xpu-ops: latest main
Intel DPC++ compiler/rt: 2024.1.3 (2024.1.3.20240604)

fengyuan14 assigned fengyuan14 and majing921201 Oct 23, 2024

fengyuan14 mentioned this issue Oct 23, 2024

SYCL runtime: Severe host overhead in sycl::get_kernel_bundle intel/llvm#15824

Open

Stonepia added the performance label Nov 25, 2024
