
[GPU] Add SLM support for FC bf tiled kernel #21435

Merged · 5 commits merged into openvinotoolkit:master on Dec 6, 2023

Conversation

@sshlyapn (Contributor) commented Dec 1, 2023

Details:

This patch implements SLM optimization for FC with compressed (INT4/UINT4) weights. The optimization is expected to improve performance for context sizes >= 241.

  • Added SLM optimization for the FC bf_tiled kernel
  • Changed fake alignment rules for batches >= 256 on iGPU
  • Applied fake alignment to FC input tensors (previously it was applied only to outputs, which caused out-of-bounds memory accesses of the input buffer)
  • Implemented a WA mechanism to keep two precompiled shape-agnostic kernels for FC, one for small batches and one for large batches (see the sketch below)
  • Fixed unaligned IFM leftovers processing
  • Added a dump of all kernels' entry points for multi-stage kernel instances
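
As a rough illustration of the two-kernel WA, a minimal sketch of the runtime choice between the precompiled variants (the variant names and the 256 threshold are assumptions based on the description above, not the PR's actual code):

#include <cstddef>

enum class KernelVariant { SmallBatch, LargeBatch };

// Both variants are precompiled as shape-agnostic kernels; the choice is
// made at runtime once the actual batch size is known. The threshold of
// 256 is an assumption here, not taken from the PR's code.
KernelVariant select_fc_kernel(std::size_t batch_size) {
    constexpr std::size_t kSlmBatchThreshold = 256;
    return batch_size >= kSlmBatchThreshold ? KernelVariant::LargeBatch
                                            : KernelVariant::SmallBatch;
}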

Tickets:

  • ticket-id

@sshlyapn sshlyapn added this to the 2023.3 milestone Dec 1, 2023
@sshlyapn sshlyapn requested review from a team as code owners December 1, 2023 16:52
@sshlyapn sshlyapn force-pushed the fc_slm_optimization branch 5 times, most recently from f012c83 to e838b4c on December 5, 2023 12:02
@sshlyapn sshlyapn force-pushed the fc_slm_optimization branch 3 times, most recently from 2eec4fd to c28adf9 on December 5, 2023 13:58
@e-ddykim (Contributor) left a comment

I made a patch (sshlyapn#1) for model caching of dynamic models.
Please review and merge it into this PR.

@vladimir-paramuzov (Contributor) left a comment

No critical comments from my side.

auto updated_layout = actual_layout;
for (auto user : get_user_insts()) {
// Since fake alignment is applicable for input tensor as well, make sure we allocate enough memory
// to prevemt reading beyound the allocated memory bounds
A contributor left a comment:

nit: prevent, beyond

if (tparams.tile_ofm != required_tile_ofm)
return false;

if (params.weights.GetDType() != WeightsType::INT4 && params.weights.GetDType() != WeightsType::UINT4)
A contributor left a comment:

Do you have a check somewhere that the available SLM size is big enough for the given parameters? I haven't found one.

@sshlyapn (Author) replied:

The maximum possible SLM allocation in the current implementation is 8 KB, and it looks like all current HW meets this requirement. But I will add this condition for clarity, thanks.
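
A minimal sketch of the kind of guard being discussed (the engineInfo.maxLocalMemSize field name is an assumption about the kernel-selector API, so treat the whole snippet as illustrative):

#include <cstddef>

// Reject the SLM path when the worst-case allocation (8 KB, per the reply
// above) would not fit into the device's available local memory.
bool slm_allocation_fits(std::size_t required_slm_bytes, std::size_t max_local_mem_bytes) {
    return required_slm_bytes <= max_local_mem_bytes;
}

// Possible use in the kernel's Validate() (field name is an assumption):
//   if (!slm_allocation_fits(8 * 1024, params.engineInfo.maxLocalMemSize))
//       return false;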

@sshlyapn (Author) replied:

Added in PR #21555

auto dispatchData = SetDefault(prim_params, -1, execute_kernel_idx);
kd.kernels[execute_kernel_idx].params.workGroups.global = dispatchData.gws;
kd.kernels[execute_kernel_idx].params.workGroups.local = dispatchData.lws;
kd.kernels[execute_kernel_idx].skip_execution = KernelData::SkipKernelExecution(prim_params);
A contributor left a comment:

[offline discussion] The approach of keeping multiple kernels in a single KernelData doesn't look like a future-proof solution, so the proposal is to experiment later with returning multiple KernelData objects from the kernel selector; that would mean both kernels are suitable and should be dispatched based on some runtime check. It will likely require modifying the primitive_impl subclasses and adding a generic wrapper for multiple primitive_impls, plus a condition to switch between them (see the sketch below).
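
A purely hypothetical sketch of that direction, with simplified stand-in types (none of these names come from the codebase):

#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

struct RuntimeParams { std::size_t batch; };

struct Impl {
    virtual void execute(const RuntimeParams&) = 0;
    virtual ~Impl() = default;
};

// Generic wrapper over several suitable implementations: the first candidate
// whose runtime predicate matches is the one that gets executed.
struct MultiImplWrapper : Impl {
    struct Candidate {
        std::function<bool(const RuntimeParams&)> suitable;
        std::unique_ptr<Impl> impl;
    };
    std::vector<Candidate> candidates;

    void execute(const RuntimeParams& p) override {
        for (auto& c : candidates) {
            if (c.suitable(p)) {
                c.impl->execute(p);
                return;
            }
        }
    }
};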

@@ -187,7 +186,14 @@ kernel_impl_params fully_connected_inst::get_fake_aligned_params(kernel_impl_par
return std::move(orig_impl_param);
}

-    size_t fake_align_base = (orig_impl_param.dev_type == cldnn::device_type::integrated_gpu) ? 16 : 8;
+    size_t fake_align_base = 8;
A contributor left a comment:

[random spot] Is this feature covered by existing test cases?

@sshlyapn (Author) replied:

Yes, some of the tests cover the new implementation as well, but I will add more tests in a follow-up PR.

@sshlyapn (Author) replied:

Updated in PR #21555
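
For context, fake alignment rounds the batch dimension up to a multiple of fake_align_base so the tiled kernel can process whole tiles; a minimal sketch of that rounding (the helper name is illustrative, not the actual function):

#include <cstddef>

// Round `batch` up to the nearest multiple of `base` (e.g. 8 or 16), so the
// tiled kernel always sees whole tiles.
std::size_t fake_align(std::size_t batch, std::size_t base) {
    return (batch + base - 1) / base * base;
}

// fake_align(241, 16) == 256; fake_align(241, 8) == 248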

@p-durandin p-durandin enabled auto-merge (squash) December 6, 2023 11:18
@p-durandin p-durandin merged commit 152b4df into openvinotoolkit:master Dec 6, 2023
56 checks passed
akuporos pushed a commit to akuporos/openvino that referenced this pull request Dec 8, 2023
* [GPU] Add SLM support for FC bf tiled kernel

* Fix unaligned IFM leftovers processing in case of compressed weights and add decompression scale post op support

* added FullyConnected_bf_tiled::GetUpdateDispatchDataFunc

* updated FullyConnected_bf_tiled::GetUpdateDispatchDataFunc for two types of kernels

---------

Co-authored-by: Kim, Eddy <[email protected]>