Commit
[GPU] Optimize iGPU FC with prime number batch size (openvinotoolkit#24893)

### Details:
- Solve the low iGPU FC performance when the FC batch size is not aligned with 2/4.
- Desc: The FC input shape is sometimes not aligned with 2/4. For example, ViT models use 257x4096 or 577x4096. With such an unaligned batch size, the iGPU executes FC very slowly: about 23 ms for 257x4096->257x1024 and 50 ms for 577x4096->577x1024.
- Root cause: When the FC batch size is not aligned with 2/4, the best TuneParams are not chosen and the default parameters are used as a fallback, which gives the worst performance. See the figure below: EU active is about 3.5% while XVE thread occupancy is almost 100%, and global memory read bandwidth is 77 GB/s, which has reached the hardware bandwidth limit (~75 GB/s). This means that memory utilization in the L3 cache is too low.

  ![image](https://github.com/openvinotoolkit/openvino/assets/31196718/a9debd4e-bc77-45ac-9942-01813b0d61ab)
- Solution: If the FC batch size is not aligned with 2/4, we can still use tile_b=16 with dispatch_bsv==1 as TuneParams, which benefits from the higher ratio of GFLOPS to data read bandwidth.
- Test result on MTL:

  ![image](https://github.com/openvinotoolkit/openvino/assets/31196718/8c6b566c-8389-419f-836e-eaab29f8ef02)

  FC 257x4096->257x1024: latency improved from 23 ms to 0.9 ms

| Model | master | This PR |
| -- | -- | -- |
| CLIP visual | 0.99 FPS | 13.00 FPS |
| ViT_B | 5.37 FPS | 20.40 FPS |
| ViT_L | 0.56 FPS | 4.91 FPS |

### Tickets:
- CVS-142833

---------

Co-authored-by: Chen Peter <[email protected]>
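
To illustrate why a larger batch tile can raise the ratio of compute to data read for the 257x4096->257x1024 case, here is a rough back-of-the-envelope sketch. It is not the kernel code. It assumes a simplified model in which the whole weight matrix is re-read once per batch tile, activation reads and caching are ignored, and weights are 2 bytes per element (fp16). The function names are hypothetical, and `tile_b=1` is only a hypothetical baseline, since the actual default parameters are not listed here.

```python
import math


def weight_bytes_read(batch, in_features, out_features, tile_b, bytes_per_elem=2):
    """Estimate the weight bytes fetched for one FC call.

    Simplified model (an assumption, not a measurement): every batch tile of
    tile_b rows reads the full in_features x out_features weight matrix once.
    """
    num_batch_tiles = math.ceil(batch / tile_b)
    return num_batch_tiles * in_features * out_features * bytes_per_elem


def fc_flops(batch, in_features, out_features):
    """Multiply-add count for a dense FC layer, counted as 2 FLOPs per MAC."""
    return 2 * batch * in_features * out_features


def flops_per_weight_byte(batch, in_features, out_features, tile_b, bytes_per_elem=2):
    """Modeled ratio of compute to weight traffic for a given batch tile size."""
    traffic = weight_bytes_read(batch, in_features, out_features, tile_b, bytes_per_elem)
    return fc_flops(batch, in_features, out_features) / traffic


if __name__ == "__main__":
    batch, in_features, out_features = 257, 4096, 1024  # shape from the description
    for tile_b in (1, 16):  # 1 is a hypothetical baseline, 16 is the proposed tile
        tiles = math.ceil(batch / tile_b)
        mib = weight_bytes_read(batch, in_features, out_features, tile_b) / 2**20
        ratio = flops_per_weight_byte(batch, in_features, out_features, tile_b)
        print(f"tile_b={tile_b}: {tiles} batch tiles, {mib:.0f} MiB of weights read, "
              f"{ratio:.2f} FLOPs per weight byte")
```

Under these assumptions, an unaligned batch of 257 still splits into 17 tiles of up to 16 rows, so the modeled number of passes over the weights drops from 257 to 17. The measured speedup depends on details this model leaves out.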