[Cherry-pick] fix weight quant kernel bug when n div 64 != 0 #60184

Merged: XiaoguangHu01 merged 2 commits into PaddlePaddle:release/2.6 from wwbitejotunn:release_2.6_fix_weight_quant on Dec 26, 2023.
Conversation
Your PR has been submitted successfully. Thank you for your contribution to the open-source project!
kolinwei approved these changes on Dec 21, 2023.
MARD1NO approved these changes on Dec 21, 2023.
heavengate approved these changes on Dec 21, 2023.
hanhaowen-mt added a commit to hanhaowen-mt/Paddle that referenced this pull request on May 13, 2024:
…addlePaddle#60184)" This reverts commit 20d3558.
qili93 pushed a commit that referenced this pull request on May 13, 2024:
- Revert "fix rpc_sync and rpc_async doc;test=develop (#64107)" This reverts commit 1319992.
- Revert "[Dy2St][2.6] Disable `test_sentiment` on release/2.6 (#63197)" This reverts commit 9013831.
- Revert "Revert "fix security (#62626) (#62683)" (#62890)" This reverts commit 89a60d7.
- Revert "Enhance several unit tests (#62477) (#62776)" This reverts commit 0348f3f.
- Revert "[Fix_ci] set PLUGIN_TAG release/2.6 (#62731)" This reverts commit 97ffa07.
- Revert "fix security (#62626) (#62683)" This reverts commit 6a73547.
- Revert "add more capi to support stride (#62716)" This reverts commit 683a141.
- Revert "[XPU] default no autotune (#62636)" This reverts commit fde63d1.
- Revert "[DCU] fix dcu compile failure (#62573)" This reverts commit d527fb5.
- Revert "[AutoParallel] Adjust time restriction for test_semi_auto_parallel_hybrid_strategy.py (#62278)" This reverts commit fbf852d.
- Revert "disable llm_int8 ut (#62282)" This reverts commit e816529.
- Revert "fix openssl-cpu compile bug (#62079) (#62224)" This reverts commit 59c61db.
- Revert "[CINN] Add IntrinsicOps into ir_codes_collector (#60556) (#62245)" This reverts commit 773ea41.
- Revert "rm graph_reindex_test (#62057)" This reverts commit 521dc70.
- Revert "fix (#61923) (#62186)" This reverts commit d077553.
- Revert "fix cpups training bug:executor trainer use_ps_gpu value;test=develop (#62111)" This reverts commit d804975.
- Revert "[cherry-pick 2.6] Fix bug of put_along_axis/take_along_axis (#62065)" This reverts commit 3a083c3.
- Revert "[Cherry-pick] Fix indexing shape bug and Optimize (#62117)" This reverts commit 609f55e.
- Revert "cherry pick: reduce log for type promotion. (#62116)" This reverts commit f4d9adf.
- Revert "fix test_communicator_half_async random core;test=develop (#62092)" This reverts commit dba9992.
- Revert "fix the unqiue op that generate the wrong the inreverse result (#62104)" This reverts commit b89066a.
- Revert "[Cherry-pick] Fix Paddle-TRT UT fails (#61605)" This reverts commit 867ab0d.
- Revert "fix se (#61640) (#61702)" This reverts commit c0f4a49.
- Revert "fix dataloaer for toolkit (#61867) (#61994)" This reverts commit b50e906.
- Revert "[Cherry-Pick] Fix CacheKV Quant Bug (#61966)" This reverts commit 04ac1c0.
- Revert "[Paddle-TRT] fix solve (#61806)" This reverts commit df0155f.
- Revert "fix launch when elastic run (#61847) (#61878)" This reverts commit f09d9d8.
- Revert "Support Fake GroupWise Quant (#61900)" This reverts commit 2175de0.
- Revert "repeat_interleave support bf16 dtype (#61854) (#61899)" This reverts commit 96c2aaf.
- Revert "[security] refine _get_program_cache_key (#61827) (#61896)" This reverts commit b6a38d0.
- Revert "merge (#61866)" This reverts commit 39010bf.
- Revert "fix doc style (#61688)" This reverts commit 12e5c97.
- Revert "fix layer_norm decompose dtyte bugs, polish codes (#61631)" This reverts commit e5a85b6.
- Revert "remove _wget (#61356) (#61569)" This reverts commit 9250f66.
- Revert "cinn(py-dsl): skip eval string in python-dsl (#61380) (#61586)" This reverts commit a37f6fb.
- Revert "Fix unique (#60840) (#61044)" This reverts commit 3452e61.
- Revert "[CherryPick] Fix issue 60092 (#61427)" This reverts commit f025385.
- Revert "[cherry-pick] adapt c_embedding to phi namespace for custom devices (#60774) (#61045)" This reverts commit 0ccb9cb.
- Revert "check eval for security (#61389)" This reverts commit 60325a1.
- Revert "[Security] fix download security problem (#61162) (#61388)" This reverts commit 5f3bbeb.
- Revert "[Security] fix security problem for run_cmd (#61285) (#61398)" This reverts commit 9cd0c91.
- Revert "[Security] fix security problem for prune_by_memory_estimation (#61382)" This reverts commit af9b8c5.
- Revert "Fix CVE-2024-0521 (#61032) (#61287)" This reverts commit f99d4f2.
- Revert "fix _decompress security problem (#61294) (#61337)" This reverts commit 0227a0d.
- Revert "[Security] fix draw security problem (#61161) (#61338)" This reverts commit aeaa0ca.
- Revert "fix qat tests (#61211) (#61284)" This reverts commit ff119d0.
- Revert "fix core dump when fallback gather_nd_grad and MemoryAllocateHost (#61067)" This reverts commit ac1702b.
- Revert "[cherry-pick] This PR enable offset of generator for custom device. (#60616) (#60772)" This reverts commit 0f732a5.
- Revert "[Cherry-pick] fix set_value with scalar grad (#60930)" This reverts commit 1aa5f4b.
- Revert "[Dy2St][2.6] Increase `test_transformer` and `test_mobile_net` ut time (#60829) (#60875)" This reverts commit d788e9b.
- Revert "[Dy2St][2.6] Disable `test_transformer` on `release/2.6` and update README (#60786)" This reverts commit e738f49.
- Revert "fix bug of ci (#59926) (#60785)" This reverts commit 7b0d2e9.
- Revert "[Dy2St][2.6] Disable `test_grad` on release/2.6 (#60662)" This reverts commit e50f43e.
- Revert "[cherry-pick]update pdsa-2023-019 (#60649)" This reverts commit ccdf528.
- Revert "[cherry-pick]fix fleetutil get_online_pass_interval bug3 (#60620)" This reverts commit bbc13eb.
- Revert "fix fused_rope diff (#60217) (#60593)" This reverts commit 97b65c7.
- Revert "fix fleetutil get_online_pass_interval bug2; test=develop (#60545)" This reverts commit ae2e588.
- Revert "update 2023 security advisory, test=document_fix (#60532)" This reverts commit 83ce809.
- Revert "add chunk allocator posix_memalign return value check (#60208) (#60495)" This reverts commit b065877.
- Revert "tile (#60261)" This reverts commit 203754e.
- Revert "[Cherry-pick] fix weight quant kernel bug when n div 64 != 0 (#60184)" This reverts commit 20d3558.
- Revert "[Dy2St] Disable `test_bert` on CPU (#60173) (#60324)" This reverts commit a4cd847.
- Revert "fix windows bug for common lib (#60308)" This reverts commit 1b696a1.
- update to v2.6.0
- enable WITH_DISTRIBUTED in CMakeLists.txt and port related source file from cuda to musa
- fix some bugs when WITH_DISTRIBUTED is enabled
- delete useless cout in ../paddle/phi/backends/gpu/musa/musa_info.cc and set compute capacity to 9.9 for UT
xiaoguoguo626807 pushed a commit that referenced this pull request on Sep 30, 2024:
- fix windows bug for common lib (#60308)
- fix windows bug
- fix windows bug
- fix windows bug
- fix windows bug
- fix windows bug
- fix windows bug
- Update inference_lib.cmake
- [Dy2St] Disable `test_bert` on CPU (#60173) (#60324) Co-authored-by: gouzil <[email protected]>
- [Cherry-pick] fix weight quant kernel bug when n div 64 != 0 (#60184)
- fix weight-only quant kernel error for n div 64 !=0
- code style fix
- tile (#60261)
- add chunk allocator posix_memalign return value check (#60208) (#60495)
- fix chunk allocator posix_memalign return value check;test=develop
- fix chunk allocator posix_memalign return value check;test=develop
- fix chunk allocator posix_memalign return value check;test=develop
- update 2023 security advisory, test=document_fix (#60532)
- fix fleetutil get_online_pass_interval bug2; test=develop (#60545)
- fix fused_rope diff (#60217) (#60593)
- [cherry-pick]fix fleetutil get_online_pass_interval bug3 (#60620)
- fix fleetutil get_online_pass_interval bug3; test=develop
- fix fleetutil get_online_pass_interval bug3; test=develop
- fix fleetutil get_online_pass_interval bug3; test=develop
- [cherry-pick]update pdsa-2023-019 (#60649)
- update 2023 security advisory, test=document_fix
- update pdsa-2023-019, test=document_fix
- [Dy2St][2.6] Disable `test_grad` on release/2.6 (#60662)
- fix bug of ci (#59926) (#60785)
- [Dy2St][2.6] Disable `test_transformer` on `release/2.6` and update README (#60786)
- [Dy2St][2.6] Disable `test_transformer` on release/2.6 and update README
- [Docs] Update latest release version in README (#60691)
- restore order
- [Dy2St][2.6] Increase `test_transformer` and `test_mobile_net` ut time (#60829) (#60875)
- [Cherry-pick] fix set_value with scalar grad (#60930)
- Fix set value grad (#59034)
- first fix the UT
- fix set value grad
- polish code
- add static mode backward test
- always has input valuetensor
- add dygraph test
- Fix shape error in combined-indexing setitem (#60447)
- add ut
- fix shape error in combine-indexing
- fix ut
- Set value with scalar (#60452)
- set_value with scalar
- fix ut
- remove test_pir
- remove one test since 2.6 not support uint8-add
- [cherry-pick] This PR enable offset of generator for custom device. (#60616) (#60772)
- fix core dump when fallback gather_nd_grad and MemoryAllocateHost (#61067)
- fix qat tests (#61211) (#61284)
- [Security] fix draw security problem (#61161) (#61338)
- fix draw security problem
- fix _decompress security problem (#61294) (#61337)
- Fix CVE-2024-0521 (#61032) (#61287) This uses shlex for safe command parsing to fix arbitrary code injection. Co-authored-by: ndren <[email protected]>
- [Security] fix security problem for prune_by_memory_estimation (#61382)
- OS Command Injection prune_by_memory_estimation fix
- Fix StyleCode
- [Security] fix security problem for run_cmd (#61285) (#61398)
- fix security problem for run_cmd
- [Security] fix download security problem (#61162) (#61388)
- fix download security problem
- check eval for security (#61389)
- [cherry-pick] adapt c_embedding to phi namespace for custom devices (#60774) (#61045) Co-authored-by: Tian <[email protected]>
- [CherryPick] Fix issue 60092 (#61427)
- fix issue 60092
- update
- update
- update
- Fix unique (#60840) (#61044)
- fix unique kernel, row to num_out
- cinn(py-dsl): skip eval string in python-dsl (#61380) (#61586)
- remove _wget (#61356) (#61569)
- remove _wget
- remove _wget
- remove wget test
- fix layer_norm decompose dtyte bugs, polish codes (#61631)
- fix doc style (#61688)
- merge (#61866)
- [security] refine _get_program_cache_key (#61827) (#61896)
- security, refine _get_program_cache_key
- repeat_interleave support bf16 dtype (#61854) (#61899)
- repeat_interleave support bf16 dtype
- support bf16 on cpu
- Support Fake GroupWise Quant (#61900)
- fix launch when elastic run (#61847) (#61878)
- [Paddle-TRT] fix solve (#61806)
- [Cherry-Pick] Fix CacheKV Quant Bug (#61966)
- fix cachekv quant problem
- add unittest
- Sychronized the paddle2.4 adaptation changes
- clear third_part dependencies
- change submodules to right commits
- build pass with cpu only
- build success with maca
- build success with cutlass and fused kernels
- build with flash_attn and mccl
- build with test, fix some bugs
- fix some bugs
- fixed some compilation bugs
- fix bug in previous commit
- fix bug with split when col_size biger than 256
- add row_limit to show full kernel name
- add env.sh Change-Id: I6fded2761a44af952a4599691e19a1976bd9b9d1
- add shape record Change-Id: I273f5a5e97e2a31c1c8987ee1c3ce44a6acd6738
- modify paddle version Change-Id: I97384323c38066e22562a6fe8f44b245cbd68f98
- wuzhao optimized the performance of elementwise kernel. Change-Id: I607bc990415ab5ff7fb3337f628b3ac765d3186c
- fix split when dtype is fp16 Change-Id: Ia55d31d11e6fa214d555326a553eaee3e928e597
- fix bug in previous commit Change-Id: I0fa66120160374da5a774ef2c04f133a54517069
- adapt flash_attn new capi Change-Id: Ic669be18daee9cecbc8542a14e02cdc4b8d429ba
- change eigen path Change-Id: I514c0028e16d19a3084656cc9aa0838a115fc75c
- modify mcname -> replaced_name Change-Id: Idc520d2db200ed5aa32da9573b19483d81a0fe9e
- fix some build bugs Change-Id: I50067dfa3fcaa019b5736f4426df6d4e5f64107d
- add PADDLE_ENABLE_SAME_RAND_A100 Change-Id: I2d4ab6ed0b5fac3568562860b0ba1c4f8e346c61
- remove redundant warning, add patch from 2.6.1 Change-Id: I958d5bebdc68eb42fe433c76a3737330e00a72aa
- improve VectorizedBroadcastKernel (cherry picked from commit 19069b26c0bf05a80cc834162db072f6b8aa2536) Change-Id: Iaf5719d72ab52adbedc40d4788c52eb1ce4d517c Signed-off-by: m00891 <[email protected]>
- fix bugs (cherry picked from commit b007853a75dbd5de63028f4af82c15a5d3d81f7c) Change-Id: Iaec0418c384ad2c81c354ef09d81f3e9dfcf82f1 Signed-off-by: m00891 <[email protected]>
- split ElementwiseDivGrad (cherry picked from commit eb6470406b7d440c135a3f7ff68fbed9494e9c1f) Change-Id: I60e8912be8f8d40ca83a54af1493adfa2962b2d6 Signed-off-by: m00891 <[email protected]>
- in VectorizedElementwiseKernel, it can now use vecSize = 8 (cherry picked from commit a873000a6c3bc9e2540e178d460e74e15a3d4de5) Change-Id: Ia703b1e9e959558988fcd09182387da839d33922 Signed-off-by: m00891 <[email protected]>
- improve ModulatedDeformableCol2imCoordGpuKernel: 1. block size 512->64; 2. FastDivMod; 3. fix VL1; 4. remove DmcnGetCoordinateWeight divergent branches. (cherry picked from commit 82c914bdd29f0eef87a52b229ff84bc456a1beeb) Change-Id: I60b1fa9a9c89ade25e6b057c38e08616a24fa5e3 Signed-off-by: m00891 <[email protected]>
- Optimize depthwise_conv2d_grad compute (InputGrad): 1. use shared memory to optimize data load from global memory; 2. different blocksize for different input shape; 3. FastDivMod for input shape div, >> and & for stride div. (cherry picked from commit b34a5634d848f3799f5a8bcf884731dba72d3b20) Change-Id: I0d8f22f2a2b9d99dc9fbfc1fb69b7bed66010229 Signed-off-by: m00891 <[email protected]>
- improve VectorizedBroadcastKernel with LoadType = 2(kMixed) (cherry picked from commit 728b9547f65e096b45f39f096783d2bb49e8556f) Change-Id: I282dd8284a7cde54061780a22b397133303f51e5 Signed-off-by: m00891 <[email protected]>
- fix ElementwiseDivGrad (cherry picked from commit 5f99c31904e94fd073bdd1696c3431cccaa376cb) Change-Id: I3ae0d6c01eec124d12fa226a002b10d0c40f820c Signed-off-by: m00891 <[email protected]>
- Revert "Optimize depthwise_conv2d_grad compute (InputGrad):" This reverts commit b34a5634d848f3799f5a8bcf884731dba72d3b20. (cherry picked from commit 398f5cde81e2131ff7014edfe1d7beaaf806adbb) Change-Id: I637685b91860a7dea6df6cbba0ff2cf31363e766 Signed-off-by: m00891 <[email protected]>
- improve ElementwiseDivGrad and ElementwiseMulGrad (cherry picked from commit fe32db418d8f075e083f31dca7010398636a6e67) Change-Id: I4f7e0f2b5afd4e704ffcd7258def63afc43eea9c Signed-off-by: m00891 <[email protected]>
- improve FilterBBoxes (cherry picked from commit fe4655e86b92f5053fa886af49bf199307960a05) Change-Id: I35003420292359f8a41b19b7ca2cbaae17dc5b45 Signed-off-by: m00891 <[email protected]>
- improve deformable_conv_grad op: 1. adaptive block size; 2. FastDivMod; 3. move ldg up. (cherry picked from commit a7cb0ed275a3488f79445ef31456ab6560e9de43) Change-Id: Ia89df4e5a26de64baae4152837d2ce3076c56df1 Signed-off-by: m00891 <[email protected]>
- improve ModulatedDeformableIm2colGpuKernel: 1. adaptive block size; 2. FastDivMod; 3. move ldg up. (cherry picked from commit 4fb857655d09f55783d9445b91a2d953ed14d0b8) Change-Id: I7df7f3af7b4615e5e96d33b439e5276be6ddb732 Signed-off-by: m00891 <[email protected]>
- improve KeBNBackwardData: replace 1.0/sqrt with rsqrt (cherry picked from commit 333cba7aca1edf7a0e87623a0e55e230cd1e9451) Change-Id: Ic808d42003677ed543621eb22a797f0ab7751baa Signed-off-by: m00891 <[email protected]>
- Improve KeBNBackwardData, FilterGradAddupGpuKernel kernels. Improve nonzero and masked_select (forward only) OP. (cherry picked from commit c907b40eb3f9ded6ee751e522c2a97a353ac93bd) Change-Id: I7f4845405e64e7599134a8c497f464ac04dead88 Signed-off-by: m00891 <[email protected]>
- Optimize depthwise_conv2d: 1. 256 Blocksize launch for small shape inputgrad; 2. FastDivMod in inputgrad and filtergrad; 3. shared memory to put output_grad_data in small shape. (cherry picked from commit f9f29bf7b8d929fb95eb1153a79d8a6b96d5b6d2) Change-Id: I1a3818201784031dbedc320286ea5f4802dbb6b1 Signed-off-by: m00891 <[email protected]>
- Improve CheckFiniteAndUnscaleKernel by splitting the kernel into multiple tensors. (cherry picked from commit 3bd200f262271a333b3947326442b86af7fb6da1) Change-Id: I57c94cc5e709be8926e1b21da14b653cb18eabc3 Signed-off-by: m00891 <[email protected]>
- Revert "Improve CheckFiniteAndUnscaleKernel by splitting the kernel into multiple tensors." This reverts commit 3bd200f262271a333b3947326442b86af7fb6da1. (cherry picked from commit 86ed8adaa8c20d3c824eecb0ee1e10d365bcea37) Change-Id: I5b8b7819fdf99255c65fe832d5d77f8e439bdecb Signed-off-by: m00891 <[email protected]>
- improve ScatterInitCUDAKernel and ScatterCUDAKernel (cherry picked from commit cddb01a83411c45f68363248291c0c4685e60b24) Change-Id: Ie106ff8d65c21a8545c40636f021b73f3ad84587 Signed-off-by: m00891 <[email protected]>
- fix bugs and make the code easier to read (cherry picked from commit 07ea3acf347fda434959c8c9cc3533c0686d1836) Change-Id: Id7a727fd18fac4a662f8af1bf6c6b5ebc6233c9f Signed-off-by: m00891 <[email protected]>
- Optimize FilterGard and InputGradSpL: use tmp to store ldg data in the loop so calculate and ldg time can fold each other. (cherry picked from commit 7ddab49d868cdb6deb7c3e17c5ef9bbdbab86c3e) Change-Id: I46399594d1d7f76b78b9860e483716fdae8fc7d6 Signed-off-by: m00891 <[email protected]>
- Improve CheckFiniteAndUnscaleKernel by putting address access to shared memory and making single thread do more tasks. (cherry picked from commit 631ffdda2847cda9562e591dc87b3f529a51a978) Change-Id: Ie9ffdd872ab06ff34d4daf3134d6744f5221e41e Signed-off-by: m00891 <[email protected]>
- Optimize SwinTransformer: 1. LayerNormBackward: remove if statement, now will always loop VPT times for ldg128 in compiler, bool flag to control if write action will be taken or not; 2. ContiguousCaseOneFunc: tmp saving division result for less division (cherry picked from commit 422d676507308d26f6107bed924424166aa350d3) Change-Id: I37aab7e2f97ae6b61c0f50ae4134f5eb1743d429 Signed-off-by: m00891 <[email protected]>
- Optimize LayerNormBackwardComputeGradInputWithSmallFeatureSize: set BlockDim.z to make blockSize always be 512, each block can handle several batches. Then all threads will loop 4 times for better performance. (cherry picked from commit 7550c90ca29758952fde13eeea74857ece41908b) Change-Id: If24de87a0af19ee07e29ac2e7e237800f0181148 Signed-off-by: m00891 <[email protected]>
- improve KeMatrixTopK: 1. fix private memory; 2. modify max grid size; 3. change it to 64 warp reduce. (cherry picked from commit a346af182b139dfc7737e5f6473dc394b21635d7) Change-Id: I6c8d8105fd77947c662e6d22a0d15d7bad076bde Signed-off-by: m00891 <[email protected]>
- Modify LayerNorm Optimization. Might have lossdiff with old optimization without atomicAdd. (cherry picked from commit 80b0bcaa9a307c94dbeda658236fd75e104ccccc) Change-Id: I4a7c4ec2a0e885c2d581dcebc74464830dae7637 Signed-off-by: m00891 <[email protected]>
- improve roi_align op: 1. adaptive block size; 2. FastDivMod. (cherry picked from commit cc421d7861c359740de0d2870abcfde4354d8c71) Change-Id: I55c049e951f93782af1c374331f44b521ed75dfe Signed-off-by: m00891 <[email protected]>
- add workaround for parameters dislocation when calling BatchedGEMM<float16>. Change-Id: I5788c73a9c45f65e60ed5a88d16a473bbb888927
- fix McFlashAttn string Change-Id: I8b34f02958ddccb3467f639daaac8044022f3d34
- [C500-27046] fix wb issue Change-Id: I77730da567903f43ef7a9992925b90ed4ba179c7
- Support compiling external ops Change-Id: I1b7eb58e7959daff8660ce7889ba390cdfae0c1a
- support flash attn varlen api and support arm build Change-Id: I94d422c969bdb83ad74262e03efe38ca85ffa673
- Add a copyright notice Change-Id: I8ece364d926596a40f42d973190525d9b8224d99
- Modify some third-party dependency addresses to public network addresses

Signed-off-by: m00891 <[email protected]>
Co-authored-by: risemeup1 <[email protected]>, Nyakku Shigure <[email protected]>, gouzil <[email protected]>, Wang Bojun <[email protected]>, lizexu123 <[email protected]>, danleifeng <[email protected]>, Vigi Zhang <[email protected]>, tianhaodongbd <[email protected]>, zyfncg <[email protected]>, JYChen <[email protected]>, zhaohaixu <[email protected]>, Spelling <[email protected]>, zhouzj <[email protected]>, wanghuancoder <[email protected]>, ndren <[email protected]>, Nguyen Cong Vinh <[email protected]>, Ruibin Cheung <[email protected]>, Tian <[email protected]>, Yuanle Liu <[email protected]>, zhuyipin <[email protected]>, 6clc <[email protected]>, Wenyu <[email protected]>, Xianduo Li <[email protected]>, Wang Xin <[email protected]>, Chang Xu <[email protected]>, wentao yu <[email protected]>, zhink <[email protected]>, handiz <[email protected]>, zhimin Pan <[email protected]>, m00891 <[email protected]>, shuliu <[email protected]>, Yanxin Zhou <[email protected]>, Zhao Wu <[email protected]>, m00932 <[email protected]>, Fangzhou Feng <[email protected]>, junwang <[email protected]>, m01097 <[email protected]>
PR types: Bug fixes
PR changes: OPs
Description: [Cherry-pick] fix weight quant kernel bug when n div 64 != 0
Pcard-71502
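The title describes a weight-only quantization kernel that processes the columns of a [k, n] weight matrix in groups of 64 and mishandled the final, partial group whenever n is not a multiple of 64. Paddle's actual CUDA kernel is not reproduced here; the NumPy sketch below only illustrates the general tile-plus-tail pattern such a fix restores, with the last tile clamped instead of assumed full. `TILE_N` and `quantize_weight_tiled` are hypothetical names for illustration, not Paddle APIs.

```python
import numpy as np

TILE_N = 64  # hypothetical tile width along n, mirroring the kernel's 64-column grouping


def quantize_weight_tiled(w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Symmetric per-channel int8 weight quantization, processing n in tiles of 64.

    The last tile is clamped to the matrix edge when n % 64 != 0, which is the
    boundary case the PR title refers to.
    """
    k, n = w.shape
    scale = np.abs(w).max(axis=0) / 127.0     # one scale per output channel (column)
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero channels
    q = np.empty((k, n), dtype=np.int8)
    for start in range(0, n, TILE_N):
        stop = min(start + TILE_N, n)         # clamp the final, partial tile
        q[:, start:stop] = np.clip(
            np.rint(w[:, start:stop] / scale[start:stop]), -127, 127
        ).astype(np.int8)
    return q, scale


# Round-trip check on a width that is deliberately NOT a multiple of 64.
w = np.random.default_rng(0).standard_normal((8, 100)).astype(np.float32)
q, scale = quantize_weight_tiled(w)
w_hat = q.astype(np.float32) * scale  # dequantize; error bounded by scale / 2 per channel
```

Without the `min(...)` clamp, the tail columns (indices 64..99 here) would be read and written out of bounds or skipped entirely, which is the class of defect the cherry-picked fix addresses.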