Fix llm.int8 unit test #61591

RichardWooSJTU · 2024-02-04T10:29:54Z

Bug fixes

OPs

Background:
The unit test test_llm_int8_linear.py fails.

Reason:
The weight_quantize op was modified so that the scale dtype must be the same as the input data dtype, which the llm.int8 op does not support.

Solution:
Adapt the weight_quantize op so that it also meets the requirements of llm.int8.
Pcard-71502
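
To illustrate the scale dtype issue described above, here is a minimal NumPy sketch of per-channel absmax int8 quantization in which the scale dtype is chosen independently of the weight dtype. It is not the Paddle kernel and does not use the Paddle API. The helper `quantize_per_channel` and its `scale_dtype` parameter are hypothetical names introduced only for this example, and the choice of float32 as the alternative scale dtype is an assumption.

```python
import numpy as np


def quantize_per_channel(weight, scale_dtype):
    """Hypothetical helper (not Paddle's weight_quantize).

    Quantizes a 2-D weight matrix to int8 with one absmax scale per
    column and returns the scale in the requested dtype.
    """
    w = weight.astype(np.float32)       # do the arithmetic in float32
    absmax = np.abs(w).max(axis=0)      # one value per output column
    absmax[absmax == 0] = 1.0           # avoid division by zero
    scale = absmax / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(scale_dtype)


weight = np.array([[0.25, -4.0], [1.0, 1.0]], dtype=np.float16)

# Scale dtype tied to the input dtype (the restriction described above).
q_same, scale_same = quantize_per_channel(weight, weight.dtype)

# Scale dtype chosen independently of the input dtype (float32 assumed here).
q_f32, scale_f32 = quantize_per_channel(weight, np.float32)
```

The quantized int8 values are the same in both calls; only the dtype of the returned scale differs, which is the property a consumer op may depend on.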

wwbitejotunn

LGTM for weight_only gemm scale type

jeng1220 · 2024-02-27T07:21:17Z

@RichardWooSJTU,
Could you please cherry-pick this patch to the release/2.6 branch?

cc @onecatcn for vis

RichardWooSJTU added 2 commits February 4, 2024 18:12

fix llm.int8 unit test

c90939f

fix llm.int8 unnittest when cpu

25551c0

This was referenced Feb 22, 2024

PaddlePaddle 2.6.0 buglist, part 1 #60882

Closed

change GPU memory allocating policy #6159

Merged

RichardWooSJTU added 2 commits February 22, 2024 22:14

fix numerical mismatch

bf17886

code clean

193bdc5

wwbitejotunn approved these changes Feb 23, 2024

risemeup1 approved these changes Feb 26, 2024

carryyu merged commit 194ef8b into PaddlePaddle:develop Feb 26, 2024
30 checks passed

RichardWooSJTU mentioned this pull request Mar 1, 2024

Disable unit test of llm_int8_linear op #62282

Merged
