# Compress first and last layer more correctly (#2282)
### Changes
- Compress all embeddings to 8 bit. This yields a more accurate compression scheme for several models: gpt-2, stable-diffusion-v1-5, stable-diffusion-v2-1, opt-6.7b.
- Search for the last layer more correctly when it shares weights with the embedding. As a result, the faster compression scheme becomes available for falcon-7b, bloomz-7b1, and opt-6.7b.

### Reason for changes
The token embedding and the last layer should always be compressed to 8 bit in order to preserve accuracy. The previous logic for finding these layers relied on topological sort, but in practice the order can change. As a result, for at least 3 models in the mixed-precision setup the positional embedding was quantized to 8 bit while the token embedding was quantized to 4 bit, which is not expected. Moreover, the last layer can share its weight with the embedding. The old logic did not handle this case correctly: one extra matmul was quantized to 8 bit.

### Related tickets
125162

### Tests
test_shared_gather

Accuracy should improve; performance is not significantly affected.

opt-125m, lambada_openai | ppl | ms/token
-- | -- | --
All embeddings and last layer - 8 bit | 29.12 | 11.47
Positional embedding - 8 bit <br /> Token embedding and last layer - 4 bit | 29.20 | 11.53
Positional embedding - 4 bit <br /> Token embedding and last layer - 8 bit | 29.59 | 11.57
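The precision assignment described above can be sketched in plain Python. This is a minimal illustrative sketch, not NNCF's actual API: the function and layer names are hypothetical, and weight sharing is modeled simply as two layer names referring to the same weight object.

```python
# Hypothetical sketch: assign mixed-precision bit-widths so that all
# embeddings and the last layer stay at 8 bit, and detect the case where
# the last layer ties (shares) its weight with the token embedding.

def assign_bitwidths(layers, last_layer_name):
    """layers: dict of layer name -> weight object.

    Shared (tied) weights are represented by the same object, so identity
    comparison detects sharing without relying on topological order.
    """
    embedding_names = {name for name in layers if "embed" in name}
    shared_with_embedding = {
        name for name in layers
        if any(layers[name] is layers[e] for e in embedding_names)
    }
    bitwidths = {}
    for name in layers:
        if (name in embedding_names
                or name == last_layer_name
                or name in shared_with_embedding):
            bitwidths[name] = 8   # accuracy-critical: keep at 8 bit
        else:
            bitwidths[name] = 4   # remaining weights can go to 4 bit
    return bitwidths


# Toy model where the last layer ties its weight with the token embedding,
# as in falcon-7b / bloomz-7b1 / opt-6.7b.
tied_weight = object()
layers = {
    "token_embedding": tied_weight,
    "positional_embedding": object(),
    "block0.matmul": object(),
    "lm_head": tied_weight,  # shares weight with token_embedding
}
print(assign_bitwidths(layers, "lm_head"))
# → {'token_embedding': 8, 'positional_embedding': 8, 'block0.matmul': 4, 'lm_head': 8}
```

Keying the check on weight identity rather than on node order is what makes the scheme robust when the graph's topological sort changes.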
Parent: 8a45acb · Commit: 5eee3bc
6 changed files with 267 additions and 339 deletions.