v1.7.1 - Continuous batching feature supports ChatGLM2/3.
Functionality
- Add continuous batching support for ChatGLM2/3 models.
- Qwen2Convert supports Qwen2 models quantized by GPTQ, such as GPTQ-Int8 and GPTQ-Int4, via the param from_quantized_model="gptq" (see the sketch after this list).
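For reference, a minimal conversion sketch in Python; the checkpoint and output paths are placeholders, and the convert call follows xFasterTransformer's usual converter interface:

```python
import xfastertransformer as xft

# Convert a GPTQ-quantized Qwen2 checkpoint (e.g. GPTQ-Int8 or GPTQ-Int4)
# into xFasterTransformer's format. from_quantized_model="gptq" tells the
# converter that the source weights are already GPTQ-quantized.
xft.Qwen2Convert().convert(
    "/path/to/Qwen2-GPTQ-Int4",   # placeholder: HF GPTQ checkpoint dir
    "/path/to/qwen2-xft-output",  # placeholder: converted model dir
    from_quantized_model="gptq",
)
```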
BUG fix
- Fixed a segmentation fault when running with more than 2 ranks in vllm-xft serving.
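With continuous batching, concurrent requests to a running vllm-xft server are scheduled into the same batch dynamically rather than processed one sequence at a time. A minimal client-side sketch against the OpenAI-compatible endpoint; the server address, served model name, and the prompts are placeholder assumptions, and a vllm-xft server is assumed to be already running:

```python
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

# Assumes a vllm-xft OpenAI-compatible server is already serving a model
# (e.g. ChatGLM2/3) at this address; host, port, and model name are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="xft",  # placeholder: the served-model-name configured at launch
        messages=[{"role": "user", "content": prompt}],
        max_tokens=64,
    )
    return resp.choices[0].message.content

# Fire several requests concurrently; continuous batching lets the server
# interleave them in one batch instead of serving them strictly in sequence.
with ThreadPoolExecutor(max_workers=4) as pool:
    for answer in pool.map(ask, ["Hello!", "What is xFT?", "Summarize GPTQ."]):
        print(answer)
```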
What's Changed
Generated release notes
- [README] Update README.md. by @Duyi-Wang in #434
- [README] Update README.md. by @Duyi-Wang in #435
- [Common] Add INT8/UINT4 to BF16 weight convert by @xiangzez in #436
- Add Continue Batching support for Chatglm2/3 by @a3213105 in #438
- [Model] Add Qwen2 GPTQ model support by @xiangzez in #439
- [Model] Fix array out of bounds when rank > 2. by @Duyi-Wang in #441
- Bump gradio from 4.19.2 to 4.36.0 in /examples/web_demo by @dependabot in #442
- [Version] v1.7.1. by @Duyi-Wang in #445
Full Changelog: v1.7.0...v1.7.1