v1.7.1 - Continuous batching feature supports ChatGLM2/3.
Functionality
- Add continuous batching support for ChatGLM2/3 models.
- Qwen2Convert supports Qwen2 models quantized by GPTQ, such as GPTQ-Int8 and GPTQ-Int4, via the param from_quantized_model="gptq" (see the sketch after this list).
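For reference, a minimal conversion sketch in Python; the checkpoint and output paths are placeholders, and the convert call follows xFasterTransformer's usual converter interface:

```python
import xfastertransformer as xft

# Convert a GPTQ-quantized Qwen2 checkpoint (e.g. GPTQ-Int8 or GPTQ-Int4)
# into xFasterTransformer's format. from_quantized_model="gptq" tells the
# converter that the source weights are already GPTQ-quantized.
xft.Qwen2Convert().convert(
    "/path/to/Qwen2-GPTQ-Int4",   # placeholder: HF GPTQ checkpoint dir
    "/path/to/qwen2-xft-output",  # placeholder: converted model dir
    from_quantized_model="gptq",
)
```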
BUG fix
- Fixed a segmentation fault when running with more than 2 ranks in vllm-xft serving.
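With continuous batching, concurrent requests to a running vllm-xft server are scheduled into the same batch dynamically rather than processed one sequence at a time. A minimal client-side sketch against the OpenAI-compatible endpoint; the server address, served model name, and the prompts are placeholder assumptions, and a vllm-xft server is assumed to be already running:

```python
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

# Assumes a vllm-xft OpenAI-compatible server is already serving a model
# (e.g. ChatGLM2/3) at this address; host, port, and model name are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="xft",  # placeholder: the served-model-name configured at launch
        messages=[{"role": "user", "content": prompt}],
        max_tokens=64,
    )
    return resp.choices[0].message.content

# Fire several requests concurrently; continuous batching lets the server
# interleave them in one batch instead of serving them strictly in sequence.
with ThreadPoolExecutor(max_workers=4) as pool:
    for answer in pool.map(ask, ["Hello!", "What is xFT?", "Summarize GPTQ."]):
        print(answer)
```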
What's Changed
Generated release notes
- [README] Update README.md. by @Duyi-Wang in #434
- [README] Update README.md. by @Duyi-Wang in #435
- [Common] Add INT8/UINT4 to BF16 weight convert by @xiangzez in #436
- Add Continue Batching support for Chatglm2/3 by @a3213105 in #438
- [Model] Add Qwen2 GPTQ model support by @xiangzez in #439
- [Model] Fix array out of bounds when rank > 2. by @Duyi-Wang in #441
- Bump gradio from 4.19.2 to 4.36.0 in /examples/web_demo by @dependabot in #442
- [Version] v1.7.1. by @Duyi-Wang in #445
Full Changelog: v1.7.0...v1.7.1