Releases: ModelCloud/GPTQModel
GPTQModel v1.0.9
What's Changed
Fixed HF integration to work with the latest Transformers. Moved AutoRound to an optional dependency. Updated flaky CI tests.
- [FIX] mark auto_round extras_require by @LRL-ModelCloud in #430
- [BUILD] update compile flags by @Qubitium in #428
- [FIX] failed test_transformers_integration.py by @ZX-ModelCloud in #435
Full Changelog: v1.0.8...v1.0.9
GPTQModel v1.0.8
What's Changed
Moved QBits to an optional dependency. Added Python 3.12 wheels and fixed wheel generation for CUDA 11.8.
- [PKG] update vllm/sglang optional depends by @PZS-ModelCloud in #423
- [FIX] autoround depend causing torch-cpu to be installed by @Qubitium in #422
Full Changelog: v1.0.7...v1.0.8
GPTQModel v1.0.7
What's Changed
Fixed the (faster) Marlin kernel not being auto-selected for some eligible models, and AutoRound quantization save throwing JSON errors.
- [FIX] marlin_inference_linear not correctly auto selected for eligible models by @ZX-ModelCloud in #413
- [FIX] remove "scale" and "zp" Tensor from layer_config by @ZX-ModelCloud in #414
- [FIX] Failed unit test by @ZX-ModelCloud in #420
Full Changelog: v1.0.6...v1.0.7
GPTQModel v1.0.6
What's Changed
Patch release to fix loading of quantized Llama 3.2 Vision model.
- [FIX] mllama loader by @LRL-ModelCloud in #404
Full Changelog: v1.0.5...v1.0.6
GPTQModel v1.0.5
What's Changed
Added partial quantization support for the Llama 3.2 Vision model. v1.0.5 allows quantization of the text layers (the layers responsible for text generation) only; vision-layer support will be added shortly. A Llama 3.2 11B Vision Instruct model will quantize to ~50% of its original size in 4-bit mode. Once vision-layer support is added, the size will shrink to the expected ~1/4.
- [MODEL] Add Llama 3.2 Vision (mllama) support by @LRL-ModelCloud in #401
Full Changelog: v1.0.4...v1.0.5
GPTQModel v1.0.4
What's Changed
Added Liger Kernel support for ~50% VRAM reduction during the quantization stage for some models. Added a toggle to disable parallel packing to avoid OOM on larger models. Updated the Transformers dependency to 4.45.0 for Llama 3.2 support.
- [FEATURE] add a parallel_packing toggle by @LRL-ModelCloud in #393
- [FEATURE] add liger_kernel support by @LRL-ModelCloud in #394
Full Changelog: v1.0.3...v1.0.4
GPTQModel v1.0.3
What's Changed
- [MODEL] Add minicpm3 by @LDLINGLINGLING in #385
- [FIX] fix minicpm3 support by @LRL-ModelCloud in #387
- [MODEL] Added GRIN-MoE support by @LRL-ModelCloud in #388
New Contributors
- @LDLINGLINGLING made their first contribution in #385
- @mrT23 made their first contribution in #386
Full Changelog: v1.0.2...v1.0.3
GPTQModel v1.0.2
What's Changed
Upgraded the AutoRound package to v0.3.0. Pre-built WHL and PyPI source releases are now available. Install by downloading our pre-built WHL or with `pip install gptqmodel --no-build-isolation`.
- [CORE] Autoround v0.3 by @LRL-ModelCloud in #368
- [CI] Lots of CI fixups by @CSY-ModelCloud
Full Changelog: v1.0.0...v1.0.2
GPTQModel v1.0.0
What's Changed
40% faster multi-threaded packing, new lm_eval api, and fixed Python 3.9 compat.
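Multi-threaded packing works because each quantized layer can be bit-packed independently of the others. The following is a minimal, hedged sketch of that idea in pure Python; `pack_layer`, `pack_all`, and the layer dicts are illustrative stand-ins, not GPTQModel's actual API or packing code.

```python
# Sketch: per-layer packing is independent work, so it can be fanned out
# across a thread pool. pack_layer is a toy stand-in for the real step
# (bit-packing int4 weights into a dense tensor).
from concurrent.futures import ThreadPoolExecutor

def pack_layer(name, weights):
    # Toy "packing": keep only the low 4 bits of each value.
    return name, [w & 0xF for w in weights]

def pack_all(layers, max_workers=8):
    packed = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for name, data in pool.map(lambda kv: pack_layer(*kv), layers.items()):
            packed[name] = data
    return packed

layers = {"model.layers.0.q_proj": [18, 255], "model.layers.1.k_proj": [7]}
print(pack_all(layers))
```

In CPython, threads help here to the extent the real packing releases the GIL (e.g. inside tensor ops); the structure, not the toy arithmetic, is the point.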
- Add lm_eval api by @PZS-ModelCloud in #338
- Multi-threaded packing in quantization by @PZS-ModelCloud in #354
- [CI] Add TGI unit test by @PZS-ModelCloud in #348
- [CI] Updates by @CSY-ModelCloud in #347, #352, #353, #355, #357
- Fix Python 3.9 compat by @PZS-ModelCloud in #358
Full Changelog: v0.9.11...v1.0.0
GPTQModel v0.9.11
What's Changed
Added LG EXAONE 3.0 model support. New dynamic per-layer/module flexible quantization, where each layer/module may use different bits/params. Added proper sharding support to backend.BITBLAS. Auto-heal quantization errors caused by too-small damp values.
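Conceptually, dynamic quantization maps layer/module names to per-module overrides of the base bits/params. The sketch below shows one way such a mapping could be resolved; the pattern keys, override schema, and `resolve_config` helper are assumptions for illustration, not GPTQModel's exact configuration format.

```python
# Sketch: per-layer/module quantization overrides, resolved by matching
# the module name against patterns. Schema and helper are hypothetical.
import re

BASE = {"bits": 4, "group_size": 128}

# Hypothetical overrides: layers 20-29 get 8 bits; lm_head stays unquantized-ish.
DYNAMIC = {
    r"model\.layers\.2[0-9]\..*": {"bits": 8},
    r".*\.lm_head": {"bits": 16, "group_size": -1},
}

def resolve_config(layer_name, base=BASE, dynamic=DYNAMIC):
    cfg = dict(base)
    for pattern, override in dynamic.items():
        if re.fullmatch(pattern, layer_name):
            cfg.update(override)
    return cfg

print(resolve_config("model.layers.3.mlp.down_proj"))    # falls back to BASE
print(resolve_config("model.layers.21.self_attn.q_proj"))  # picks up 8-bit override
```

The useful property is that unmatched modules silently keep the base config, so a single dict can retune only the sensitive layers.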
- [CORE] add support for pack and shard to bitblas by @LRL-ModelCloud in #316
- Add dynamic bits by @PZS-ModelCloud in #311, #319, #321, #323, #327
- [MISC] Adjust the validation order of QuantLinear when BACKEND is AUTO by @ZX-ModelCloud in #318
- Add save_quantized logging of total model size by @PZS-ModelCloud in #320
- Auto damp recovery by @CSY-ModelCloud in #326
- [FIX] add missing original_infeatures by @CSY-ModelCloud in #337
- Update Transformers to 4.44.0 by @Qubitium in #336
- [MODEL] add exaone model support by @LRL-ModelCloud in #340
- [CI] Upload wheel to local server by @CSY-ModelCloud in #339
- [MISC] Fix assert by @CSY-ModelCloud in #342
Full Changelog: v0.9.10...v0.9.11