Releases: ModelCloud/GPTQModel

GPTQModel v1.0.9

13 Oct 00:00
e6ac223

What's Changed

Fixed HF integration to work with the latest Transformers. Moved AutoRound to an optional dependency. Updated flaky CI tests.

Full Changelog: v1.0.8...v1.0.9

GPTQModel v1.0.8

11 Oct 05:00
7b53f5c

What's Changed

Moved QBits to an optional dependency. Added Python 3.12 wheels and fixed wheel generation for CUDA 11.8.

Full Changelog: v1.0.7...v1.0.8

GPTQModel v1.0.7

08 Oct 14:19
e208d38

What's Changed

Fixed the Marlin (faster) kernel not being auto-selected for some models, and fixed AutoRound quantization saves throwing JSON errors.

Full Changelog: v1.0.6...v1.0.7

GPTQModel v1.0.6

26 Sep 15:59
25e7313

What's Changed

Patch release to fix loading of quantized Llama 3.2 Vision models.

Full Changelog: v1.0.5...v1.0.6

GPTQModel v1.0.5

26 Sep 10:54
4921d68

What's Changed

Added partial quantization support for the Llama 3.2 Vision model. v1.0.5 quantizes only the text layers (the layers responsible for text generation); vision-layer support will be added shortly. A Llama 3.2 11B Vision Instruct model quantizes to ~50% of its original size in 4-bit mode. Once vision-layer support is added, the size will drop to the expected ~1/4.
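The text-only behavior above can be sketched as a name filter over a model's modules. This is an illustrative sketch only: the module names below are hypothetical stand-ins for a multimodal checkpoint layout, not the actual Llama 3.2 Vision paths, and the filter is not GPTQModel's internal implementation.

```python
# Hypothetical module names illustrating a multimodal checkpoint layout;
# the real Llama 3.2 Vision module paths may differ.
MODULES = [
    "language_model.layers.0.self_attn.q_proj",
    "language_model.layers.0.mlp.gate_proj",
    "vision_model.encoder.layers.0.self_attn.q_proj",
    "vision_model.encoder.layers.0.mlp.fc1",
]

def text_layers_only(names):
    """Keep only text-generation modules, mirroring v1.0.5's behavior
    of skipping the vision tower during quantization."""
    return [n for n in names if not n.startswith("vision_model.")]

quantized = text_layers_only(MODULES)
```

Because only the language tower's weights shrink to 4-bit while the vision tower stays in full precision, the overall checkpoint lands at roughly half size rather than the ~1/4 a fully quantized model would reach.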

Full Changelog: v1.0.4...v1.0.5

GPTQModel v1.0.4

26 Sep 04:26
cffee9a

What's Changed

Added Liger Kernel support for ~50% VRAM reduction during the quantization stage for some models. Added a toggle to disable parallel packing to avoid OOM on larger models. Transformers dependency updated to 4.45.0 for Llama 3.2 support.

Full Changelog: v1.0.3...v1.0.4

GPTQModel v1.0.3

19 Sep 06:36
44b9df7

What's Changed

New Contributors

Full Changelog: v1.0.2...v1.0.3

GPTQModel v1.0.2

17 Aug 01:44
182df2b

What's Changed

Upgraded the AutoRound package to v0.3.0. Pre-built wheels (whl) and a PyPI source release are now available. Install by downloading a pre-built wheel or by running pip install gptqmodel --no-build-isolation.

Full Changelog: v1.0.0...v1.0.2

GPTQModel v1.0.0

14 Aug 00:29
4a028d5

What's Changed

40% faster multi-threaded packing, a new lm_eval API, and fixed Python 3.9 compatibility.

Full Changelog: v0.9.11...v1.0.0

GPTQModel v0.9.11

09 Aug 10:33
f2fcdc8

What's Changed

Added LG EXAONE 3.0 model support. New dynamic per-layer/module flexible quantization, where each layer/module may use different bits/parameters. Added proper sharding support to backend.BITBLAS. Auto-heal quantization errors caused by small damp values.
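The per-layer/module "dynamic" idea above can be sketched as regex overrides layered on top of a global default. This is a conceptual sketch, not GPTQModel's actual QuantizeConfig API: the field names, patterns, and matching rules here are assumptions for illustration.

```python
import re

# Global default quantization params (illustrative).
DEFAULT = {"bits": 4, "group_size": 128}

# Hypothetical per-module overrides: first matching regex wins.
DYNAMIC = {
    r".*\.mlp\..*": {"bits": 8},       # MLP modules quantized at 8 bits
    r".*\.lm_head$": {"bits": 16},     # keep lm_head in higher precision
}

def resolve(module_name, default=DEFAULT, dynamic=DYNAMIC):
    """Return the quantization params for one module: the first
    matching regex override is merged over the global default."""
    for pattern, override in dynamic.items():
        if re.fullmatch(pattern, module_name):
            return {**default, **override}
    return dict(default)
```

With a scheme like this, attention projections fall through to the 4-bit default while MLP modules resolve to 8 bits, which is the kind of per-module flexibility the release describes.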

Full Changelog: v0.9.10...v0.9.11