update readme
mobicham committed Feb 20, 2024
1 parent 5bc48cb commit 6e3279d
Showing 1 changed file with 1 addition and 1 deletion: Readme.md
@@ -37,7 +37,7 @@ The quantization parameters are set as follows:
 - ```quant_zero``` (bool): if True, it quantizes the zero-point to 8-bit without grouping.
 - ```quant_scale``` (bool): if True, it quantizes the scaling factor to 8-bit with a group_size of 128.
 
-Additionally, you can set ```offload_meta=True``` to offload the meta-data to the CPU. This dramatically decreases the GPU memory requirements but makes processing slightly slower for smaller group-sizes. With ```offload_meta=True```, you can run Llama2-70B and Mixtral with HQQ 2-bit using only 18.8GB and 13GB VRAM respectively!
+Additionally, you can set ```offload_meta=True``` to offload the meta-data to the CPU. This drastically decreases the GPU memory requirements but makes processing slightly slower for smaller group-sizes. With ```offload_meta=True```, you can run Llama2-70B and Mixtral with HQQ 2-bit using only 18.8GB and 13GB VRAM respectively!
 
 You can try to change the backend, which could speed up the runtime:
 ```Python
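The ```quant_zero``` and ```quant_scale``` options in the hunk above quantize the meta-data itself (zero-points and scaling factors) to 8-bit, optionally per group. As a rough illustration of that idea, here is a pure-Python sketch of 8-bit affine quantization with a per-group (scale, zero) pair; the function names and list-based layout are invented for clarity and are not HQQ's actual implementation:

```python
def quantize_meta_8bit(values, group_size):
    """Affine-quantize a flat list of floats to 8-bit codes,
    storing one (scale, min) pair per group.

    Conceptual sketch only -- HQQ's real code operates on tensors.
    """
    codes, params = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        lo, hi = min(group), max(group)
        # Map [lo, hi] onto the 8-bit range [0, 255].
        scale = (hi - lo) / 255 if hi > lo else 1.0
        params.append((scale, lo))
        codes.extend(round((v - lo) / scale) for v in group)
    return codes, params


def dequantize_meta_8bit(codes, params, group_size):
    """Invert quantize_meta_8bit: code * scale + min, per group."""
    out = []
    for i, code in enumerate(codes):
        scale, lo = params[i // group_size]
        out.append(code * scale + lo)
    return out


# Example: a batch of per-group scaling factors, compressed to 8-bit.
scales = [0.011, 0.013, 0.012, 0.010, 0.051, 0.049, 0.050, 0.052]
codes, params = quantize_meta_8bit(scales, group_size=4)
recon = dequantize_meta_8bit(codes, params, group_size=4)
max_err = max(abs(a - b) for a, b in zip(scales, recon))
```

Storing 8-bit codes plus one (scale, min) pair per group is what makes the meta-data small enough that offloading it (```offload_meta=True```) or keeping it on-GPU is cheap either way; a smaller group_size improves accuracy at the cost of more per-group parameters.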
