Add support for quantized OPT models and refactor #295

Zerogoki00 · 2023-03-13T17:17:45Z

Added ability to use quantized OPT models
Added argument to specify quantized model type (LLaMA by default)
Removed --load-in-4bit because we already have --gptq-bits

Tested with OPT-30B-Erebus on an RTX 4090. It runs slower than LLaMA, but it works.

Ph0rk0z · 2023-03-13T17:31:14Z

How does the performance compare with FlexGen?

oobabooga · 2023-03-13T17:48:38Z

modules/quant_loader.py


    path_to_model = Path(f'models/{model_name}')
-    pt_model = ''
-    if path_to_model.name.lower().startswith('llama-7b'):


Removing this block breaks compatibility with folder names such as decapoda's llama-7b-hf.

I moved the code back.

oobabooga · 2023-03-13T17:49:55Z

modules/models.py

 def load_model(model_name):
    print(f"Loading {model_name}...")
    t0 = time.time()

    shared.is_RWKV = model_name.lower().startswith('rwkv-')

    # Default settings
-    if not any([shared.args.cpu, shared.args.load_in_8bit, shared.args.load_in_4bit, shared.args.gptq_bits > 0, shared.args.auto_devices, shared.args.disk, shared.args.gpu_memory is not None, shared.args.cpu_memory is not None, shared.args.deepspeed, shared.args.flexgen, shared.is_RWKV]):


I'm reluctant to remove --load-in-4bit because that will certainly cause confusion, but I guess we can do it and move on.

It's up to you, but I think it's bad practice to keep multiple arguments that do the same thing.

Also, the --load-in-4bit argument was confusing because it works differently from --load-in-8bit.

You have convinced me, let's ditch --load-in-4bit.
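One way to retire a flag without breaking existing launch commands is to keep it as a deprecated alias for a while. The commit log for this PR mentions a deprecation warning for --load-in-4bit, but that code is not shown in the thread, so the following is only a sketch under assumptions: the `parse_args` wrapper, the warning text, and the choice to map the old flag to `--gptq-bits 4` are all hypothetical.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument('--gptq-bits', type=int, default=0,
                        help='Load a pre-quantized model with specified precision.')
    # Assumption: the old flag is kept only so that existing commands still parse.
    parser.add_argument('--load-in-4bit', action='store_true',
                        help='DEPRECATED: use --gptq-bits 4 instead.')
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse arguments and translate the deprecated flag (hypothetical helper)."""
    args = build_parser().parse_args(argv)
    if args.load_in_4bit:
        print('Warning: --load-in-4bit is deprecated, use --gptq-bits 4 instead.',
              file=sys.stderr)
        # Only override when the user did not pass an explicit precision.
        if args.gptq_bits == 0:
            args.gptq_bits = 4
    return args
```

With this approach the rest of the code only ever needs to look at `gptq_bits`, which matches the goal of not having two arguments that do the same thing.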

oobabooga · 2023-03-13T17:51:13Z

modules/shared.py

-parser.add_argument('--load-in-4bit', action='store_true', help='Load the model with 4-bit precision. Currently only works with LLaMA.')
-parser.add_argument('--gptq-bits', type=int, default=0, help='Load a pre-quantized model with specified precision. 2, 3, 4 and 8bit are supported. Currently only works with LLaMA.')
+parser.add_argument('--gptq-bits', type=int, default=0, help='Load a pre-quantized model with specified precision. 2, 3, 4 and 8bit are supported. Currently only works with LLaMA and OPT.')
+parser.add_argument('--gptq-model-type', type=str, default='llama', help='Model type of pre-quantized model. Currently only LLaMa and OPT are supported.')


Maybe we could infer --gptq-model-type from the model name?

@oobabooga
Then the user will always have to give the model folder a special name with a model type prefix (like llama-xxx or opt-xxxxx).

Maybe do as you described, but keep this argument as a fallback for when the folder name has no prefix? What do you think?

That sounds like a good idea (making this argument optional).

I implemented this. I tested it myself with different folder names and it works (but I recommend you check too).

So here is the logic:
If --gptq-model-type isn't set, try to infer the model type from the folder name. Print an error and exit if that fails.
If --gptq-model-type is set, use its value and don't try to infer the model type from the name.
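The two rules above can be sketched roughly as follows. This is not the PR's actual code: `resolve_model_type` and `SUPPORTED_TYPES` are hypothetical names, and prefix matching on the lowercased folder name is an assumption based on the "llama-xxx, opt-xxxxx" examples in this thread. The sketch raises an exception where the real loader is described as printing an error and exiting.

```python
from typing import Optional

# Model types named in the --gptq-model-type help text.
SUPPORTED_TYPES = ("llama", "opt")


def resolve_model_type(model_name: str, cli_model_type: Optional[str] = None) -> str:
    """Return the model type, preferring the explicit argument over the folder name."""
    if cli_model_type:
        # Explicit argument wins; compare case-insensitively.
        model_type = cli_model_type.lower()
        if model_type not in SUPPORTED_TYPES:
            raise ValueError(f"Unsupported model type: {cli_model_type}")
        return model_type

    # Fallback: infer the type from a folder-name prefix such as llama-xxx or opt-xxxxx.
    name = model_name.lower()
    for candidate in SUPPORTED_TYPES:
        if name.startswith(candidate):
            return candidate

    # The caller can catch this, print the message, and exit.
    raise ValueError(
        f"Can't determine model type from '{model_name}'; please pass --gptq-model-type"
    )
```

Under this sketch a folder named llama-13b-hf resolves without the argument, while a folder such as KoboldAI_OPT-13B-Erebus has no recognized prefix and needs `--gptq-model-type opt`.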

LoopControl · 2023-03-13T22:23:59Z

Could you briefly describe how to convert an OPT Hugging Face model to .pt (or provide a link to a pregenerated .pt)?

Would it be similar to this command documented in the GPTQ-for-LLaMa repo:
python llama.py decapoda-research/llama-7b-hf c4 --wbits 4 --save llama7b-4bit.pt

Edit: Looks like the command is:
python opt.py KoboldAI/OPT-13B-Erebus c4 --wbits 4 --save opt-13b-4bit.pt

Does the dataset parameter (the "c4" in the commands above) make a difference at inference time? If so, which would you recommend?

LoopControl · 2023-03-14T03:22:16Z

The good news is that after quantizing OPT-13B-Erebus to a .pt file, the model loads in around 8GB of VRAM and seems to generate text.

The problem is that generation in 4-bit mode is more than 5x slower (roughly 8x to 11x in the runs below) than 13B LLaMA in 4-bit, even with very short contexts:

> python server.py --gptq-bits 4 --gptq-model-type opt --no-stream --model KoboldAI_OPT-13B-Erebus
Output generated in 49.05 seconds (1.06 tokens/s, 52 tokens)
Output generated in 95.38 seconds (0.84 tokens/s, 80 tokens)

For comparison, llama 13B:

> python server.py --gptq-bits 4 --no-stream --model llama-13b-hf
Output generated in 9.24 seconds (8.66 tokens/s, 80 tokens)
Output generated in 8.68 seconds (9.22 tokens/s, 80 tokens)

# 30B llama model, 200 token generation:
Output generated in 36.11 seconds (5.54 tokens/s, 200 tokens)
Output generated in 42.36 seconds (4.72 tokens/s, 200 tokens)

With larger contexts (800+ tokens), the LLaMA model continues to work fine, but the OPT model seems to just hang (I gave it several minutes before killing the process).
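For anyone comparing runs like these, a small script can recompute throughput from the "Output generated in ..." lines and report the ratio. This is just a sketch: the regular expression assumes the log format shown above, and `mean_tokens_per_second` is a hypothetical helper, not part of the web UI.

```python
import re
from statistics import mean

# Matches lines such as:
# Output generated in 49.05 seconds (1.06 tokens/s, 52 tokens)
LINE_RE = re.compile(
    r"Output generated in ([\d.]+) seconds \(([\d.]+) tokens/s, (\d+) tokens\)"
)


def mean_tokens_per_second(log_text: str) -> float:
    """Average tokens/s over all matching lines, recomputed from tokens and seconds."""
    rates = [
        int(tokens) / float(seconds)
        for seconds, _reported_rate, tokens in LINE_RE.findall(log_text)
    ]
    if not rates:
        raise ValueError("no 'Output generated' lines found")
    return mean(rates)


opt_log = """
Output generated in 49.05 seconds (1.06 tokens/s, 52 tokens)
Output generated in 95.38 seconds (0.84 tokens/s, 80 tokens)
"""

llama_log = """
Output generated in 9.24 seconds (8.66 tokens/s, 80 tokens)
Output generated in 8.68 seconds (9.22 tokens/s, 80 tokens)
"""

slowdown = mean_tokens_per_second(llama_log) / mean_tokens_per_second(opt_log)
print(f"OPT 13B is about {slowdown:.1f}x slower than LLaMA 13B in these runs")
```

On the four 13B runs quoted above this works out to a slowdown of roughly 9x, though with only two samples per model and different output lengths the figure is indicative at best.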

oobabooga · 2023-03-14T03:34:18Z

@LoopControl can you report that on https://github.com/qwopqwop200/GPTQ-for-LLaMa?

Pinging @qwopqwop200

qwopqwop200 · 2023-03-14T03:51:55Z

I benchmarked on OPT-2.7B, and it was not as slow as this.

commit 0cbe2dd Author: oobabooga <[email protected]> Date: Sat Mar 18 12:24:54 2023 -0300 Update README.md commit 36ac7be Merge: d2a7fac 705f513 Author: oobabooga <[email protected]> Date: Sat Mar 18 11:57:10 2023 -0300 Merge pull request oobabooga#407 from ThisIsPIRI/gitignore Add loras to .gitignore commit d2a7fac Author: oobabooga <[email protected]> Date: Sat Mar 18 11:56:04 2023 -0300 Use pip instead of conda for pytorch commit 705f513 Author: ThisIsPIRI <[email protected]> Date: Sat Mar 18 23:33:24 2023 +0900 Add loras to .gitignore commit a0b1a30 Author: oobabooga <[email protected]> Date: Sat Mar 18 11:23:56 2023 -0300 Specify torchvision/torchaudio versions commit c753261 Author: oobabooga <[email protected]> Date: Sat Mar 18 10:55:57 2023 -0300 Disable stop_at_newline by default commit 7c945cf Author: oobabooga <[email protected]> Date: Sat Mar 18 10:55:24 2023 -0300 Don't include PeftModel every time commit 86b9900 Author: oobabooga <[email protected]> Date: Sat Mar 18 10:27:52 2023 -0300 Remove rwkv dependency commit a163807 Author: oobabooga <[email protected]> Date: Sat Mar 18 03:07:27 2023 -0300 Update README.md commit a7acfa4 Author: oobabooga <[email protected]> Date: Fri Mar 17 22:57:46 2023 -0300 Update README.md commit bcd8afd Merge: dc35861 e26763a Author: oobabooga <[email protected]> Date: Fri Mar 17 22:57:28 2023 -0300 Merge pull request oobabooga#393 from WojtekKowaluk/mps_support Fix for MPS support on Apple Silicon commit e26763a Author: oobabooga <[email protected]> Date: Fri Mar 17 22:56:46 2023 -0300 Minor changes commit 7994b58 Author: Wojtek Kowaluk <[email protected]> Date: Sat Mar 18 02:27:26 2023 +0100 clean up duplicated code commit dc35861 Author: oobabooga <[email protected]> Date: Fri Mar 17 21:05:17 2023 -0300 Update README.md commit 30939e2 Author: Wojtek Kowaluk <[email protected]> Date: Sat Mar 18 00:56:23 2023 +0100 add mps support on apple silicon commit 7d97da1 Author: Wojtek Kowaluk <[email protected]> Date: Sat Mar 18 
00:17:05 2023 +0100 add venv paths to gitignore commit f2a5ca7 Author: oobabooga <[email protected]> Date: Fri Mar 17 20:50:27 2023 -0300 Update README.md commit 8c8286b Author: oobabooga <[email protected]> Date: Fri Mar 17 20:49:40 2023 -0300 Update README.md commit 0c05e65 Author: oobabooga <[email protected]> Date: Fri Mar 17 20:25:42 2023 -0300 Update README.md commit adc2003 Merge: 20f5b45 66e8d12 Author: oobabooga <[email protected]> Date: Fri Mar 17 20:19:33 2023 -0300 Merge branch 'main' of github.com:oobabooga/text-generation-webui commit 20f5b45 Author: oobabooga <[email protected]> Date: Fri Mar 17 20:19:04 2023 -0300 Add parameters reference oobabooga#386 oobabooga#331 commit 66e8d12 Author: oobabooga <[email protected]> Date: Fri Mar 17 19:59:37 2023 -0300 Update README.md commit 9a87111 Author: oobabooga <[email protected]> Date: Fri Mar 17 19:52:22 2023 -0300 Update README.md commit d4f38b6 Author: oobabooga <[email protected]> Date: Fri Mar 17 18:57:48 2023 -0300 Update README.md commit ad7c829 Author: oobabooga <[email protected]> Date: Fri Mar 17 18:55:01 2023 -0300 Update README.md commit 4426f94 Author: oobabooga <[email protected]> Date: Fri Mar 17 18:51:07 2023 -0300 Update the installation instructions. 
Tldr use WSL commit 9256e93 Author: oobabooga <[email protected]> Date: Fri Mar 17 17:45:28 2023 -0300 Add some LoRA params commit 9ed2c45 Author: oobabooga <[email protected]> Date: Fri Mar 17 16:06:11 2023 -0300 Use markdown in the "HTML" tab commit f0b2645 Author: oobabooga <[email protected]> Date: Fri Mar 17 13:07:17 2023 -0300 Add a comment commit 7da742e Merge: ebef4a5 02e1113 Author: oobabooga <[email protected]> Date: Fri Mar 17 12:37:23 2023 -0300 Merge pull request oobabooga#207 from EliasVincent/stt-extension Extension: Whisper Speech-To-Text Input commit ebef4a5 Author: oobabooga <[email protected]> Date: Fri Mar 17 11:58:45 2023 -0300 Update README commit cdfa787 Author: oobabooga <[email protected]> Date: Fri Mar 17 11:53:28 2023 -0300 Update README commit 3bda907 Merge: 4c13067 614dad0 Author: oobabooga <[email protected]> Date: Fri Mar 17 11:48:48 2023 -0300 Merge pull request oobabooga#366 from oobabooga/lora Add LoRA support commit 614dad0 Author: oobabooga <[email protected]> Date: Fri Mar 17 11:43:11 2023 -0300 Remove unused import commit a717fd7 Author: oobabooga <[email protected]> Date: Fri Mar 17 11:42:25 2023 -0300 Sort the imports commit 7d97287 Author: oobabooga <[email protected]> Date: Fri Mar 17 11:41:12 2023 -0300 Update settings-template.json commit 29fe7b1 Author: oobabooga <[email protected]> Date: Fri Mar 17 11:39:48 2023 -0300 Remove LoRA tab, move it into the Parameters menu commit 214dc68 Author: oobabooga <[email protected]> Date: Fri Mar 17 11:24:52 2023 -0300 Several QoL changes related to LoRA commit 4c13067 Merge: ee164d1 53b6a66 Author: oobabooga <[email protected]> Date: Fri Mar 17 09:47:57 2023 -0300 Merge pull request oobabooga#377 from askmyteapot/Fix-Multi-gpu-GPTQ-Llama-no-tokens Update GPTQ_Loader.py commit 53b6a66 Author: askmyteapot <[email protected]> Date: Fri Mar 17 18:34:13 2023 +1000 Update GPTQ_Loader.py Correcting decoder layer for renamed class. 
commit 0cecfc6 Author: oobabooga <[email protected]> Date: Thu Mar 16 21:35:53 2023 -0300 Add files commit 104293f Author: oobabooga <[email protected]> Date: Thu Mar 16 21:31:39 2023 -0300 Add LoRA support commit ee164d1 Author: oobabooga <[email protected]> Date: Thu Mar 16 18:22:16 2023 -0300 Don't split the layers in 8-bit mode by default commit 0a2aa79 Merge: dd1c596 e085cb4 Author: oobabooga <[email protected]> Date: Thu Mar 16 17:27:03 2023 -0300 Merge pull request oobabooga#358 from mayaeary/8bit-offload Add support for memory maps with --load-in-8bit commit e085cb4 Author: oobabooga <[email protected]> Date: Thu Mar 16 13:34:23 2023 -0300 Small changes commit dd1c596 Author: oobabooga <[email protected]> Date: Thu Mar 16 12:45:27 2023 -0300 Update README commit 38d7017 Author: oobabooga <[email protected]> Date: Thu Mar 16 12:44:03 2023 -0300 Add all command-line flags to "Interface mode" commit 83cb20a Author: awoo <awoo@awoo> Date: Thu Mar 16 18:42:53 2023 +0300 Add support for --gpu-memory witn --load-in-8bit commit 23a5e88 Author: oobabooga <[email protected]> Date: Thu Mar 16 11:16:17 2023 -0300 The LLaMA PR has been merged into transformers huggingface/transformers#21955 The tokenizer class has been changed from "LLaMATokenizer" to "LlamaTokenizer" It is necessary to edit this change in every tokenizer_config.json that you had for LLaMA so far. 
commit d54f3f4 Author: oobabooga <[email protected]> Date: Thu Mar 16 10:19:00 2023 -0300 Add no-stream checkbox to the interface commit 1c37896 Author: oobabooga <[email protected]> Date: Thu Mar 16 10:18:34 2023 -0300 Remove unused imports commit a577fb1 Author: oobabooga <[email protected]> Date: Thu Mar 16 00:46:59 2023 -0300 Keep GALACTICA special tokens (oobabooga#300) commit 25a00ea Author: oobabooga <[email protected]> Date: Wed Mar 15 23:43:35 2023 -0300 Add "Experimental" warning commit 599d313 Author: oobabooga <[email protected]> Date: Wed Mar 15 23:34:08 2023 -0300 Increase the reload timeout a bit commit 4d64a57 Author: oobabooga <[email protected]> Date: Wed Mar 15 23:29:56 2023 -0300 Add Interface mode tab commit b501722 Merge: ffb8986 d3a280e Author: oobabooga <[email protected]> Date: Wed Mar 15 20:46:04 2023 -0300 Merge branch 'main' of github.com:oobabooga/text-generation-webui commit ffb8986 Author: oobabooga <[email protected]> Date: Wed Mar 15 20:44:34 2023 -0300 Mini refactor commit d3a280e Merge: 445ebf0 0552ab2 Author: oobabooga <[email protected]> Date: Wed Mar 15 20:22:08 2023 -0300 Merge pull request oobabooga#348 from mayaeary/feature/koboldai-api-share flask_cloudflared for shared tunnels commit 445ebf0 Author: oobabooga <[email protected]> Date: Wed Mar 15 20:06:46 2023 -0300 Update README.md commit 0552ab2 Author: awoo <awoo@awoo> Date: Thu Mar 16 02:00:16 2023 +0300 flask_cloudflared for shared tunnels commit e9e76bb Author: oobabooga <[email protected]> Date: Wed Mar 15 19:42:29 2023 -0300 Delete WSL.md commit 09045e4 Author: oobabooga <[email protected]> Date: Wed Mar 15 19:42:06 2023 -0300 Add WSL guide commit 9ff5033 Merge: 66256ac 055edc7 Author: oobabooga <[email protected]> Date: Wed Mar 15 19:37:26 2023 -0300 Merge pull request oobabooga#345 from jfryton/main Guide for Windows Subsystem for Linux commit 66256ac Author: oobabooga <[email protected]> Date: Wed Mar 15 19:31:27 2023 -0300 Make the "no GPU has been detected" 
message more descriptive commit 055edc7 Author: jfryton <[email protected]> Date: Wed Mar 15 18:21:14 2023 -0400 Update WSL.md commit 89883a3 Author: jfryton <[email protected]> Date: Wed Mar 15 18:20:21 2023 -0400 Create WSL.md guide for setting up WSL Ubuntu Quick start guide for Windows Subsystem for Linux (Ubuntu), including port forwarding to enable local network webui access. commit 67d6247 Author: oobabooga <[email protected]> Date: Wed Mar 15 18:56:26 2023 -0300 Further reorganize chat UI commit ab12a17 Merge: 6a1787a 3028112 Author: oobabooga <[email protected]> Date: Wed Mar 15 18:31:39 2023 -0300 Merge pull request oobabooga#342 from mayaeary/koboldai-api Extension: KoboldAI api commit 3028112 Author: awoo <awoo@awoo> Date: Wed Mar 15 23:52:46 2023 +0300 KoboldAI api commit 6a1787a Author: oobabooga <[email protected]> Date: Wed Mar 15 16:55:40 2023 -0300 CSS fixes commit 3047ed8 Author: oobabooga <[email protected]> Date: Wed Mar 15 16:41:38 2023 -0300 CSS fix commit 87b84d2 Author: oobabooga <[email protected]> Date: Wed Mar 15 16:39:59 2023 -0300 CSS fix commit c1959c2 Author: oobabooga <[email protected]> Date: Wed Mar 15 16:34:31 2023 -0300 Show/hide the extensions block using javascript commit 348596f Author: oobabooga <[email protected]> Date: Wed Mar 15 15:11:16 2023 -0300 Fix broken extensions commit c5f14fb Author: oobabooga <[email protected]> Date: Wed Mar 15 14:19:28 2023 -0300 Optimize the HTML generation speed commit bf812c4 Author: oobabooga <[email protected]> Date: Wed Mar 15 14:05:35 2023 -0300 Minor fix commit 658849d Author: oobabooga <[email protected]> Date: Wed Mar 15 13:29:00 2023 -0300 Move a checkbutton commit 05ee323 Author: oobabooga <[email protected]> Date: Wed Mar 15 13:26:32 2023 -0300 Rename a file commit 40c9e46 Author: oobabooga <[email protected]> Date: Wed Mar 15 13:25:28 2023 -0300 Add file commit d30a140 Author: oobabooga <[email protected]> Date: Wed Mar 15 13:24:54 2023 -0300 Further reorganize the UI commit 
ffc6cb3 Merge: cf2da86 3b62bd1 Author: oobabooga <[email protected]> Date: Wed Mar 15 12:56:21 2023 -0300 Merge pull request oobabooga#325 from Ph0rk0z/fix-RWKV-Names Fix rwkv names commit cf2da86 Author: oobabooga <[email protected]> Date: Wed Mar 15 12:51:13 2023 -0300 Prevent *Is typing* from disappearing instantly while streaming commit 4146ac4 Merge: 1413931 29b7c5a Author: oobabooga <[email protected]> Date: Wed Mar 15 12:47:41 2023 -0300 Merge pull request oobabooga#266 from HideLord/main Adding markdown support and slight refactoring. commit 29b7c5a Author: oobabooga <[email protected]> Date: Wed Mar 15 12:40:03 2023 -0300 Sort the requirements commit ec972b8 Author: oobabooga <[email protected]> Date: Wed Mar 15 12:33:26 2023 -0300 Move all css/js into separate files commit 693b53d Merge: 63c5a13 1413931 Author: oobabooga <[email protected]> Date: Wed Mar 15 12:08:56 2023 -0300 Merge branch 'main' into HideLord-main commit 1413931 Author: oobabooga <[email protected]> Date: Wed Mar 15 12:01:32 2023 -0300 Add a header bar and redesign the interface (oobabooga#293) commit 9d6a625 Author: oobabooga <[email protected]> Date: Wed Mar 15 11:04:30 2023 -0300 Add 'hallucinations' filter oobabooga#326 This breaks the API since a new parameter has been added. It should be a one-line fix. See api-example.py. commit 3b62bd1 Author: Forkoz <[email protected]> Date: Tue Mar 14 21:23:39 2023 +0000 Remove PTH extension from RWKV When loading the current model was blank unless you typed it out. 
commit f0f325e Author: Forkoz <[email protected]> Date: Tue Mar 14 21:21:47 2023 +0000 Remove Json from loading no more 20b tokenizer commit 128d18e Author: oobabooga <[email protected]> Date: Tue Mar 14 17:57:25 2023 -0300 Update README.md commit 1236c7f Author: oobabooga <[email protected]> Date: Tue Mar 14 17:56:15 2023 -0300 Update README.md commit b419dff Author: oobabooga <[email protected]> Date: Tue Mar 14 17:55:35 2023 -0300 Update README.md commit 72d207c Author: oobabooga <[email protected]> Date: Tue Mar 14 16:31:27 2023 -0300 Remove the chat API It is not implemented, has not been tested, and this is causing confusion. commit afc5339 Author: oobabooga <[email protected]> Date: Tue Mar 14 16:04:17 2023 -0300 Remove "eval" statements from text generation functions commit 5c05223 Merge: b327554 87192e2 Author: oobabooga <[email protected]> Date: Tue Mar 14 08:05:24 2023 -0300 Merge pull request oobabooga#295 from Zerogoki00/opt4-bit Add support for quantized OPT models commit 87192e2 Author: oobabooga <[email protected]> Date: Tue Mar 14 08:02:21 2023 -0300 Update README commit 265ba38 Author: oobabooga <[email protected]> Date: Tue Mar 14 07:56:31 2023 -0300 Rename a file, add deprecation warning for --load-in-4bit commit 3da73e4 Merge: 518e5c4 b327554 Author: oobabooga <[email protected]> Date: Tue Mar 14 07:50:36 2023 -0300 Merge branch 'main' into Zerogoki00-opt4-bit commit b327554 Author: oobabooga <[email protected]> Date: Tue Mar 14 00:18:13 2023 -0300 Update bug_report_template.yml commit 33b9a15 Author: oobabooga <[email protected]> Date: Mon Mar 13 23:03:16 2023 -0300 Delete config.yml commit b5e0d3c Author: oobabooga <[email protected]> Date: Mon Mar 13 23:02:25 2023 -0300 Create config.yml commit 7f301fd Merge: d685332 02d4075 Author: oobabooga <[email protected]> Date: Mon Mar 13 22:41:21 2023 -0300 Merge pull request oobabooga#305 from oobabooga/dependabot/pip/accelerate-0.17.1 Bump accelerate from 0.17.0 to 0.17.1 commit 02d4075 Author: 
dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Tue Mar 14 01:40:42 2023 +0000 Bump accelerate from 0.17.0 to 0.17.1 Bumps [accelerate](https://github.com/huggingface/accelerate) from 0.17.0 to 0.17.1. - [Release notes](https://github.com/huggingface/accelerate/releases) - [Commits](huggingface/accelerate@v0.17.0...v0.17.1) --- updated-dependencies: - dependency-name: accelerate dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot]
commit d685332 Merge: 481ef3c df83088 Author: oobabooga Date: Mon Mar 13 22:39:59 2023 -0300 Merge pull request oobabooga#307 from oobabooga/dependabot/pip/bitsandbytes-0.37.1 Bump bitsandbytes from 0.37.0 to 0.37.1
commit 481ef3c Merge: a0ef82c 715c3ec Author: oobabooga Date: Mon Mar 13 22:39:22 2023 -0300 Merge pull request oobabooga#304 from oobabooga/dependabot/pip/rwkv-0.4.2 Bump rwkv from 0.3.1 to 0.4.2
commit df83088 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Tue Mar 14 01:36:18 2023 +0000 Bump bitsandbytes from 0.37.0 to 0.37.1 Bumps [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) from 0.37.0 to 0.37.1. - [Release notes](https://github.com/TimDettmers/bitsandbytes/releases) - [Changelog](https://github.com/TimDettmers/bitsandbytes/blob/main/CHANGELOG.md) - [Commits](https://github.com/TimDettmers/bitsandbytes/commits) --- updated-dependencies: - dependency-name: bitsandbytes dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot]
commit 715c3ec Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Tue Mar 14 01:36:02 2023 +0000 Bump rwkv from 0.3.1 to 0.4.2 Bumps [rwkv](https://github.com/BlinkDL/ChatRWKV) from 0.3.1 to 0.4.2. - [Release notes](https://github.com/BlinkDL/ChatRWKV/releases) - [Commits](https://github.com/BlinkDL/ChatRWKV/commits) --- updated-dependencies: - dependency-name: rwkv dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot]
commit a0ef82c Author: oobabooga Date: Mon Mar 13 22:35:28 2023 -0300 Activate dependabot
commit 3fb8196 Author: oobabooga Date: Mon Mar 13 22:28:00 2023 -0300 Implement "*Is recording a voice message...*" for TTS oobabooga#303
commit 0dab2c5 Author: oobabooga Date: Mon Mar 13 22:18:03 2023 -0300 Update feature_request.md
commit 79e519c Author: oobabooga Date: Mon Mar 13 20:03:08 2023 -0300 Update stale.yml
commit 1571458 Author: oobabooga Date: Mon Mar 13 19:39:21 2023 -0300 Update stale.yml
commit bad0b0a Author: oobabooga Date: Mon Mar 13 19:20:18 2023 -0300 Update stale.yml
commit c805843 Author: oobabooga Date: Mon Mar 13 19:09:06 2023 -0300 Update stale.yml
commit 60cc7d3 Author: oobabooga Date: Mon Mar 13 18:53:11 2023 -0300 Update stale.yml
commit 7c17613 Author: oobabooga Date: Mon Mar 13 18:47:31 2023 -0300 Update and rename .github/workflow/stale.yml to .github/workflows/stale.yml
commit 47c941c Author: oobabooga Date: Mon Mar 13 18:37:35 2023 -0300 Create stale.yml
commit 511b136 Author: oobabooga Date: Mon Mar 13 18:29:38 2023 -0300 Update bug_report_template.yml
commit d6763a6 Author: oobabooga Date: Mon Mar 13 18:27:24 2023 -0300 Update feature_request.md
commit c6ecb35 Author: oobabooga Date: Mon Mar 13 18:26:28 2023 -0300 Update feature_request.md
commit 6846427 Author: oobabooga Date: Mon Mar 13 18:19:07 2023 -0300 Update feature_request.md
commit bcfb7d7 Author: oobabooga Date: Mon Mar 13 18:16:18 2023 -0300 Update bug_report_template.yml
commit ed30bd3 Author: oobabooga Date: Mon Mar 13 18:14:54 2023 -0300 Update bug_report_template.yml
commit aee3b53 Author: oobabooga Date: Mon Mar 13 18:14:31 2023 -0300 Update bug_report_template.yml
commit 7dbc071 Author: oobabooga Date: Mon Mar 13 18:09:58 2023 -0300 Delete bug_report.md
commit 69d4b81 Author: oobabooga Date: Mon Mar 13 18:09:37 2023 -0300 Create bug_report_template.yml
commit 0a75584 Author: oobabooga Date: Mon Mar 13 18:07:08 2023 -0300 Create issue templates
commit 02e1113 Author: EliasVincent Date: Mon Mar 13 21:41:19 2023 +0100 add auto-transcribe option
commit 518e5c4 Author: oobabooga Date: Mon Mar 13 16:45:08 2023 -0300 Some minor fixes to the GPTQ loader
commit 8778b75 Author: Ayanami Rei Date: Mon Mar 13 22:11:40 2023 +0300 use updated load_quantized
commit a6a6522 Author: Ayanami Rei Date: Mon Mar 13 22:11:32 2023 +0300 determine model type from model name
commit b6c5c57 Author: Ayanami Rei Date: Mon Mar 13 22:11:08 2023 +0300 remove default value from argument
commit 63c5a13 Merge: 683556f 7ab45fb Author: Alexander Hristov Hristov Date: Mon Mar 13 19:50:08 2023 +0200 Merge branch 'main' into main
commit e1c952c Author: Ayanami Rei Date: Mon Mar 13 20:22:38 2023 +0300 make argument non case-sensitive
commit b746250 Author: Ayanami Rei Date: Mon Mar 13 20:18:56 2023 +0300 Update README
commit 3c9afd5 Author: Ayanami Rei Date: Mon Mar 13 20:14:40 2023 +0300 rename method
commit 1b99ed6 Author: Ayanami Rei Date: Mon Mar 13 20:01:34 2023 +0300 add argument --gptq-model-type and remove duplicate arguments
commit edbc611 Author: Ayanami Rei Date: Mon Mar 13 20:00:38 2023 +0300 use new quant loader
commit 345b6de Author: Ayanami Rei Date: Mon Mar 13 19:59:57 2023 +0300 refactor quant models loader and add support of OPT
commit 48aa528 Author: EliasVincent Date: Sun Mar 12 21:03:07 2023 +0100 use Gradio microphone input instead
commit 683556f Author: HideLord Date: Sun Mar 12 21:34:09 2023 +0200 Adding markdown support and slight refactoring.
commit 3b41459 Merge: 1c0bda3 3375eae Author: Elias Vincent Simon Date: Sun Mar 12 19:19:43 2023 +0100 Merge branch 'oobabooga:main' into stt-extension
commit 1c0bda3 Author: EliasVincent Date: Fri Mar 10 11:47:16 2023 +0100 added installation instructions
commit a24fa78 Author: EliasVincent Date: Thu Mar 9 21:18:46 2023 +0100 tweaked Whisper parameters
commit d5efc06 Merge: 00359ba 3341447 Author: Elias Vincent Simon Date: Thu Mar 9 21:05:34 2023 +0100 Merge branch 'oobabooga:main' into stt-extension
commit 00359ba Author: EliasVincent Date: Thu Mar 9 21:03:49 2023 +0100 interactive preview window
commit 7a03d0b Author: EliasVincent Date: Thu Mar 9 20:33:00 2023 +0100 cleanup
commit 4c72e43 Author: EliasVincent Date: Thu Mar 9 12:46:50 2023 +0100 first implementation

Add support for quantized OPT models

Zerogoki00 added 4 commits March 13, 2023 19:59

refactor quant models loader and add support of OPT

345b6de

use new quant loader

edbc611

add argument --gptq-model-type and remove duplicate arguments

1b99ed6

rename method

3c9afd5

Zerogoki00 changed the title ~~Add supporting for quantized OTP models and refactor~~ Add supporting for quantized OPT models and refactor Mar 13, 2023

Update README

b746250

Zerogoki00 force-pushed the opt4-bit branch from b4df5cd to b746250 March 13, 2023 17:20

make argument non case-sensitive

e1c952c

Zerogoki00 changed the title ~~Add supporting for quantized OPT models and refactor~~ Add support for quantized OPT models and refactor Mar 13, 2023

oobabooga reviewed Mar 13, 2023

Zerogoki00 and others added 4 commits March 13, 2023 22:11

remove default value from argument

b6c5c57

determine model type from model name

a6a6522

use updated load_quantized

8778b75

Some minor fixes to the GPTQ loader

518e5c4

oobabooga added 3 commits March 14, 2023 07:50

Merge branch 'main' into Zerogoki00-opt4-bit

3da73e4

Rename a file, add deprecation warning for --load-in-4bit

265ba38

Update README

87192e2

oobabooga merged commit 5c05223 into oobabooga:main Mar 14, 2023
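
For readers following the commit list, here is a minimal Python sketch of how the argument changes named in these commits could fit together: a case-insensitive `--gptq-model-type`, a fallback that infers the type from the model name, and a deprecation warning for `--load-in-4bit`. Only the three flag names come from this PR. The helpers `build_parser`, `infer_model_type`, and `resolve_args`, the `SUPPORTED_TYPES` tuple, the help text for the new flag, the prefix-matching rule, and the mapping of `--load-in-4bit` to 4 bits are assumptions for illustration, not the merged code.

```python
import argparse
import warnings

# Assumption: the two model families this PR mentions (LLaMA and OPT).
SUPPORTED_TYPES = ("llama", "opt")


def build_parser():
    """Hypothetical parser holding only the flags discussed in this PR."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--gptq-bits', type=int, default=0,
                        help='Load a pre-quantized model with specified precision.')
    # type=str.lower lowercases the value, so "OPT" and "opt" are equivalent.
    parser.add_argument('--gptq-model-type', type=str.lower, default=None,
                        help='Model type of the pre-quantized model (illustrative help text).')
    # Kept only so that old command lines still parse and can be warned about.
    parser.add_argument('--load-in-4bit', action='store_true',
                        help='DEPRECATED: use --gptq-bits 4 instead.')
    return parser


def infer_model_type(model_name):
    """Guess the model type from a folder name prefix (assumed rule)."""
    name = model_name.lower()
    for model_type in SUPPORTED_TYPES:
        if name.startswith(model_type):
            return model_type
    return None


def resolve_args(argv, model_name):
    """Parse argv, apply the deprecation shim, then fill in the model type."""
    args = build_parser().parse_args(argv)
    if args.load_in_4bit:
        warnings.warn('--load-in-4bit is deprecated; use --gptq-bits 4 instead.')
        args.gptq_bits = 4  # assumption: the old flag maps to 4-bit GPTQ
    if args.gptq_bits > 0 and args.gptq_model_type is None:
        args.gptq_model_type = infer_model_type(model_name)
    return args


if __name__ == '__main__':
    example = resolve_args(['--gptq-bits', '4'], 'OPT-30B-Erebus')
    print(example.gptq_bits, example.gptq_model_type)
```

With this sketch, an explicit `--gptq-model-type` always wins over the guess from the folder name, so a name such as `llama-7b-hf` is matched by prefix only when the flag is omitted.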

Ph0rk0z pushed a commit to Ph0rk0z/text-generation-webui-testing that referenced this pull request Apr 17, 2023

Merge pull request oobabooga#295 from Zerogoki00/opt4-bit

2ea88db

Add support for quantized OPT models
