Releases: oobabooga/text-generation-webui
v1.9.1
v1.9
Backend updates
- 4-bit and 8-bit KV cache options have been added to llama.cpp and llamacpp_HF. They reuse the existing `--cache_8bit` and `--cache_4bit` flags. Thanks @GodEmperor785 for figuring out what values to pass to llama-cpp-python (see the sketch after this list).
- Transformers:
  - Add an eager attention option to make Gemma-2 work correctly (#6188). Thanks @GralchemOz.
  - Automatically detect bfloat16/float16 precision when loading models in 16-bit precision.
  - Automatically apply eager attention to models with the `Gemma2ForCausalLM` architecture.
- Gemma-2 support: Automatically detect and apply the optimal settings for this model with the two changes above. No need to set `--bf16 --use_eager_attention` manually.
- Automatically obtain the EOT token from Jinja2 templates and add it to the stopping strings, fixing Llama-3-Instruct not stopping. No need to add `<|eot_id|>` to the custom stopping strings anymore.
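For reference on the llama-cpp-python values mentioned above, here is a minimal sketch of requesting a quantized KV cache directly through that library; it is not the webui's own code. The `type_k`/`type_v` parameters and `flash_attn` flag exist in llama-cpp-python's `Llama` constructor, while the ggml type ids and the model path are assumptions for illustration.

```python
# Minimal sketch (not the webui's code): quantized KV cache via llama-cpp-python.
# The ggml type ids used here are assumptions: 2 = Q4_0 (4-bit), 8 = Q8_0 (8-bit).
from llama_cpp import Llama

GGML_TYPE_Q4_0 = 2
GGML_TYPE_Q8_0 = 8

llm = Llama(
    model_path="models/your-model.gguf",  # placeholder path
    n_ctx=8192,
    flash_attn=True,          # llama.cpp needs flash attention to quantize the V cache
    type_k=GGML_TYPE_Q4_0,    # what --cache_4bit selects; use GGML_TYPE_Q8_0 for --cache_8bit
    type_v=GGML_TYPE_Q4_0,
)
```

In the webui itself none of this is necessary: passing `--cache_4bit` or `--cache_8bit` applies the equivalent settings for you.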
UI updates
- Whisper STT overhaul: this extension has been rewritten, replacing the Gradio microphone component with a custom microphone element that is much more reliable (#6194). Thanks @RandomInternetPreson, @TimStrauven, and @mamei16.
- Make the character dropdown menu available in both the "Chat" tab and the "Parameters > Character" tab, after some people pointed out that moving it entirely to the Chat tab made it harder to edit characters.
- Colors in the light theme have been improved, making it a bit more pleasant to look at.
- Increase the chat area on mobile devices.
Bug fixes
- Fix the API request to AUTOMATIC1111 in the sd-api-pictures extension.
- Fix a glitch when switching tabs with "Show controls" unchecked in the chat tab and extensions loaded.
Library updates
- llama-cpp-python: bump to 0.2.81 (adds Gemma-2 support).
- Transformers: bump to 4.42 (adds Gemma-2 support).
Support
- GitHub Sponsors: https://github.com/sponsors/oobabooga
- ko-fi: https://ko-fi.com/oobabooga
v1.8
Releases with version numbers are back! The last one was v1.7 on October 8th, 2023, so I am calling this one v1.8.
From this release on, it will be possible to install past releases by downloading the `.zip` source and running the `start_` script in it. The installation script no longer updates to the latest version automatically. This doesn't apply to snapshots/releases before this one.
New backend
- Add TensorRT-LLM support.
- That's now the fastest backend in the project.
- It currently has to be installed in a separate Python 3.10 environment.
- A Dockerfile is provided.
- For instructions on how to convert models, consult #5715 and https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llama/README.md.
UI updates
- Improved "past chats" menu: this menu is now a vertical list of text items instead of a dropdown menu, making it a lot easier to switch between past conversations. Only one click is required instead of two.
- Store the chat history in the browser: if you restart the server and do not refresh the browser, your conversation will not be accidentally erased anymore.
- Avoid some unnecessary calls to the backend, making the UI faster and more responsive.
- Move the "Character" dropdown menu to the main Chat tab, to make it faster to switch between different characters.
- Change limits of RoPE scaling sliders in UI (#6142). Thanks @GodEmperor785.
- Do not expose "alpha_value" for llama.cpp and "rope_freq_base" for transformers, to keep things simple and avoid conversions between the two (see the sketch after this list).
- Remove an obsolete info message intended for GPTQ-for-LLaMa.
- Remove the "Tab" shortcut to switch between the generation tabs and the "Parameter" tabs, as it was awkward.
- Improved streaming of lists, which would sometimes flicker and temporarily display horizontal lines.
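On the alpha_value / rope_freq_base point above: the two settings express the same RoPE adjustment on different scales, which is why exposing both per loader would require converting between them. Below is a hedged sketch of the usual NTK-aware conversion; the base of 10000 and head dimension of 128 are common defaults assumed here, and the project's own conversion code may differ.

```python
# Hedged sketch of the NTK-aware relationship between alpha_value and rope_freq_base.
# base=10000 and head_dim=128 are assumed defaults, not values from these notes.
def alpha_to_rope_freq_base(alpha: float, base: float = 10000.0, head_dim: int = 128) -> float:
    """Convert an NTK 'alpha' scaling factor into an equivalent RoPE frequency base."""
    return base * alpha ** (head_dim / (head_dim - 2))

print(alpha_to_rope_freq_base(1.0))  # 10000.0 -> no scaling
print(alpha_to_rope_freq_base(2.5))  # ~25366  -> frequency base implied by alpha = 2.5
```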
Bug fixes
- Revert the reentrant generation lock to a simple lock, fixing an issue caused by the change.
- Fix GGUFs with no BOS token present, mainly Qwen2 models (#6119). Thanks @Ph0rk0z.
- Fix "500 error" issue caused by `block_requests.py` (#5976). Thanks @nero-dv.
- Set a default alpha_value and fix loading of some newer DeepSeekCoder GGUFs (#6111). Thanks @mefich.
Library updates
- llama-cpp-python: bump to 0.2.79 (after a month of wrestling with GitHub Actions).
- ExLlamaV2: bump to 0.1.6.
- flash-attention: bump to 2.5.9.post1.
- PyTorch: bump to 2.2.2. That's the last 2.2 patch version.
- HQQ: bump to 0.1.7.post3. Makes HQQ functional again.
Other updates
- Do not "git pull" during installation, allowing previous releases (from this one on) to be installed.
- Make logs more readable, no more \u7f16\u7801 (#6127). Thanks @Touch-Night.
Support this project
- Become a GitHub Sponsor ❤️
- Buy me a ko-fi ☕
snapshot-2024-04-28
What's Changed
- Bumped ExLlamaV2 to version 0.0.19 to resolve #5851 by @ashleykleynhans in #5880
- Bump llama-cpp-python to 0.2.64, use official wheels by @oobabooga in #5921
- nvidia docker: make sure gradio listens on 0.0.0.0 by @jvanmelckebeke in #5918
- Revert walrus operator for params['max_memory'] by @Column01 in #5878
- Merge dev branch by @oobabooga in #5927
New Contributors
- @jvanmelckebeke made their first contribution in #5918
- @Column01 made their first contribution in #5878
Full Changelog: snapshot-2024-04-21...snapshot-2024-04-28
snapshot-2024-04-21
What's Changed
- Fix whisper STT by @mamei16 in #5856
- [Hotfix] Revert sse-starlette version bump because it breaks API request cancellation by @p-e-w in #5873
- Add a /v1/internal/chat-prompt endpoint by @oobabooga in #5879 (a hedged usage sketch follows this list)
- Merge dev branch by @oobabooga in #5887
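As a rough illustration of the new endpoint listed above, the sketch below posts a chat-style request and prints the prompt the server would build from it. The request fields (`messages`, `mode`, `character`), the `prompt` response key, and the port are assumptions modeled on the project's OpenAI-compatible API; consult PR #5879 for the actual schema.

```python
# Hedged sketch: asking the server which prompt it would build for a chat request.
# Field names and the response key are assumptions (see PR #5879); port 5000 is
# the API's usual default when started with --api.
import requests

payload = {
    "messages": [{"role": "user", "content": "Hello, who are you?"}],
    "mode": "chat",            # assumed field, mirroring /v1/chat/completions
    "character": "Assistant",  # assumed field
}

resp = requests.post("http://127.0.0.1:5000/v1/internal/chat-prompt", json=payload, timeout=30)
print(resp.json().get("prompt", resp.text))  # "prompt" key is an assumption
```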
Full Changelog: snapshot-2024-04-14...snapshot-2024-04-21
snapshot-2024-04-14
What's Changed
- Add a simple min_p preset, make it the default by @oobabooga in #5836
- Respect model and lora directory settings when downloading files by @acon96 in #5842
- FIX Issue #5783 Transparency to image cache by @Victorivus in #5827
- Update gradio requirement from ==4.25.* to ==4.26.* by @dependabot in #5832
- Fix saving of UI defaults to settings.yaml - Fixes #5592 by @ashleykleynhans in #5794
- Take HF_ENDPOINT in consideration by @zaypen in #5571
- Add Ascend NPU support by @wangshuai09 in #5541
- Bump sse-starlette from 1.6.5 to 2.1.0 by @dependabot in #5831
- Merge dev branch by @oobabooga in #5848
New Contributors
- @acon96 made their first contribution in #5842
- @Victorivus made their first contribution in #5827
- @wangshuai09 made their first contribution in #5541
Full Changelog: snapshot-2024-04-07...snapshot-2024-04-14
snapshot-2024-04-07
What's Changed
- Remove CTransformers support by @oobabooga in #5807
- Merge dev branch by @oobabooga in #5810
- Bump aqlm[cpu,gpu] from 1.1.2 to 1.1.3 by @dependabot in #5790
- Merge dev branch by @oobabooga in #5822
- requirements: add psutil by @cebtenzzre in #5819
- Merge dev branch by @oobabooga in #5823
Full Changelog: snapshot-2024-03-31...snapshot-2024-04-07
snapshot-2024-03-31
What's Changed
- Bump gradio to 4.23 by @oobabooga in #5758
- Fix prompt incorrectly set to empty when suffix is empty string by @Yiximail in #5757
- Set a default empty string for `user_bio` to fix issue #5717 by @Yiximail in #5722
- docker: Remove misleading CLI_ARGS by @wldhx in #5726
- Add config for hyperion and hercules models to use chatml by @bartowski1182 in #5742
- Bump aqlm[cpu,gpu] from 1.1.0 to 1.1.2 by @dependabot in #5728
- Organize the parameters tab by @oobabooga in #5767
- Merge dev branch by @oobabooga in #5772
Full Changelog: snapshot-2024-03-24...snapshot-2024-03-31
snapshot-2024-03-24
Full Changelog: snapshot-2024-03-17...snapshot-2024-03-24
snapshot-2024-03-17
What's Changed
- Make superbooga & superboogav2 functional again by @oobabooga in #5656
- Add AQLM support (experimental) by @oobabooga in #5466
- Bump AutoAWQ to 0.2.3 (Linux only) by @oobabooga in #5658
- Add StreamingLLM for llamacpp & llamacpp_HF (2nd attempt) by @oobabooga in #5669
- Merge dev branch by @oobabooga in #5680
- UI: Add a new "User description" field for user personality/biography by @oobabooga in #5691
- Merge dev branch by @oobabooga in #5716
Full Changelog: snapshot-2024-03-10...snapshot-2024-03-17