
Add StreamingLLM for llamacpp & llamacpp_HF (2nd attempt) #5669

Merged
oobabooga merged 4 commits into dev from streamingllmv2 on Mar 9, 2024

Conversation

oobabooga (Owner) commented Mar 9, 2024

Cleaned-up version of #4761. It seems to be working reliably for both llamacpp and llamacpp_HF now.

Description

When active, this prevents the prompt from being re-evaluated once an old chat message is removed, thus allowing you to talk to the model indefinitely.
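For background, the mechanism works roughly like this: when the prompt changes because an old message was trimmed, the loader compares the previously evaluated token sequence with the new one, keeps the cache entries for the shared prefix (plus the first few "attention sink" positions), drops the entries for the removed block, and only evaluates the genuinely new tokens. Below is a minimal, self-contained sketch of that planning step in plain Python; the function names and the sink_tokens default are illustrative, not the PR's actual code.

```python
from typing import List, Tuple


def find_prefix_length(old_ids: List[int], new_ids: List[int]) -> int:
    """Length of the longest common prefix of two token-id sequences."""
    n = 0
    for a, b in zip(old_ids, new_ids):
        if a != b:
            break
        n += 1
    return n


def plan_cache_shift(old_ids: List[int], new_ids: List[int],
                     sink_tokens: int = 4) -> Tuple[int, int, int]:
    """
    Decide which slice of the existing KV cache can be dropped instead of
    re-evaluating the whole prompt.

    Assumes new_ids was produced from old_ids by deleting one contiguous
    block (the trimmed chat message) and possibly appending tokens at the
    end. Returns (cut_start, cut_end, reused):
      - cache positions [cut_start, cut_end) should be removed and the
        remaining tail shifted left,
      - `reused` is how many tokens of new_ids are covered by recycled
        cache entries; only new_ids[reused:] still needs evaluation.
    """
    # Never cut into the first few "attention sink" positions.
    cut_start = max(find_prefix_length(old_ids, new_ids), sink_tokens)
    cut_start = min(cut_start, len(old_ids), len(new_ids))

    # Find where the old sequence resumes after the deleted block: the
    # smallest cut_end such that old_ids[cut_end:] is a prefix of the new
    # tokens that follow the shared part.
    tail = new_ids[cut_start:]
    for cut_end in range(cut_start + 1, len(old_ids) + 1):
        overlap = len(old_ids) - cut_end
        if old_ids[cut_end:] == tail[:overlap]:
            return cut_start, cut_end, cut_start + overlap

    # Nothing reusable after the prefix: keep only the shared prefix.
    return cut_start, len(old_ids), cut_start
```

A loader can then remove cache rows [cut_start, cut_end), shift the remaining rows left so positions stay contiguous, and evaluate only new_ids[reused:], which is what avoids re-processing the whole prompt.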

Usage

Pass --streaming-llm on the command line, or check the equivalent box in the UI before loading the model.
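For example, a command-line launch might look like the following (assuming the usual server.py entry point and the standard --model/--loader launch flags; the model filename is a placeholder, and only --streaming-llm is specific to this PR):

```
python server.py --model <your-gguf-model> --loader llama.cpp --streaming-llm
```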

oobabooga merged commit afb51bd into dev on Mar 9, 2024
Ph0rk0z (Contributor) commented Mar 9, 2024

It doesn't peg CPU this time? Will have to try it.

oobabooga deleted the streamingllmv2 branch on March 17, 2024
bartowski1182 pushed a commit to bartowski1182/text-generation-webui that referenced this pull request Mar 23, 2024
RichardFevrier commented

Naive question @oobabooga: could it work with exllamav2?

PoetOnTheRun pushed a commit to PoetOnTheRun/text-generation-webui that referenced this pull request Oct 22, 2024