Streaming support? #345
Comments
This is definitely on the roadmap - it's a little tricky due to how we use structured outputs, but it's possible.
If you take a look at (for example) LM Studio, there is a little snippet in there that produces realtime text streaming.
Maybe this is a good start for getting this implemented in MemGPT. I was already looking into it myself, but I'm afraid I can't seem to figure it out on my own...
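The exact snippet isn't quoted in this thread, but LM Studio exposes an OpenAI-compatible HTTP server, so a realtime streaming loop typically looks roughly like the sketch below. The base URL, API key, and model name are placeholder assumptions, not values from this issue.

```python
# Hedged sketch: LM Studio serves an OpenAI-compatible API, so streaming
# usually boils down to passing stream=True and iterating over the chunks.
# The base_url, api_key, and model name below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="local-model",  # whatever model is loaded in LM Studio
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

# The for loop is where the realtime text streaming happens:
# each chunk carries an incremental piece of the reply.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```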
I have also played about with streaming text. Each LLM server has a slightly different approach to this function, but the for loop is key in each case. I think the best way to figure it out for a given server is to play about with a standalone script first: follow the relevant server's docs, and once it's confirmed working, test it out with MemGPT.
Have a similar issue here with vLLM. For now my workaround might just be to wait for MemGPT to finish a full generation, then add a fake delay that iterates over the assistant_message output and streams it back to my client.
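A minimal sketch of that workaround, assuming MemGPT has already returned the complete assistant_message as a string (the chunk size and delay below are arbitrary choices):

```python
import time

def fake_stream(assistant_message: str, chunk_size: int = 8, delay: float = 0.02):
    """Yield an already-completed assistant_message in small pieces with a
    short delay, so the client sees a streaming effect after the fact."""
    for i in range(0, len(assistant_message), chunk_size):
        yield assistant_message[i:i + chunk_size]
        time.sleep(delay)

# Usage once MemGPT has produced the full response:
for piece in fake_stream("Hello! Here is the full reply MemGPT generated."):
    print(piece, end="", flush=True)
```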
This could be a roadmap item so that text output is streamed as the LLM generates the message or thought. One use case I can think of is TTS with a shorter response time: the TTS engine would speak every sentence as it is generated (see the sentence-buffering sketch after this comment).
This would require refactoring a lot of MemGPT's code, since the LLM generally has to output JSON, but I think it could be solved by splitting the work across agents: one handles the thought, one handles the message (both could use streaming output), and one handles function calling (which doesn't necessarily need streamed text as an output).
This could also make it easier for developers to build GUIs that show users the LLM's output live.
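A minimal sketch of that sentence-level buffering for TTS, assuming we already have some stream of text fragments to consume; the regex and the example fragments are illustrative assumptions, not MemGPT code:

```python
import re

SENTENCE_END = re.compile(r"([.!?])\s")

def sentences_from_stream(fragments):
    """Accumulate streamed text fragments and yield complete sentences, so a
    TTS engine can start speaking before the full reply has finished."""
    buffer = ""
    for fragment in fragments:
        buffer += fragment
        while True:
            match = SENTENCE_END.search(buffer)
            if not match:
                break
            end = match.end(1)
            yield buffer[:end].strip()
            buffer = buffer[end:]
    if buffer.strip():
        yield buffer.strip()

# Illustrative fake stream of fragments:
for sentence in sentences_from_stream(["Hi the", "re! How are ", "you today? I", "'m fine."]):
    print("TTS would speak:", sentence)
```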