4096 token limit #29
There are some approaches to work around the token limit. I'm not sure what the best option is for now.
We could add a toggle for continuous conversation which, when turned off, would not send the previous context and would only send the current message on its own.
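A minimal sketch of how such a toggle might work, assuming the OpenAI chat message format; the `Message` type and function name are illustrative, not part of the chatgpt-web codebase:

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// With the toggle on, send the full history plus the new message;
// with it off, send only the current message by itself.
function buildRequestMessages(
  history: Message[],
  current: Message,
  continuousConversation: boolean
): Message[] {
  return continuousConversation ? [...history, current] : [current];
}
```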
How does chatPDF handle tens of thousands of words of content at once?
I don't know what chatPDF is, but I thought I would chime in. If you mean OpenAI's ChatGPT, it seems to allow continuous conversation in a single chat, but when you hit the token limit it forgets earlier messages and continues as if you had not spoken. Sometimes, if you write a long prompt, it will only respond to half of it, since you can hit the token limit within a single prompt. This creates the illusion of a continuous conversation, but at the expense of it being unclear where the token limit began and ended.
A toggleable button for continuous conversation would be interesting. Even better, a toggle for continuous chat combined with one that summarizes the conversation in a new API call and uses that summary as input for future messages would be a powerful combination. This is currently what I do manually: when I hit the 4096 token limit, I summarize the main points, open a new chat, and paste the previous chat's summary into it, followed by my next prompt. This is very cumbersome. A way to automate this process while staying inside a single chat, without having to create new chats whenever the token limit is reached, would be nice, ideally as something that can be toggled on and off. (A rough sketch of what this automation might look like follows this comment.)
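A rough sketch of that automation, assuming the OpenAI chat completions REST API; the token threshold, the ~4-characters-per-token estimate, and all helper names are assumptions, not anything chatgpt-web actually ships:

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

async function chatCompletion(messages: Message[]): Promise<string> {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ model: 'gpt-3.5-turbo', messages }),
  });
  return (await res.json()).choices[0].message.content;
}

// Very rough token estimate (~4 characters per token for English text).
const estimateTokens = (msgs: Message[]) =>
  msgs.reduce((n, m) => n + Math.ceil(m.content.length / 4), 0);

// When the history gets close to the 4096-token window, ask the model to
// summarize it, then carry on with the summary standing in for the history.
async function maybeSummarize(history: Message[]): Promise<Message[]> {
  if (estimateTokens(history) < 3000) return history; // leave headroom
  const summary = await chatCompletion([
    ...history,
    {
      role: 'user',
      content:
        'Summarize our conversation so far, keeping all key points, so we can continue from the summary alone.',
    },
  ]);
  return [{ role: 'system', content: `Summary of the conversation so far: ${summary}` }];
}
```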
@chaoyuyan as already explained above, it doesn't. Instead, it uses embeddings and responds to your questions in two steps: first it gathers the sections of the PDF that are related to your question, then it sends those shorter sections along with your question to the ChatGPT API (see the sketch after this comment). This doesn't work perfectly, but it's a very useful trick. It's not up to me, but implementing something like this seems well out of scope at the moment. It makes more sense to build it as a separate tool, or to look at existing tools that could be integrated into chatgpt-web, like jerryjliu/llama_index for example. I propose two simpler alternative solutions.
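For reference, a minimal sketch of the two-step, embeddings-based retrieval described in the comment above, assuming the OpenAI embeddings endpoint; the pre-chunked document, the top-3 cutoff, and the helper names are all illustrative:

```typescript
type Chunk = { text: string; embedding: number[] };

async function embed(text: string): Promise<number[]> {
  const res = await fetch('https://api.openai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ model: 'text-embedding-ada-002', input: text }),
  });
  return (await res.json()).data[0].embedding;
}

// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Step 1: find the document chunks most related to the question.
// Step 2: build a short prompt from just those chunks plus the question,
// ready to send to the chat completions API.
async function buildRetrievalMessages(chunks: Chunk[], question: string) {
  const q = await embed(question);
  const excerpts = [...chunks]
    .sort((a, b) => cosine(b.embedding, q) - cosine(a.embedding, q))
    .slice(0, 3)
    .map((c) => c.text)
    .join('\n---\n');
  return [
    { role: 'system', content: `Answer using only these excerpts:\n${excerpts}` },
    { role: 'user', content: question },
  ];
}
```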
There is an interesting compression approach described here:
I just tried this compression approach today. GPT-4 is able to condense with Unicode; GPT-3.5 is able to condense too, but not in the same way. GPT-3.5 can, however, read whatever Unicode shorthand GPT-4 produces. But there have been unfortunate side effects, and I am not sure if it is because his prompt referred to the text of a tweet while my prompts weren't the same.

Prompt:

> Compress the text in a way that fits our conversation, and such that you (GPT) can reconstruct it as close as possible to the original. This is for yourself. Do not make it human readable. Abuse language mixing, abbreviations, and symbols (Unicode and emojis) to aggressively compress it, while still keeping ALL the information needed to fully reconstruct it. "I want you to act as a personal writing assistant. As my writing assistant, you'll be here to help me brainstorm ideas, outline my writing, create synopses, provide feedback on my drafts, and assist with any other writing-related tasks I may need help with. Together, we can work to improve my writing skills and help me achieve my writing goals."

Response:

> IWU2actPWA.Ideabr:storm,OutL📝,CreaSynops,🔙draft,Assist📝Tasks.2gthr:⬆️skills&✅goals.

New chat:

> I asked you to compress a long text using your own abbreviations. You replied with: IWU2actPWA.Ideabr:storm,OutL📝,CreaSynops,🔙draft,Assist📝Tasks.2gthr:⬆️skills&✅goals. Reconstruct the original text.

Response (GPT-3.5):

Prompt: Try again.

Response (GPT-4):

That was great, as it retained details, but it changed enough that I wondered how it would do with a larger text with multiple paragraphs. I took a larger text and asked it to condense it. It condensed it, but did not reconstruct it as it was; it reconstructed a condensed version of the text as it understood it. In another attempt, on a scene from a novel, it did the same thing but misinterpreted quite a few details. I also tried to get it to condense an entire chat for reconstruction, and it did so, but did not retain the details of the texts.

GPT-4, asked to summarize the chat (original text: https://www.outsideonline.com/2126281/stop-buying-small-portable-power-generators/).

Prompt: Reconstruct the original text.

> The Estream costs around $250, weighs 1.8 pounds (including the built-in battery) and can store up to 6,400mAh in four and a half hours. By contrast, the Outdoor Tech Kodiak costs $50, weighs 8.96 ounces, has a 6,000mAh battery, and can charge an iPhone 7 roughly three times. It outperforms, is less costly, and weighs less than some other portable chargers.

In spite of this, I can see value in this technique, but getting it to reconstruct the initial prompt is crucial. Maybe with some prompt tweaking and experimentation it could be improved. Right now it butchered the meaning of a long text I was working on so badly that I don't think I would use it for anything where attention to detail is important. But I really love that this is a thing; what a clever hack!

Edit: I just realized something that has the potential to aid in prompt crafting and engineering. This technique can reveal how GPT interprets a prompt, which makes it possible to refine prompts until they achieve the desired goal. What a wonderful discovery, thank you Niek!
One thing to note is that the "compression" and "decompression" are a lot more consistent if you set the temperature to 0, meaning more deterministic and less random output. In any case, it's definitely interesting to see how we could use this technique to compress previous messages.
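For example, a request body along these lines pins the sampling down (a sketch; the prompt wording is abbreviated from the experiment above):

```typescript
// temperature: 0 makes the output (near-)deterministic; the API default is 1.
const response = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: 'gpt-4',
    temperature: 0,
    messages: [
      {
        role: 'user',
        content:
          'Compress the following text so that you (GPT) can reconstruct it. ' +
          'It does not need to be human readable. ...',
      },
    ],
  }),
});
```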
Bypassing OpenAI token restrictions using LangChain
#152 should take care of the truncate and summarization portion of this request. I didn't venture into the shorthand/compression bit for a few reasons:

- Even on GPT-4, it really didn't seem to perform much better than summarization, yet it was far more complicated.
- The "compression" wasn't exactly lossless, much like summarization, and the strange shorthand and emojis still used a fair amount of tokens.
- It didn't work well at all on GPT-3.5.
- It wasn't as easily scalable as summarization for longer conversations.

I guess I'm not exactly sure how you all see the compression approach playing out in terms of keeping chat sessions going indefinitely, but I would like to know more about the implementation you envision.
Now that we're hitting the 4096 token limit, maybe we could drop the oldest message whenever the limit is reached?
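A minimal sketch of that drop-oldest idea, assuming a rough ~4-characters-per-token estimate (a real implementation would count tokens with an actual tokenizer):

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

const estimateTokens = (msgs: Message[]) =>
  msgs.reduce((n, m) => n + Math.ceil(m.content.length / 4), 0);

// Drop the oldest non-system messages until the history fits the budget,
// leaving headroom for the model's reply.
function trimToLimit(history: Message[], budget = 4096 - 512): Message[] {
  const trimmed = [...history];
  while (estimateTokens(trimmed) > budget && trimmed.length > 2) {
    trimmed.splice(1, 1); // keep the system prompt at index 0
  }
  return trimmed;
}
```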