Would it be possible to compress all messages? #224
Not exactly sure what you're asking here. Pin bottom adjusts how many recent messages are excluded from the summary. Pin top does not exclude the messages from the summary, but also does not remove those pinned messages from future requests.
If you want messages to be summarized, yes, you need to include a summary prompt. The "Marvin" profile should have one example, though getting summarization to work the way you want consistently can be a bit elusive. OpenAI seems to use a fair amount of embedding cache, so sometimes you don't see the changes in the results to your summary prompt right away. If you don't want to use a summary prompt, you can use FIFO... then old messages just roll off.
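As a rough illustration of the FIFO alternative (a minimal sketch under my own naming, not the app's actual code): once the context exceeds the token budget, the oldest messages simply roll off.

```typescript
// Hypothetical sketch of FIFO context pruning: drop the oldest messages
// until the estimated token count fits the budget.
type Message = { role: 'system' | 'user' | 'assistant', content: string }

// Crude token estimate: ~4 characters per token for English text.
const estimateTokens = (m: Message): number => Math.ceil(m.content.length / 4)

function fifoPrune (messages: Message[], maxTokens: number): Message[] {
  const pruned = [...messages]
  const total = (): number => pruned.reduce((sum, m) => sum + estimateTokens(m), 0)
  // Assume index 0 holds the system prompt and is never dropped.
  while (total() > maxTokens && pruned.length > 1) {
    pruned.splice(1, 1) // the oldest non-system message rolls off
  }
  return pruned
}
```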
I think you'll have limited success with this. While ChatGPT can sort of "compress" and "decompress" in this way, you'll notice the actual token reduction is minimal, the compression is lossy, and the content from the "decompression" isn't used in the context very well -- at least that's what I found in some minimal testing. Asking it to summarize can get similar or much greater results as far as token use goes, with a much simpler use of the summary. But if you can show some samples of how this would work in a chat, let me know.
Oh, I didn't expect such a detailed message, thank you, I appreciate it. (I'm not that technical, so excuse my lack of knowledge.)
I'd like to make hallucinations a little rarer, so I thought that if my messages and the responses back were always compressed, the 4,000 tokens would be enough for a longer time. For example, I had quite good results with this:
This would go into the system prompt, and the reminder summary (the banner) would be invisible. So I was thinking about some hidden prompt before every sent and received message where I could add this, or maybe the compression prompt. It would be invisible in the conversation, but it would be added to every message of mine and to the API request. For continuous compression I could send the hidden message before every message:
And I could add a hidden message to the top of every received message. So in short, I'd like to have an option to add a hidden message to every prompt I send or receive, to try various options for continuous conversation with less chance of hallucination. I know it wouldn't be perfect, but I'd be happy to have an option to try it.
There's the hidden prompt prefix option, though in the current release it's injected as a new user message before your last message. So if your hidden prompt prefix is:
and you then prompt:
the messages sent in the API request will be:
user:
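To make the mechanics concrete, here's a minimal TypeScript sketch of that injection (my own naming, not chatgpt-web's actual code): the prefix rides along as an extra user message just before your latest prompt, but is never shown in the chat.

```typescript
// Hypothetical sketch of hidden-prompt-prefix injection. The prefix is sent
// as an extra user message placed just before the latest prompt, and is
// never rendered in the chat UI.
type Message = { role: 'system' | 'user' | 'assistant', content: string }

function buildRequestMessages (history: Message[], hiddenPrefix: string, prompt: string): Message[] {
  const prefixMessages: Message[] = hiddenPrefix
    ? [{ role: 'user', content: hiddenPrefix }]
    : []
  return [...history, ...prefixMessages, { role: 'user', content: prompt }]
}
```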
This test version allows a little more flexibility in the hidden prompt prefix. Let's say you're going to prompt something; your hidden prompt prefix could then be:
The last message in the API request would then be:
But what you'd see in your chat, and what would be sent in subsequent requests as part of the context, would be:
You can also use "::EOM::" to send a hidden message thread. For example:
This would generate a hidden message thread like:
assistant:
user:
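A hedged sketch of how that "::EOM::" splitting could work (the exact role alternation is my assumption; the real behavior may differ):

```typescript
// Hypothetical sketch: split a hidden prefix on "::EOM::" into an
// alternating hidden message thread. Starting with 'user' is an
// assumption; the actual implementation may order roles differently.
type HiddenMessage = { role: 'user' | 'assistant', content: string }

function prefixToThread (prefix: string): HiddenMessage[] {
  return prefix
    .split('::EOM::')
    .map(part => part.trim())
    .filter(part => part.length > 0)
    .map((content, i): HiddenMessage => ({
      role: i % 2 === 0 ? 'user' : 'assistant',
      content
    }))
}
```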
(Hidden prefix threads can help "train" responses for the LLM. I plan on adding something similar for system prompts.) The problem is, I still don't see the compression/decompression thing working the way you'd hope.
Maybe I should try an example too. So what I imagine could be happening in the background (sorry, I can't find the words) is this:
This is what you see:
EDIT: OK, I misunderstood everything. I just put both into a token counter and they are almost the same. Sorry, forget it 😊
Yeah, and add in the token overhead of telling it to compress/decompress and you'll have lost any possible gain. Using previously cached embeddings and recalling them based on the vector of the current user prompt is probably the way forward for your example case (adventure game, etc.). For example, with "What is that buzzing sound?" it could look through all previous messages that have any relation and toss them in the request somewhere. Where things get complicated is: how many other related messages do you include in the request, where do you inject them, how many recent message vectors do you use when pulling context (just the last prompt? the last plus a few back?), and so on. I may have a go at it at some point, but not sure yet. Here's a rough idea of how embeddings can be used:
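Something along these lines, as a hedged TypeScript sketch (the names, the store shape, and `OPENAI_KEY` are my assumptions; the embeddings endpoint and `text-embedding-ada-002` model are OpenAI's real API):

```typescript
// Hypothetical sketch of embedding-based recall. Every message is embedded
// once and cached; on each new prompt, the closest cached messages are
// found by cosine similarity and injected into the request as extra context.
type CachedMessage = { role: 'user' | 'assistant', content: string, vector: number[] }

async function embed (text: string): Promise<number[]> {
  const res = await fetch('https://api.openai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_KEY}` // placeholder for your API key
    },
    body: JSON.stringify({ model: 'text-embedding-ada-002', input: text })
  })
  return (await res.json()).data[0].embedding
}

function cosine (a: number[], b: number[]): number {
  let dot = 0; let na = 0; let nb = 0
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2 }
  return dot / (Math.sqrt(na) * Math.sqrt(nb))
}

async function recallContext (prompt: string, cache: CachedMessage[], topK = 3): Promise<CachedMessage[]> {
  const v = await embed(prompt) // e.g. embed('What is that buzzing sound?')
  return cache
    .map(m => ({ m, score: cosine(v, m.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(s => s.m)
}
```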
Also, using function calls to allow ChatGPT to store things like items you've found, etc., in variables that could be injected into the request (in, say, the system prompt or hidden prompt prefixes) could open up some interesting possibilities... another thing I've been exploring, but not sure what I'll do with it.
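For instance, a sketch of what such a function definition could look like (the `remember_item` function and the client-side handling are purely hypothetical; the schema follows OpenAI's function-calling format):

```typescript
// Hypothetical example: a function ChatGPT could call to stash game state
// (items found, etc.). The stored values could then be re-injected into
// later requests via the system prompt or a hidden prompt prefix.
const functions = [{
  name: 'remember_item',
  description: 'Store an item the player has found so it can be recalled later',
  parameters: {
    type: 'object',
    properties: {
      name: { type: 'string', description: 'Item name, e.g. "brass key"' },
      notes: { type: 'string', description: 'Where or how it was found' }
    },
    required: ['name']
  }
}]

// Client side: persist any function_call the model makes, then surface the
// collected state in the next request.
const inventory: Record<string, string> = {}
function handleFunctionCall (call: { name: string, arguments: string }): void {
  if (call.name === 'remember_item') {
    const args = JSON.parse(call.arguments)
    inventory[args.name] = args.notes ?? ''
  }
}
```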
Is this something similar to how you'd teach a 50-page document to it? I'd really love to do that (100k characters, but with Unicode).
In a way. You could have your 50-page document cached in a local vector DB; then, based on how you decide to pull context, portions of that document could be included in the request. However, it wouldn't be anywhere near the same as real training data, fine-tuning, etc., but it would still allow ChatGPT access to more data than it was trained with.
So this is "training"? Is that a complex thing, or could it be a feature request to train this with a PDF?
I'm not sure what to call it. I see some people call it "training", but it's no more "training" than a prompt is "training". It's really just dynamically building a prompt by including external data, selected by the relations between its vectors and the vectors of other data in the prompt... I can't think of a good name for it off the top of my head.
So you are saying that I could do it already with prompts? Or would it be a new feature from you?
Would need to be a new feature... no idea what to call it though. Maybe "external memory".
It's not memory ChatGPT would have at the tip of its fingers -- only bits of it that are recalled based on the very current context would be available.
(I should add that it also costs per token to get OpenAI embeddings for the content. $0.0001 per 1,000 tokens, I think.)
So if you implement this, then I should pay for the communication (like now) and for the "library search" to find what I need? Sounds fair enough if that is the case, no?
Yeah. Let's say I added a feature to "vectorize" (request embeddings and add to a vector DB) large text documents. (Not sure I'd want to do PDF or anything but text out of the gate, since there's added complexity and bloat involved in parsing those documents.) You'd be charged by OpenAI for each token in that document. Then, for a chat to be able to use that document, every message in that chat, including system and assistant prompts, would need to be vectorized as well, costing per token, so relevant data could be queried from both any documents linked with the chat and previous messages in the chat, and then included in the next request. (Each chat message would only need to be vectorized once.)
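As a rough sketch of that flow (the chunking strategy and names are my assumptions, reusing the `embed` helper from the earlier sketch):

```typescript
// Hypothetical sketch of vectorizing a large text document: split it into
// chunks, request an embedding once per chunk, and keep everything in a
// local store that recallContext-style lookups can query later.
declare function embed (text: string): Promise<number[]> // see the earlier embeddings sketch

type Chunk = { text: string, vector: number[] }

function splitIntoChunks (document: string, wordsPerChunk = 300): string[] {
  const words = document.split(/\s+/)
  const chunks: string[] = []
  for (let i = 0; i < words.length; i += wordsPerChunk) {
    chunks.push(words.slice(i, i + wordsPerChunk).join(' '))
  }
  return chunks
}

async function vectorizeDocument (document: string): Promise<Chunk[]> {
  const chunks = splitIntoChunks(document)
  // Each chunk is billed per token by OpenAI, but only needs embedding once.
  return Promise.all(chunks.map(async text => ({ text, vector: await embed(text) })))
}
```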
Text would be great, but that sounds expensive -- I can't think of real numbers, but still. So how do some sites, like the ChatPDF site, do it without us paying the API price?
They could use some of the publicly available models instead of ChatGPT / OpenAI; I've not looked closely at them. But even if using OpenAI, say, IDK, 500 words per page and a 128-page PDF: 128 x 500 x 1.3 = 83,200 tokens, maybe? 83,200 x $0.0000001 = $0.00832 (is my math way off?). So, almost a penny per 128-page PDF if I calculated that right.
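For what it's worth, the same arithmetic as a quick script (the 1.3 tokens-per-word ratio is a rough rule of thumb, and the price is the one quoted above):

```typescript
// Sanity check of the back-of-the-envelope embedding cost above.
const pages = 128
const wordsPerPage = 500
const tokensPerWord = 1.3 // rough rule of thumb for English text
const pricePerToken = 0.0001 / 1000 // $0.0001 per 1K tokens, as quoted above

const tokens = pages * wordsPerPage * tokensPerWord
console.log(tokens) // 83200
console.log(`$${(tokens * pricePerToken).toFixed(5)}`) // $0.00832
```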
Hm, I don't know, did you use the embedding model price? https://openai.com/pricing
$0.0001 per 1,000 tokens, so $0.0000001 per token, right?
Yes, it should be, but it sounds too cheap.
Yeah, but I think it's right. Looks like ChatPDF works just like proposed, and uses OpenAI:

> How does ChatPDF work? In the analyzing step, ChatPDF creates a semantic index over all paragraphs of the PDF. When answering a question, ChatPDF finds the most relevant paragraphs from the PDF and uses the ChatGPT API from OpenAI to generate an answer.
>
> Does ChatPDF use GPT-4? ChatPDF uses GPT-3.5 for now, which is the same as ChatGPT. We are looking at how to add GPT-4. But GPT-4 won't be available for all messages on the free plan because it costs too much.
Oh, I see, so it is quite possible to make something similar using simple text but everyone's custom API?
Should be possible, though I'm not sure what you mean by "everyone's custom API". There are a number of things to figure out before/during implementation, though.
Create a new feature request and let @Niek give his input. Maybe you can use ChatGPT to make sense of all the blarble I spit out above ;)
Not sure about that, I can hardly understand half of what you said -- I'm doing "Explain Like I'm 5" searches constantly 😄
I have found two interesting sites: https://yodayo.com/ and https://github.com/TavernAI/TavernAI
Haven't looked closely, but most will use embeddings and a vector DB like I described. It's about the only way to have a longer-term memory right now.
I've read how summarizing previous messages works, and would it be possible to toggle it on for all messages?
Also, there is a "Summary Generation Prompt" -- is it necessary to fill it in, or is it optional? And is the one here the default, which it uses when left empty? #29
Compress the text in a way that fits our conversation, and such that you (GPT) can reconstruct it as close as possible to the original. This is for yourself. Do not make it human readable. Abuse of language mixing, abbreviations, symbols (unicode and emojis) to aggressively compress it, while still keeping ALL the information to fully reconstruct it.