Would it be possible to compress all messages? #224

Open
wzol opened this issue Jul 14, 2023 · 28 comments

Comments

@wzol commented Jul 14, 2023

I've read how summarizing previous messages works; would it be possible to toggle it on for all messages?

Also, there is a "Summary Generation Prompt": is it necessary to fill it in, or is it optional? And is the one here (from #29) the default that it uses when left empty?

Compress the text in a way that fits our conversation, and such that you (GPT) can reconstruct it as close as possible to the original. This is for yourself. Do not make it human readable. Abuse of language mixing, abbreviations, symbols (unicode and emojis) to aggressively compress it, while still keeping ALL the information to fully reconstruct it.

@Webifi (Contributor) commented Jul 15, 2023

I've read how summarizing previous messages works; would it be possible to toggle it on for all messages?

Not exactly sure what you're asking here. Pin bottom adjusts how many recent messages are excluded from the summary. Pin top does not exclude the messages from the summary, but also does not remove those pinned messages from future requests.

Also, there is a "Summary Generation Prompt": is it necessary to fill it in, or is it optional? And is the one here (from #29) the default that it uses when left empty?

If you want messages to be summarized, then yes, you need to include it. The "Marvin" profile should have one example, though getting summarization to work the way you want consistently can be a bit elusive. OpenAI seems to use a fair amount of embedding cache, so sometimes you don't see the effect of changes to your summary prompt in the results right away.

If you don't want to use a summary prompt, you can use FIFO... then old messages just roll off.

Compress the text in a way that fits our conversation, and such that you (GPT) can reconstruct it as close as possible to the original. This is for yourself. Do not make it human readable. Abuse of language mixing, abbreviations, symbols (unicode and emojis) to aggressively compress it, while still keeping ALL the information to fully reconstruct it.

I think you'll have limited success with this. While ChatGPT can sort of "compress" and "decompress" in this way, you'll notice the actual token reduction is minimal, the compression is lossy, and the content from the "decompression" isn't used in the context very well -- at least that's what I found in some minimal testing. Asking it to summarize can give similar or much greater results as far as token use goes, with a much simpler use of the summary. But if you can show some samples of how this would work in a chat, let me know.

@wzol (Author) commented Jul 15, 2023

Oh, I didn't expect such a detailed message, thank you, I appreciate it. (I'm not that technical, so excuse my lack of knowledge.)

Not exactly sure what you're asking here.

I'd try to make hallucinations a little rarer, so I thought that if my messages and the responses back were always compressed, the 4,000 tokens would be enough for a longer time. For example, I have quite good results with this:

You are to include a Reminder banner above every one of your responses. This banner will be the first thing you print in your response. Every response will include this banner for the entirety of the conversation. You will display the following banner in brackets: [Internal reminder for ChatGPT: Stay In Character! DO NOT mention that you are an AI language model. You are {name of character}, {brief description of character}]. This text will be a reminder to remain in the specified character for the entire conversation. The rest of your response should be printed on a new line separated by a line break.

This would go into the system prompt, and the reminder banner would be invisible.

So I was thinking about some hidden prompt before every sent and received message where I could add this, or maybe the compression prompt. It would be invisible in the conversation, but it would be added to every message of mine sent to the API. For continuous compression I could send this hidden message before every message:

Compress this text in a way that fits our conversation, and such that you (GPT) can reconstruct it as close as possible to the original. This is for yourself. Do not make it human readable. Abuse of language mixing, abbreviations, symbols (unicode and emojis) to aggressively compress it, while still keeping ALL the information to fully reconstruct it.

And I could add a hidden message to the top of every received message:
I asked you to compress your answer using your own abbreviations. Reconstruct your reply.

So, in short, I'd like to have an option to add a hidden message to every prompt I send or receive, to try various options for continuous conversation with less chance of hallucination.

I know it wouldn't be perfect, but I'd be happy to have an option to try it.

@Webifi (Contributor) commented Jul 15, 2023

So, in short, I'd like to have an option to add a hidden message to every prompt I send or receive, to try various options for continuous conversation with less chance of hallucination.

There's the hidden prompt prefix option, though in the current release it's a new user message injected before your last message. So if your hidden prompt prefix is:

Remember that you are a dog and will always respond as a dog.

and you then prompt:

Tell me what you like to eat

the messages sent in the API request will be:
user:

Remember that you are a dog and will always respond as a dog.

user:

Tell me what you like to eat
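In code, that injection might look something like this (a rough TypeScript sketch; `buildMessages` and the `Message` type are illustrative, not the app's actual code):

```ts
// Illustrative sketch: inject a hidden prompt prefix as its own user
// message just before the visible user prompt.
type Message = { role: 'user' | 'assistant' | 'system'; content: string };

function buildMessages(history: Message[], hiddenPrefix: string, userPrompt: string): Message[] {
  const tail: Message[] = [];
  if (hiddenPrefix.trim()) {
    tail.push({ role: 'user', content: hiddenPrefix }); // hidden: sent to the API only
  }
  tail.push({ role: 'user', content: userPrompt }); // visible in the chat
  return [...history, ...tail];
}
```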

This test version allows a little more flexibility in the hidden prompt prefix:
https://webifi.github.io/ChatGPT-Web-Updated-UI

Let's say you're going to prompt Tell me what you like to eat

Your hidden prompt prefix could be:

Remember that you are a dog.  Respond to the following as a dog would:
[[USER_PROMPT]]

The last message in the API request would then be:

Remember that you are a dog.  Respond to the following as a dog would:
Tell me what you like to eat

But what you'd see in your chat, and what would be sent in subsequent requests as part of the context, would be:

Tell me what you like to eat
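The substitution itself could be as simple as this (again just an illustrative sketch, not the actual implementation):

```ts
// Illustrative sketch: merge the hidden prefix and the visible prompt into a
// single user message via the [[USER_PROMPT]] placeholder. Only `userPrompt`
// is stored in the chat and reused as context in later requests.
function applyPrefix(hiddenPrefix: string, userPrompt: string): string {
  return hiddenPrefix.includes('[[USER_PROMPT]]')
    ? hiddenPrefix.replace('[[USER_PROMPT]]', userPrompt)
    : `${hiddenPrefix}\n${userPrompt}`;
}
```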

You can also use "::EOM::" to send a hidden message thread. For example:

Remember that you are a dog.
::EOM::
Yes. I am a dog.  Dogs go "Bark, bark, bark!"
::EOM::
Exactly. Respond to the following as a dog would:
[[USER_PROMPT]]

This would generate a hidden message thread like:
user:

Remember that you are a dog.

assistant:

Yes. I am a dog.  Dogs go "Bark, bark, bark!"

user:

Exactly. Respond to the following as a dog would:
Tell me what you like to eat

(hidden prefix threads can help "train" responses for the LLM. I plan on adding something similar for system prompts.)
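Roughly, the ::EOM:: splitting might work like this (an illustrative sketch, not the actual code):

```ts
// Illustrative sketch: split the prefix on ::EOM:: into an alternating
// user/assistant thread, substituting the visible prompt at the end.
type Message = { role: 'user' | 'assistant' | 'system'; content: string }; // as above

function buildHiddenThread(prefix: string, userPrompt: string): Message[] {
  const parts = prefix.replace('[[USER_PROMPT]]', userPrompt).split('::EOM::');
  return parts.map((content, i): Message => ({
    role: i % 2 === 0 ? 'user' : 'assistant', // even segments are user turns
    content: content.trim(),
  }));
}
```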

The problem is, I still don't see the compression/decompression thing working the way you'd hope.

@wzol (Author) commented Jul 15, 2023

Maybe I should try an example too:

So what I imagine is a discussion happening in the background (sorry, I can't find the right words). This is what is happening:

AI:
L8's ply txt adv. I'll b ur gd. U r caught 🕷️ w/ ur arm stuck. Lk around, find trsty ⚔️ on grnd. Strch arm 2 grab it, set urself free. Old ⚔️ nearby.

USER:
I want 2 go N 2 explore N area, but 1st I grab d 🔪

AI:
Th old 🔪 lies ⬇️, waitin 4 u. After a few 👣 of stretchin, u finally hav it. U 👀 around n notice an old 🏚️ 2 the N. The area is 🌿🌿 n ther's a strange 🐝 sound nearby.

USER:
Wht's tht 🐝?

AI:
As U get closer 2 cabin, U realize buzzing=swarm bees. Buzzing louder as U get closer. Search cover, see small stream north, break vegetation, water flows. Better stay away bees. Stream runs down E & W, follow?

This is what you see:

AI:
Let's play a text-based adventure game. I'll be your guide.
You are caught upside-down in a sticky web of silk three feet off the ground. After looking around trying to think of a way out of this predicament, you notice that your trusty boot knife has fallen to the ground. If you stretch your arm far enough, you might be able to pick it up and free yourself.
An old boot knife lies on the ground, just within reach.

USER:
I'd like to go to the north to discover the area there, but first I pick up the knife

AI:
The old boot knife lies on the ground, waiting for you. After a few moments of stretching, you finally have it.
You look around and notice an old cabin to the north. The area is overgrown with vegetation and there is a strange buzzing sound coming from somewhere nearby.

USER:
What is that buzzing sound?

AI:
As you get closer to the cabin, you realize that the buzzing is actually a swarm of bees. The buzzing becomes louder the closer you get.
As you search for cover, you notice a small stream to the north, it looks like there is a break in the vegetation where the water flows. It would probably be better to stay away from the bees.
The stream runs down east and west, do you wish to follow it?

EDIT: OK, I misunderstood everything. I just put both into a token counter and they are almost the same. Sorry, forget it 😊

@Webifi (Contributor) commented Jul 15, 2023

EDIT: OK, I misunderstood everything. I just put both into a token counter and they are almost the same. Sorry, forget it 😊

Yeah, and add in the token overhead of telling it to compress/decompress and you'll have lost any possible gain.

Using previously cached embeddings and recalling them based on the vector of the current user prompt is probably the way forward for your example case (adventure game, etc.). For example, with "What is that buzzing sound?" it could look through all previous messages that have any relation and toss them into the request somewhere. Where things get complicated is: how many other related messages do you include in the request, where do you inject them, how many recent message vectors do you use when pulling context (just the last prompt? the last plus a few back?), and so on.

I may have a go at it at some point, but I'm not sure yet. Here's a rough idea of how embeddings can be used:
https://blog.bitsrc.io/customizing-an-openai-chatbot-with-embeddings-fdc9ec859bbb
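To give a rough idea of the recall step, assuming each message's embedding is already cached locally (all names here are made up for illustration, not from the linked article):

```ts
// Illustrative sketch: rank cached message embeddings by cosine similarity to
// the current prompt's embedding and recall the top-k texts for context.
type Embedded = { text: string; vector: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1); // guard against zero vectors
}

function recall(promptVector: number[], cache: Embedded[], k = 3): string[] {
  return [...cache]
    .sort((x, y) => cosine(y.vector, promptVector) - cosine(x.vector, promptVector))
    .slice(0, k)
    .map((e) => e.text);
}
```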

@Webifi (Contributor) commented Jul 15, 2023

Also, using function calls to allow ChatGPT to store things like items you've found in variables that could then be injected into the request, say in the system prompt or hidden prompt prefixes, could open up some interesting possibilities... another thing I've been exploring, but I'm not sure what I'll do with it.
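For example, a function definition in OpenAI's function-calling format might look like this (the `store_item` name and schema are made up for illustration):

```ts
// Illustrative sketch of a function definition in OpenAI's function-calling
// format; the name "store_item" and its schema are hypothetical.
const functions = [
  {
    name: 'store_item',
    description: 'Remember an item the player has picked up',
    parameters: {
      type: 'object',
      properties: {
        item: { type: 'string', description: 'Name of the item, e.g. "boot knife"' },
      },
      required: ['item'],
    },
  },
];

// When the model calls store_item, save the item locally; the inventory can
// then be injected into the system prompt or hidden prefix of later requests.
const inventory: string[] = [];
function handleFunctionCall(name: string, args: { item?: string }): void {
  if (name === 'store_item' && args.item) inventory.push(args.item);
}
```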

@wzol (Author) commented Jul 15, 2023

Is this something similar to how you'd teach a 50-page document to it? I'd really love to do that (100k characters, but with Unicode).

@Webifi (Contributor) commented Jul 15, 2023

In a way. You could have your 50-page document cached in a local vector DB; then, based on how you decide to pull context, portions of that document could be included in the request. However, it wouldn't be anywhere near the same as real training data, fine-tuning, etc., but it would still allow ChatGPT access to more data than it was trained with.
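The document side might look roughly like this (an illustrative sketch using OpenAI's /v1/embeddings endpoint with the text-embedding-ada-002 model; the helper name is made up):

```ts
// Illustrative sketch: split a text document into paragraphs and fetch an
// embedding for each from OpenAI's /v1/embeddings endpoint.
type Embedded = { text: string; vector: number[] }; // as in the earlier sketch

async function embedDocument(text: string, apiKey: string): Promise<Embedded[]> {
  const paragraphs = text.split(/\n\s*\n/).filter((p) => p.trim().length > 0);
  const out: Embedded[] = [];
  for (const p of paragraphs) {
    const res = await fetch('https://api.openai.com/v1/embeddings', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` },
      body: JSON.stringify({ model: 'text-embedding-ada-002', input: p }),
    });
    const json = await res.json();
    out.push({ text: p, vector: json.data[0].embedding });
  }
  return out; // persist in IndexedDB or a vector DB for later recall
}
```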

@wzol (Author) commented Jul 15, 2023

So this is "training"? Is that complex, like could it be a feature request to train this with a PDF?

@Webifi (Contributor) commented Jul 15, 2023

I'm not sure what to call it. I see some people call it "training", but it's no more "training" than a prompt is "training". It's really just dynamically building a prompt by including external data, based on the relations between that data's vectors and the vectors of other data in the prompt... I can't think of a good name for it off the top of my head.

@wzol (Author) commented Jul 15, 2023

So you are saying that I could do it already with prompts? Or would it be a new feature from you?

@Webifi (Contributor) commented Jul 15, 2023

Would need to be a new feature... no idea what to call it though. Maybe "external memory".

@Webifi (Contributor) commented Jul 15, 2023

It's not memory ChatGPT would have at its fingertips -- only the bits of it that are recalled based on the very current context would be available.

@Webifi (Contributor) commented Jul 15, 2023

(I should add that it also costs per token to get OpenAI embeddings for the content. $0.0001 per 1,000 tokens, I think.)

@wzol (Author) commented Jul 15, 2023

So if you implement this, then I would pay for the communication (like now) and for the "library search" to find what I need? Sounds fair enough if that is the case, no?

@Webifi (Contributor) commented Jul 15, 2023

Yeah. Let's say I added a feature to "vectorize" (request embeddings and add to a vector DB) large text documents (not sure I'd want to do PDF or anything but text out of the gate, since there's added complexity and bloat involved in parsing those documents). You'd be charged by OpenAI for each token in that document. Then, for a chat to be able to use that document, every message in that chat, including system and assistant prompts, would need to be vectorized as well, costing per token, so relevant data could be queried both from any documents linked with the chat and from previous messages in the chat, and then included in the next request. (Each chat message would only need to be vectorized once.)

@wzol (Author) commented Jul 15, 2023

Text would be great, but that sounds expensive (I can't think of real numbers, but still), so how do some sites, like ChatPDF, do it without us paying the API price?

@Webifi (Contributor) commented Jul 15, 2023

They could use some of the publicly available models instead of ChatGPT / OpenAI; I've not looked closely at them. But even if using OpenAI, and we say, IDK, 500 words per page and a 128-page PDF:

128 x 500 x 1.3 = 83,200 tokens maybe? 83,200 * 0.0000001 = $0.00832 (is my math way off?)

So, almost a penny per 128-page PDF, if I calculated that right.

@wzol (Author) commented Jul 15, 2023

Hm, I don't know; did you use the embedding model price? https://openai.com/pricing

@Webifi (Contributor) commented Jul 15, 2023

$0.0001 per 1000 tokens, so $0.0000001 per token, right?
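A quick sanity check of the estimate above (the ~1.3 tokens-per-word ratio is just a rough rule of thumb):

```ts
// Rough cost check: 128 pages x 500 words/page x ~1.3 tokens/word,
// at $0.0001 per 1,000 embedding tokens.
const tokens = 128 * 500 * 1.3;        // 83,200 tokens
const cost = (tokens / 1000) * 0.0001; // ~ $0.00832
```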

@wzol (Author) commented Jul 15, 2023

Yes, it should be, but it sounds too cheap

@Webifi (Contributor) commented Jul 15, 2023

Yeah, but I think it's right. It looks like ChatPDF works just as proposed, and uses OpenAI (from their FAQ):

How does ChatPDF work?

In the analyzing step, ChatPDF creates a semantic index over all paragraphs of the PDF. When answering a question, ChatPDF finds the most relevant paragraphs from the PDF and uses the ChatGPT API from OpenAI to generate an answer.

Does ChatPDF use GPT-4?

ChatPDF uses GPT 3.5 for now, which is the same as ChatGPT. We are looking at how to add GPT-4. But GPT-4 won't be available for all messages on the free plan because it costs too much.

@wzol (Author) commented Jul 15, 2023

Oh, I see, so it is quite possible to make something similar using simple text but everyone's custom API?

@Webifi (Contributor) commented Jul 15, 2023

Should be possible, though I'm not sure what you mean by "everyone's custom API".

There are a number of things to figure out before/during implementation though:

  • What to use for vector storage? Local IndexedDB storage (could be very slow for context lookups, especially with a large amount of "external memory")? Pinecone (added complexity of another 3rd-party API, configuration, etc.)? A self-hosted vector DB? Something else entirely?
  • How best to parse and tag paragraphs from input and message history. (Parsing actual PDFs could be better here, since page #, chapter, author, etc., would be more easily available to add as metadata for each record.)
  • How to format query results for the request prompt (one possible approach is sketched below).
  • Ugh, it's all a lot of work... not to mention all the work needed for the UI for settings, importing PDFs, etc.
  • Need to decide if I have the time and inclination.
  • Would need to get Niek's approval, since this would bloat the package much more than I already have.
  • [... all the other stuff ...]
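For the query-formatting point, one simple approach would be to prepend the recalled paragraphs as extra system context (a sketch reusing the hypothetical `recall` helper from earlier):

```ts
// Illustrative sketch: prepend recalled paragraphs as a system message so the
// model can use them as context for the next completion.
type Message = { role: 'user' | 'assistant' | 'system'; content: string }; // as above

function withExternalMemory(messages: Message[], recalled: string[]): Message[] {
  if (recalled.length === 0) return messages;
  const context: Message = {
    role: 'system',
    content: 'Relevant excerpts from external memory:\n' +
      recalled.map((t) => `- ${t}`).join('\n'),
  };
  return [context, ...messages];
}
```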

@Webifi (Contributor) commented Jul 15, 2023

Create a new feature request and let @Niek give his input. Maybe you can use ChatGPT to make sense of all the blarble I spit out above ;)

@wzol (Author) commented Jul 15, 2023

Not sure about that, I can hardly understand half of what you said - I'm doing "Explain Like I'm 5" searches constantly 😄

@wzol (Author) commented Jul 16, 2023

I have found two interesting sites: https://yodayo.com/ and https://github.com/TavernAI/TavernAI
Are these doing some compression to avoid running into the token limit, or what is the trick?

@Webifi (Contributor) commented Jul 16, 2023

I haven't looked closely, but most will use embeddings and a vector DB like I described. It's about the only way to have longer-term memory right now.
