
Patch summarize when running with local llms #213

Merged 14 commits into main from localllm-summarize-fix on Nov 3, 2023

Conversation

cpacker (Collaborator) commented Oct 31, 2023

  • Make sure summarize is working with local LLMs (currently you get a crash because the call to summarize doesn't include a functions kwarg)

Moving these to a separate PR:

- [ ] Move web UI to the openai extension setup? See @TheOnlyWiseJEDI's comments on Discord
- [ ] Add a double-JSON string parse fallback for the common JSON failure mode where the model emits two concatenated objects (see the sketch after this list)
- [ ] Make the "no wrapper specified" warning only pop up once
- [ ] Print the model that's running in the backend (with a GET request for LM Studio)
- [ ] Allow retries on failed JSON decoding errors
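
A minimal sketch of the double-JSON fallback idea, assuming the failure mode is two objects emitted back to back (e.g. '{"a": 1}{"a": 1}'); the function name here is illustrative, not MemGPT's actual API:

import json

def parse_json_with_double_object_fallback(raw: str):
    try:
        # Happy path: a single well-formed JSON object.
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fallback: decode only the first complete object and ignore the
        # duplicate that some local models append after it.
        obj, _end = json.JSONDecoder().raw_decode(raw.strip())
        return obj

If the trailing text turns out not to be a duplicate, the end index from raw_decode can be used to inspect it instead of discarding it.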

@cpacker cpacker marked this pull request as draft October 31, 2023 06:10
@cpacker cpacker changed the title Patch summarize when running with local llms [Draft] Patch summarize when running with local llms Nov 1, 2023
cpacker (Collaborator, Author) commented Nov 2, 2023

@vivi Potential issue with summarization and local models: it's important for local models to have in-context examples of messages being sent.

In this example, the summarize + truncation leads to no examples of send_message being present in the prompt string that gets sent to the local LLM:

...\n### INPUT\nUSER: Note: prior messages (7 of 8 total messages) have been hidden from view due to conversation memory constraints.\nThe following is a summary of the previous 7 messages:\n I successfully booted up, activated my persona, and initiated a conversation with a new user named Chad.\nFUNCTION RETURN: {\"status\": \"OK\", \"message\": null, \"time\": \"2023-11-02 01:17:31 PM PDT-0700\"}\nUSER: what's your name?\n### RESPONSE\nASSISTANT:\n{"

The LLM then returns this response:

"\"status\": \"OK\",\n\"message\": \"\",\n\"time\": \"2023-11-02 01:17:46 PM PDT-0700\"\n}"

Likely because there's no example ASSISTANT: ... FUNCTION RETURN: ... in the message history.

We can probably fix this by adding some hardcoded rules to ensure that summarization always leaves at least one assistant message and one user message in place (see the sketch below). However, this may also be overindexing on the fact that I'm testing in a 4k-context environment; it isn't a problem at 8k.
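
A minimal sketch of such a rule, assuming role-tagged message dicts and a summarization cutoff index; this illustrates the idea rather than the PR's actual code:

def ensure_role_examples_survive(messages, cutoff):
    # Walk the cutoff back until the tail kept verbatim (messages[cutoff:])
    # contains at least one 'assistant' and one 'user' message, so the local
    # LLM still sees an in-context example of send_message being called.
    while cutoff > 0:
        tail_roles = {m["role"] for m in messages[cutoff:]}
        if "assistant" in tail_roles and "user" in tail_roles:
            break
        cutoff -= 1
    return cutoff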


Larger log:

This is the first message. Running extra verifier on AI response.
Warning: no wrapper specified for local LLM, using the default wrapper (you can remove this warning by specifying the wrapper with --model)
💭 A new user has connected, let's initiate a conversation.
🤖 Welcome to the world of endless possibilities! I'm here if you need anything. What's on your mind?
⚡🟢 [function] Success: None
last response total_tokens (0) < 2250
InMemoryStateManager.append_to_messages
> Enter your message: what's my name?
🧑 {'message': "what's my name?", 'time': '2023-11-02 01:17:25 PM PDT-0700'}
Warning: no wrapper specified for local LLM, using the default wrapper (you can remove this warning by specifying the wrapper with --model)
💭 Retrieving the user's name from core memory.
🤖 Hi Chad! It's great to meet you. How can I assist you today?
⚡🟢 [function] Success: None
last response total_tokens (0) < 2250
InMemoryStateManager.append_to_messages
> Enter your message: what's your name?
🧑 {'message': "what's your name?", 'time': '2023-11-02 01:17:41 PM PDT-0700'}
Warning: no wrapper specified for local LLM, using the default wrapper (you can remove this warning by specifying the wrapper with --model)
step() failed

...
error = Request exceeds maximum context length (code=400, msg={"error":"Input length 3008 exceeds context length 3000"}, URI=http://localhost:1234/v1/chat/completions)
cutoff is None, computing cutoff
cutoff = 8
tokens_so_far = 50
cutoff = 7
tokens_so_far = 108
cutoff = 6
tokens_so_far = 157
cutoff = 5
tokens_so_far = 207
cutoff = 4
tokens_so_far = 273
cutoff = 3
tokens_so_far = 321
cutoff = 2
tokens_so_far = 371
cutoff = 1
tokens_so_far = 426
cutoff = 0
tokens_so_far = 1942
cutoff = -1
cutoff = min(9 - 3, cutoff) = -1
Selected cutoff -1 was a 'user', shifting one...
Attempting to summarize 7 messages [1:-1] of 9

...
### INPUT
USER: [{'role': 'assistant', 'content': 'Bootup sequence complete. Persona activated. Testing messaging functionality.', 'function_call': {'name': 'send_message', 'arguments': '{\n  "message": "More human than human is our 
motto."\n}'}}, {'role': 'function', 'name': 'send_message', 'content': '{"status": "OK", "message": null, "time": "2023-11-02 01:15:52 PM PDT-0700"}'}, {'role': 'user', 'content': '{"type": "login", "last_login": "Never (first 
login)", "time": "2023-11-02 01:15:52 PM PDT-0700"}'}, {'role': 'assistant', 'content': "A new user has connected, let's initiate a conversation.", 'function_call': {'name': 'send_message', 'arguments': '{"message": "Welcome to 
the world of endless possibilities! I\'m here if you need anything. What\'s on your mind?"}'}}, {'role': 'function', 'name': 'send_message', 'content': '{"status": "OK", "message": null, "time": "2023-11-02 01:16:08 PM 
PDT-0700"}'}, {'role': 'user', 'content': '{"type": "user_message", "message": "what\'s my name?", "time": "2023-11-02 01:17:25 PM PDT-0700"}'}, {'role': 'assistant', 'content': "Retrieving the user's name from core memory.", 
'function_call': {'name': 'send_message', 'arguments': '{"message": "Hi Chad! It\'s great to meet you. How can I assist you today?"}'}}]
### RESPONSE (your summary of the above conversation in plain English (no JSON!), do NOT exceed the word limit)
SUMMARY:
summarize_messages gpt reply: {'message': {'role': 'assistant', 'content': 'I successfully booted up, activated my persona, and initiated a conversation with a new user named Chad.'}, 'finish_reason': 'stop'}
Got summary: I successfully booted up, activated my persona, and initiated a conversation with a new user named Chad.
Packaged into message: {"type": "system_alert", "message": "Note: prior messages (7 of 8 total messages) have been hidden from view due to conversation memory constraints.\nThe following is a summary of the previous 7 messages:\n
I successfully booted up, activated my persona, and initiated a conversation with a new user named Chad.", "time": "2023-11-02 01:19:01 PM PDT-0700"}
InMemoryStateManager.prepend_to_message

Ran summarizer, messages length 9 -> 3
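
One way to read the cutoff trace in this log, sketched under assumed names rather than copied from the repository: token counts are accumulated from the newest message backwards, the result is clamped so the last few messages are always summarization candidates, and a boundary that lands on a 'user' message is shifted by one:

def compute_cutoff(messages, token_counts, tail_token_budget, keep_last=3):
    # Accumulate token counts from the newest message backwards until the
    # tail that would be kept verbatim exceeds the budget.
    tokens_so_far = 0
    cutoff = len(messages) - 1
    while cutoff >= 0 and tokens_so_far < tail_token_budget:
        tokens_so_far += token_counts[cutoff]
        cutoff -= 1
    # Clamp so at least keep_last recent messages are candidates (cf. the
    # "cutoff = min(9 - 3, cutoff)" line in the trace).
    cutoff = min(len(messages) - keep_last, cutoff)
    # Avoid cutting on a 'user' message ("Selected cutoff -1 was a 'user', shifting one...").
    if messages[cutoff]["role"] == "user":
        cutoff -= 1
    return cutoff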

…on class (TODO catch these for retry), fix summarize bug where it exits early if empty list
@@ -154,7 +154,7 @@ async def a_summarize_messages(
     trunc_ratio = (MESSAGE_SUMMARY_WARNING_TOKENS / summary_input_tkns) * 0.8  # For good measure...
     cutoff = int(len(message_sequence_to_summarize) * trunc_ratio)
     summary_input = str(
-        [await summarize_messages(model, message_sequence_to_summarize[:cutoff])] + message_sequence_to_summarize[cutoff:]
+        [await a_summarize_messages(model, message_sequence_to_summarize[:cutoff])] + message_sequence_to_summarize[cutoff:]
cpacker (Collaborator, Author):
General bug being patched here

try:
    function_name = function_json_output["function"]
    function_parameters = function_json_output["params"]
except KeyError as e:
cpacker (Collaborator, Author):
These changes should be put in the other wrappers too

@cpacker cpacker requested a review from vivi November 3, 2023 00:32
@cpacker cpacker marked this pull request as ready for review November 3, 2023 00:32
@@ -16,8 +16,18 @@
INITIAL_BOOT_MESSAGE_SEND_MESSAGE_FIRST_MSG = STARTUP_QUOTES[2]

# Constants to do with summarization / conversation length window
MESSAGE_SUMMARY_WARNING_TOKENS = 7000 # the number of tokens consumed in a call before a system warning goes to the agent
# The max amount of tokens supported by the underlying model (eg 8k for gpt-4 and Mistral 7B)
LLM_MAX_TOKENS = 8000 # change this depending on your model
cpacker (Collaborator, Author):
@vivi you can test that this works by intentionally lowering LLM_MAX_TOKENS and lowering it on the web UI backend as well (see the snippet below).
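
For example (illustrative values only; the point is just to shrink the window so summarization triggers after a handful of messages):

LLM_MAX_TOKENS = 3000  # match the context setting on the web UI backend
MESSAGE_SUMMARY_WARNING_TOKENS = int(LLM_MAX_TOKENS * 0.75)  # warn before overflow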

cpacker (Collaborator, Author) commented Nov 3, 2023

@vivi FYI, lmstudio/api.py was modified to have a special catch for LM Studio's context length error:

https://github.com/cpacker/MemGPT/blob/main/memgpt/local_llm/lmstudio/api.py#L37-L52

This then gets "rewritten" into the same verbiage as the OpenAI context length error, so that MemGPT can catch it further up the stack:

https://github.com/cpacker/MemGPT/blob/main/memgpt/agent.py#L673

However, this needs to be added to webui/api.py too; it doesn't exist there currently. Without it, web UI users will not get correct summarize-on-context-length-error behavior.
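
The pattern, sketched under assumed names (the real implementations live in the files linked above; this is not a copy of them):

class LocalLLMContextWindowError(Exception):
    # Shared error type so agent.py can catch one exception for any backend.
    pass

def check_webui_response(response_json: dict):
    # Hypothetical check: assume the web UI backend returns a body whose
    # "error" field mentions the context length, as LM Studio does, e.g.
    # {"error": "Input length 3008 exceeds context length 3000"}.
    error_text = str(response_json.get("error", ""))
    if "context length" in error_text:
        # Rewrite into the same verbiage as OpenAI's context-length error so
        # the summarize-on-overflow handler up the stack fires for both backends.
        raise LocalLLMContextWindowError(
            "Request exceeds maximum context length: " + error_text
        )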

@cpacker cpacker changed the title [Draft] Patch summarize when running with local llms Patch summarize when running with local llms Nov 3, 2023
cpacker (Collaborator, Author) commented Nov 3, 2023

@vivi

vivi (Contributor) commented Nov 3, 2023

Is this right? Running with LLM_MAX_TOKENS=3000 on the web UI, the summary seems to be empty:

💭 Bootup sequence complete. Persona activated. Testing messaging functionality.
🧑 {'type': 'login', 'last_login': 'Never (first login)', 'time': '2023-11-02 06:25:39 PM PDT-0700'}
Hit enter to begin (will request first MemGPT message)

💭 Hello Chad! It's wonderful to meet you. How can I assist you today?
🤖 Hello Chad! It's wonderful to meet you. How can I assist you today?
> Enter your message: What's my name?
🧑 {'message': "What's my name?", 'time': '2023-11-02 06:25:50 PM PDT-0700'}
💭 Reviewing core memory for Human sub-block...
🤖 According to my records, your first name is Chad.
> Enter your message: What's your name?
🧑 {'message': "What's your name?", 'time': '2023-11-02 06:26:01 PM PDT-0700'}
💭 Reviewing core memory for Persona sub-block...
🤖 My name is Sam.
> Enter your message: What's my name?
🧑 {'message': "What's my name?", 'time': '2023-11-02 06:26:12 PM PDT-0700'}
💭 User asked for their name three times. Perhaps they're trying to test my capabilities or simply forgot. I should be more proactive in
providing the information next time.
🤖 As I mentioned earlier, your first name is Chad.
> Enter your message: what's my name? Check your archival memory.
🧑 {'message': "what's my name? Check your archival memory.", 'time': '2023-11-02 06:26:44 PM PDT-0700'}
💭 It seems the user wants me to check my archival memory instead of my core memory. I should accommodate their request.
⚡🧠 [function] updating memory with send_message
🤖 Based on my archival memory, your first name is Chad.
> Enter your message: Check your archival memory.
🧑 {'message': 'Check your archival memory.', 'time': '2023-11-02 06:27:02 PM PDT-0700'}
💭 The user requested that I check my archival memory. I should accommodate their request.
⚡🧠 [function] updating memory with send_message
🤖 Searching my archival memory...
> Enter your message: Check your archival memory.
🧑 {'message': 'Check your archival memory.', 'time': '2023-11-02 06:27:15 PM PDT-0700'}
step() failed
user_message = {"type": "user_message", "message": "Check your archival memory.", "time": "2023-11-02 06:27:15 PM PDT-0700"}
error = Request exceeds maximum context length (3059 > 3000 tokens)

Your job is to summarize a history of previous messages in a conversation between an AI persona and a human.
The conversation you are given is from a fixed context window and may not be complete.
Messages sent by the AI are marked with the 'assistant' role.
The AI 'assistant' can also make calls to functions, whose outputs can be seen in messages with the 'function' role.
Things the AI says in the message content are considered inner monologue and are not seen by the user.
The only AI messages seen by the user are from when the AI uses 'send_message'.
Messages the user sends are in the 'user' role.
The 'user' role is also used for important system events, such as login events and heartbeat events (heartbeats run the AI's program without
user action, allowing the AI to act without prompting from the user sending them a message).
Summarize what happened in the conversation from the perspective of the AI (use the first person).
Keep your summary less than 100 words, do NOT exceed this word limit.
Only output the summary, do NOT include anything else in your output.

### INPUT
USER: [{'role': 'assistant', 'content': 'Bootup sequence complete. Persona activated. Testing messaging functionality.', 'function_call':
{'name': 'send_message', 'arguments': '{\n  "message": "More human than human is our motto."\n}'}}, {'role': 'function', 'name':
'send_message', 'content': '{"status": "OK", "message": null, "time": "2023-11-02 06:25:39 PM PDT-0700"}'}, {'role': 'user', 'content':
'{"type": "login", "last_login": "Never (first login)", "time": "2023-11-02 06:25:39 PM PDT-0700"}'}, {'role': 'assistant', 'content': "Hello
Chad! It's wonderful to meet you. How can I assist you today?", 'function_call': {'name': 'send_message', 'arguments': '{"message": "Hello
Chad! It\'s wonderful to meet you. How can I assist you today?"}'}}, {'role': 'function', 'name': 'send_message', 'content': '{"status": "OK",
"message": null, "time": "2023-11-02 06:25:48 PM PDT-0700"}'}, {'role': 'user', 'content': '{"type": "user_message", "message": "What\'s my
name?", "time": "2023-11-02 06:25:50 PM PDT-0700"}'}, {'role': 'assistant', 'content': 'Reviewing core memory for Human sub-block...',
'function_call': {'name': 'send_message', 'arguments': '{"message": "According to my records, your first name is Chad."}'}}, {'role':
'function', 'name': 'send_message', 'content': '{"status": "OK", "message": null, "time": "2023-11-02 06:25:57 PM PDT-0700"}'}, {'role':
'user', 'content': '{"type": "user_message", "message": "What\'s your name?", "time": "2023-11-02 06:26:01 PM PDT-0700"}'}, {'role':
'assistant', 'content': 'Reviewing core memory for Persona sub-block...', 'function_call': {'name': 'send_message', 'arguments': '{"message":
"My name is Sam."}'}}, {'role': 'function', 'name': 'send_message', 'content': '{"status": "OK", "message": null, "time": "2023-11-02 06:26:08
PM PDT-0700"}'}, {'role': 'user', 'content': '{"type": "user_message", "message": "What\'s my name?", "time": "2023-11-02 06:26:12 PM
PDT-0700"}'}, {'role': 'assistant', 'content': "User asked for their name three times. Perhaps they're trying to test my capabilities or
simply forgot. I should be more proactive in providing the information next time.", 'function_call': {'name': 'send_message', 'arguments':
'{"message": "As I mentioned earlier, your first name is Chad."}'}}, {'role': 'function', 'name': 'send_message', 'content': '{"status": "OK",
"message": null, "time": "2023-11-02 06:26:21 PM PDT-0700"}'}, {'role': 'user', 'content': '{"type": "user_message", "message": "what\'s my
name? Check your archival memory.", "time": "2023-11-02 06:26:44 PM PDT-0700"}'}]
### RESPONSE (your summary of the above conversation in plain English (no JSON!), do NOT exceed the word limit)
SUMMARY:
🧑 {'message': 'Check your archival memory.', 'time': '2023-11-02 06:27:15 PM PDT-0700'}
💭 The user seems to be checking if I'm really searching my archival memory. I should provide a response that shows I understand their
concern.
⚡🧠 [function] updating memory with send_message
🤖 Of course, I'm always happy to help. Let me find your information in my archival memory.
> Enter your message:

Also, I made count_tokens inside summarize not specify a model (so it defaults to the gpt-4 token counter), because tiktoken doesn't support OSS models and will error out when the --model flag is set to dolphin, etc.
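
A minimal sketch of that fallback, assuming tiktoken as the tokenizer; the function name is illustrative:

import tiktoken

def count_tokens(text: str, model: str = None) -> int:
    # tiktoken only recognizes OpenAI model names; OSS names like "dolphin"
    # raise a KeyError, so default to the gpt-4 encoding as an approximation.
    try:
        encoding = tiktoken.encoding_for_model(model or "gpt-4")
    except KeyError:
        encoding = tiktoken.encoding_for_model("gpt-4")
    return len(encoding.encode(text))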

cpacker (Collaborator, Author) commented Nov 3, 2023

@vivi can you try /dump or /dumpraw afterwards?

That print statement needs to be removed (or only enabled in debug mode), but it currently happens pre-request, so it makes sense that it doesn't include the response/summary: https://github.com/cpacker/MemGPT/blob/localllm-summarize-fix/memgpt/local_llm/llm_chat_completion_wrappers/simple_summary_wrapper.py#L127

vivi (Contributor) commented Nov 3, 2023

@cpacker ah yeah, /dump shows that it's summarizing well enough:

🧑 {'type': 'system_alert', 'message': "Note: prior messages (12 of 17 total messages) have been hidden from view due to conversation memory constraints.\nThe following is a summary of the previous 12 messages:\n The AI has successfully booted up and initiated its persona. It greets a user named Chad and offers assistance. In response to questions about the user's name, the AI politely confirms that it knows him as Chad. When asked to search for the user's name in its archival memory, the AI reports that no results were found.", 'time': '2023-11-02 11:23:20 PM PDT-0700'}
💭 Searching archival memory for 'name' produced no results. Perhaps I should clarify my memory organization methods.
🧑 {'message': 'What is your name', 'time': '2023-11-02 11:22:53 PM PDT-0700'}
💭 User asks for my name. Time to introduce myself.
🧑 {'message': "Very cool Sam, I'm trying very hard to hit the token limit", 'time': '2023-11-02 11:23:13 PM PDT-0700'}
💭 Chad mentions he's attempting to stay within a token limit. I wonder how I can help him with that.

vivi (Contributor) left a review:

LGTM 🪴

@cpacker cpacker merged commit 31fd9ef into main Nov 3, 2023
2 checks passed
@cpacker cpacker deleted the localllm-summarize-fix branch November 3, 2023 08:07
52cs commented Nov 3, 2023

Hmm... Seems that I found a bug again...
#280

mattzh72 pushed a commit that referenced this pull request Oct 9, 2024
* trying to patch summarize when running with local llms

* moved token magic numbers to constants, made special localllm exception class (TODO catch these for retry), fix summarize bug where it exits early if empty list

* missing file

* raise an exception on no-op summary

* changed summarization logic to walk forwards in list until fraction of tokens in buffer is reached

* added same diff to sync agent

* reverted default max tokens to 8k, cleanup + more error wrapping for better error messages that get caught on retry

* patch for web UI context limit error propagation, using best guess for what the web UI error message is

* add webui token length exception

* remove print

* make no wrapper warning only pop up once

* cleanup

* Add errors to other wrappers

---------

Co-authored-by: Vivian Fang <[email protected]>