Patch summarize when running with local llms #213
Conversation
@vivi Potential issue with summarization and local models: it's important for local models to have in-context examples of messages being sent. In this example, summarization + truncation leaves no examples of send_message in the prompt string that gets sent to the local LLM:
The LLM then returns this response:
Likely because there's no example. We can probably fix this by adding some hardcoded rules to ensure that summarization always leaves at least one assistant message and one user message. However, this may also be overindexing on the fact that I'm testing this in a 4k context limit environment, and this is not a problem at 8k. Larger log:
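A minimal sketch of that proposed guard, assuming the messages are role-tagged dicts; the helper name `ensure_example_messages` is made up for illustration and is not part of MemGPT:

```python
def ensure_example_messages(summarized, original):
    """Guarantee the post-summary sequence still contains at least one
    assistant message and one user message, so the local LLM always sees
    an in-context example of the expected message format."""
    for role in ("assistant", "user"):
        if not any(m.get("role") == role for m in summarized):
            # pull the most recent message of that role back out of the original sequence
            example = next((m for m in reversed(original) if m.get("role") == role), None)
            if example is not None:
                summarized.append(example)
    return summarized
```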
moved token magic numbers to constants, made special localllm exception class (TODO catch these for retry), fix summarize bug where it exits early if empty list
@@ -154,7 +154,7 @@ async def a_summarize_messages(
     trunc_ratio = (MESSAGE_SUMMARY_WARNING_TOKENS / summary_input_tkns) * 0.8  # For good measure...
     cutoff = int(len(message_sequence_to_summarize) * trunc_ratio)
     summary_input = str(
-        [await summarize_messages(model, message_sequence_to_summarize[:cutoff])] + message_sequence_to_summarize[cutoff:]
+        [await a_summarize_messages(model, message_sequence_to_summarize[:cutoff])] + message_sequence_to_summarize[cutoff:]
General bug being patched here
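For readers following along, here is a self-contained, simplified sketch of the recursion pattern being fixed; the constant value matches the diff, but `count_tokens` and the LLM call below are stand-ins, not MemGPT's actual implementations. The point is that the async path must recurse into `a_summarize_messages` rather than the sync `summarize_messages`:

```python
import asyncio

MESSAGE_SUMMARY_WARNING_TOKENS = 7000

def count_tokens(text: str) -> int:
    # crude stand-in tokenizer: roughly 4 characters per token
    return max(1, len(text) // 4)

async def fake_llm_summary(model: str, text: str) -> str:
    # stand-in for the real LLM call that produces the summary
    return f"[summary of ~{count_tokens(text)} tokens of conversation]"

async def a_summarize_messages(model, message_sequence_to_summarize):
    summary_input = str(message_sequence_to_summarize)
    summary_input_tkns = count_tokens(summary_input)
    if summary_input_tkns > MESSAGE_SUMMARY_WARNING_TOKENS:
        trunc_ratio = (MESSAGE_SUMMARY_WARNING_TOKENS / summary_input_tkns) * 0.8  # for good measure
        cutoff = int(len(message_sequence_to_summarize) * trunc_ratio)
        # recurse into the *async* version on the head, keep the tail verbatim
        summary_input = str(
            [await a_summarize_messages(model, message_sequence_to_summarize[:cutoff])]
            + message_sequence_to_summarize[cutoff:]
        )
    return await fake_llm_summary(model, summary_input)

# quick check with an oversized message list
if __name__ == "__main__":
    messages = [{"role": "user", "content": "x" * 400}] * 100
    print(asyncio.run(a_summarize_messages("local-model", messages)))
```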
try:
    function_name = function_json_output["function"]
    function_parameters = function_json_output["params"]
except KeyError as e:
These changes should be put in the other wrappers too
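As a hedged sketch of what "the same changes" could look like in another wrapper's output parser (function name and exception class here are illustrative, not the exact MemGPT wrappers):

```python
import json

def parse_function_call(raw_llm_output: str) -> tuple[str, dict]:
    """Parse the local LLM's JSON output, raising a descriptive error that the
    retry logic further up the stack can catch, instead of a bare KeyError."""
    try:
        function_json_output = json.loads(raw_llm_output)
    except json.JSONDecodeError as e:
        raise ValueError(f"LLM output was not valid JSON: {raw_llm_output}") from e
    try:
        function_name = function_json_output["function"]
        function_parameters = function_json_output["params"]
    except KeyError as e:
        raise ValueError(
            f"LLM output missing expected key {e} (expected 'function' and 'params'): {function_json_output}"
        ) from e
    return function_name, function_parameters
```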
reverted default max tokens to 8k, cleanup + more error wrapping for better error messages that get caught on retry
@@ -16,8 +16,18 @@
 INITIAL_BOOT_MESSAGE_SEND_MESSAGE_FIRST_MSG = STARTUP_QUOTES[2]

+# Constants to do with summarization / conversation length window
+MESSAGE_SUMMARY_WARNING_TOKENS = 7000  # the number of tokens consumed in a call before a system warning goes to the agent
+# The max amount of tokens supported by the underlying model (eg 8k for gpt-4 and Mistral 7B)
+LLM_MAX_TOKENS = 8000  # change this depending on your model
@vivi you can test that this works by intentionally lowering LLM_MAX_TOKENS + lowering it on the web UI backend
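For example, a test configuration might look like the following; the exact scaling between the two constants is an assumption for illustration, not something prescribed in this PR:

```python
# Simulate a small-context backend (e.g. a 4k local model) for testing.
LLM_MAX_TOKENS = 4000  # lowered from the 8k default for this test
MESSAGE_SUMMARY_WARNING_TOKENS = int(LLM_MAX_TOKENS * 0.75)  # warn well before the hard limit (assumed ratio)
```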
@vivi FYI, https://github.com/cpacker/MemGPT/blob/main/memgpt/local_llm/lmstudio/api.py#L37-L52 This then gets "rewritten" into the same verbiage as the OpenAI context length error, so that MemGPT can catch it further up the stack: https://github.com/cpacker/MemGPT/blob/main/memgpt/agent.py#L673 However, this needs to be added to
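A rough sketch of that error-rewriting idea, assuming simple string matching on the backend's error text; the marker strings (especially the web UI one) are guesses, not confirmed values, and the exception type is illustrative:

```python
# Hypothetical sketch: normalize backend-specific context-overflow errors into
# one phrasing that the upstream handler can look for.
CONTEXT_OVERFLOW_MARKERS = [
    "context length",              # lmstudio-style message (assumed)
    "exceeds the context window",  # web UI-style message (best guess)
]

def rewrite_context_overflow_error(backend_error_text: str) -> None:
    """Re-raise a backend context-limit error in a uniform, OpenAI-like shape."""
    lowered = backend_error_text.lower()
    if any(marker in lowered for marker in CONTEXT_OVERFLOW_MARKERS):
        raise Exception(
            f"This model's maximum context length was exceeded: {backend_error_text}"
        )
```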
patch for web UI context limit error propagation, using best guess for what the web UI error message is
Is this right? Running with LLM_MAX_TOKENS=3000 on webui, the summary seems to be empty:
Also, I made count_tokens inside summarize not specify a model (so it defaults to the gpt-4 token counter), because tiktoken doesn't support OSS models and will error out when the
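For context, a minimal sketch of that fallback behavior, assuming a count_tokens helper built on tiktoken (MemGPT's actual helper may differ):

```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-4") -> int:
    """Count tokens, falling back to the gpt-4 encoding for model names
    tiktoken doesn't recognize (e.g. OSS models), rather than erroring out."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # unknown model name -> approximate with the gpt-4 tokenizer
        encoding = tiktoken.encoding_for_model("gpt-4")
    return len(encoding.encode(text))
```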
@vivi can you try
That print statement needs to be removed (or only enabled on debug), but it's currently happening pre-request, so it makes sense that it won't include the response / summary: https://github.com/cpacker/MemGPT/blob/localllm-summarize-fix/memgpt/local_llm/llm_chat_completion_wrappers/simple_summary_wrapper.py#L127
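One way to keep the print but gate it, as a sketch (the DEBUG flag and helper name are illustrative, not existing MemGPT symbols):

```python
DEBUG = False  # flip on when diagnosing local LLM prompt issues

def debug_print(*args, **kwargs):
    """Print only when debugging is enabled, so normal runs stay quiet."""
    if DEBUG:
        print(*args, **kwargs)

# e.g. inside the summary wrapper, right before sending the request:
# debug_print(f"Prompt being sent to the local LLM:\n{prompt_str}")
```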
memgpt/local_llm/llm_chat_completion_wrappers/simple_summary_wrapper.py
@cpacker ah yeah,
LGTM 🪴
Hmm... Seems that I found a bug again...
* trying to patch summarize when running with local llms
* moved token magic numbers to constants, made special localllm exception class (TODO catch these for retry), fix summarize bug where it exits early if empty list
* missing file
* raise an exception on no-op summary
* changed summarization logic to walk forwards in list until fraction of tokens in buffer is reached
* added same diff to sync agent
* reverted default max tokens to 8k, cleanup + more error wrapping for better error messages that get caught on retry
* patch for web UI context limit error propagation, using best guess for what the web UI error message is
* add webui token length exception
* remove print
* make no wrapper warning only pop up once
* cleanup
* Add errors to other wrappers

Co-authored-by: Vivian Fang <[email protected]>
Moving these to a separate PR: