
Enhance chat memory #226

Open
edwardzjl opened this issue Dec 25, 2023 · 5 comments
Labels
enhancement New feature or request python Pull requests that update Python code

Comments

@edwardzjl
Owner

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Enhance chat memory with long-term memory (a running summary), searched memory (a vector retriever), and short-term memory (a buffer window, the current solution), and combine them with CombinedMemory.
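
A rough sketch of the idea, assuming an existing `llm` and `retriever`, and a prompt that references all three memory keys (the keys themselves are placeholders):

```python
from langchain.memory import (
    CombinedMemory,
    ConversationBufferWindowMemory,
    ConversationSummaryMemory,
    VectorStoreRetrieverMemory,
)

# `llm` and `retriever` are assumed to exist; the memory keys are placeholders.
memory = CombinedMemory(
    memories=[
        # Long-term: a rolling summary of the whole conversation.
        ConversationSummaryMemory(llm=llm, memory_key="summary", input_key="input"),
        # Searched: semantically relevant past messages.
        VectorStoreRetrieverMemory(
            retriever=retriever, memory_key="related", input_key="input"
        ),
        # Short-term: the last few turns verbatim (the current solution).
        ConversationBufferWindowMemory(k=5, memory_key="recent", input_key="input"),
    ]
)
```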

@edwardzjl edwardzjl added the enhancement New feature or request label Dec 25, 2023
@edwardzjl edwardzjl self-assigned this Dec 25, 2023
@edwardzjl
Owner Author

Before utilizing any token-counting memory, such as langchain.memory.ConversationSummaryBufferMemory, customization is required.

These memories rely on langchain_core.language_models.base.BaseLanguageModel.get_num_tokens_from_messages to calculate the input token length. However, that calculation goes through langchain_core.messages.get_buffer_string, which may not reproduce the actual prompt string in certain scenarios, such as when using chat templates like chatml.

Additionally, langchain_core.language_models.base.BaseLanguageModel.get_num_tokens_from_messages invokes langchain_core.language_models.base.BaseLanguageModel.get_token_ids, which defaults to GPT2TokenizerFast.from_pretrained("gpt2"). This default can produce inaccurate counts when using an LLM other than ChatGPT.
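
A minimal sketch of that customization, assuming a Hugging Face tokenizer that matches the served model (the model name and the ChatOllama base class are just placeholders):

```python
from typing import List

from langchain_community.chat_models import ChatOllama
from langchain_core.messages import BaseMessage
from transformers import AutoTokenizer

# Placeholder model name: use the tokenizer that matches your actual LLM.
_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# langchain message types -> roles expected by chat templates
_ROLES = {"human": "user", "ai": "assistant", "system": "system"}


class ChatModelWithRealTokenizer(ChatOllama):
    def get_token_ids(self, text: str) -> List[int]:
        # Replace the GPT2TokenizerFast default with the model's own tokenizer.
        return _tokenizer.encode(text, add_special_tokens=False)

    def get_num_tokens_from_messages(self, messages: List[BaseMessage]) -> int:
        # Render through the model's chat template (e.g. chatml) instead of
        # get_buffer_string, so special tokens are counted correctly.
        conversation = [
            {"role": _ROLES.get(m.type, m.type), "content": m.content}
            for m in messages
        ]
        return len(_tokenizer.apply_chat_template(conversation, tokenize=True))
```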

@edwardzjl
Owner Author

I dug a little into vector-based memories; here are some takeaways:

  • There's a default implementation, langchain.memory.VectorStoreRetrieverMemory. (doc)
  • However, the default implementation lacks support for chat memory.
    • Chat memories have a return_messages attribute that controls whether history is loaded as a str or as BaseMessages.
    • VectorStoreRetrieverMemory has a return_docs attribute that controls whether history is loaded as a str or as Documents.
  • What's more, it's difficult (although possible) to separate VectorStoreRetrieverMemory instances into different sessions.
  • There's a langchain.memory.CombinedMemory that can be used to combine multiple memories in a chain or agent. (doc)
  • However, if I combine VectorStoreRetrieverMemory with ConversationBufferWindowMemory, the chat history will be persisted twice:
    • once in the Redis vectorstore, once in the Redis chat history.
  • I think it's possible and intuitive to customize one memory implementation, instead of using CombinedMemory, to combine these two behaviours (see the sketch after this list):
    • vectorstore memory as long-term memory
    • buffer window memory as short-term memory
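
A sketch of that custom memory (the class and field names are mine; it assumes an in-memory chat history, so evicting with pop() is safe):

```python
from typing import Any, Dict, List

from langchain.memory import ConversationBufferWindowMemory, VectorStoreRetrieverMemory
from langchain_core.documents import Document
from langchain_core.memory import BaseMemory


class LongShortTermMemory(BaseMemory):
    """Hypothetical combination: buffer window for recent turns,
    vectorstore for everything that falls out of the window."""

    short_term: ConversationBufferWindowMemory
    long_term: VectorStoreRetrieverMemory

    @property
    def memory_variables(self) -> List[str]:
        return self.short_term.memory_variables + self.long_term.memory_variables

    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Recent turns verbatim plus semantically related older turns.
        return {
            **self.short_term.load_memory_variables(inputs),
            **self.long_term.load_memory_variables(inputs),
        }

    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        self.short_term.save_context(inputs, outputs)
        # Migrate messages that fell out of the window into the vectorstore,
        # so each message is persisted in exactly one place.
        messages = self.short_term.chat_memory.messages
        while len(messages) > 2 * self.short_term.k:
            old = messages.pop(0)
            self.long_term.retriever.add_documents(
                [Document(page_content=old.content, metadata={"type": old.type})]
            )

    def clear(self) -> None:
        self.short_term.clear()
        self.long_term.clear()
```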

@edwardzjl
Owner Author

What about a background task (like a Kubernetes CronJob) that reads all memories and moves the "old" ones into a vectorstore?

  • If I insert them into the vectorstore and then delete them from the buffer window (backed by a Redis list), I might need to add transactions.
  • If I insert them into the vectorstore and leave them in the buffer window, there are more problems:
    • There's potentially huge disk waste.
    • I might need to deal with duplicate records (which might be solved by LangChain's indexing API; see the sketch below).
    • The cron job may take very long to finish as the number of users and the amount of history grow.

Neither is simple to implement.
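
For the duplicate-record problem, here's a rough sketch of what the LangChain indexing API offers (the namespace, db_url, and metadata scheme are placeholders):

```python
from langchain.indexes import SQLRecordManager, index
from langchain_core.documents import Document

# Placeholder namespace and db_url; the record manager remembers document
# hashes across runs, so re-running the cron job does not re-insert messages.
record_manager = SQLRecordManager(
    "redis/chat_memory", db_url="sqlite:///record_manager.sql"
)
record_manager.create_schema()


def archive(messages, vectorstore, session_id: str) -> None:
    docs = [
        Document(page_content=m.content, metadata={"session_id": session_id})
        for m in messages
    ]
    # index() skips documents whose hashes were already recorded.
    index(docs, record_manager, vectorstore, cleanup=None, source_id_key=None)
```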

@edwardzjl
Owner Author

A chatbot is not like a human being. Humans don't inherently retain the exact order of conversation messages; instead, we rely on short-term memory, which eventually consolidates information into long-term memory, where order becomes less crucial.

In contrast, a chatbot must persistently maintain all messages in list order, primarily for later display to the user.

Additionally, it's worth noting that Redis indexing is limited to hash or JSON structures, not lists. I cannot simply add an embedding index to a list of chat messages.

To address the need for both long-term and short-term memory while preserving the correct order of messages for user display, one solution is to duplicate the messages: one copy is stored in a Redis list, and another in a hash with an embedding index.

An alternative approach might involve storing only the list index in the document, as shown below:

{
  "msg_embedding": [],
  "msg_idx": 0
}

Fetching messages with context can then be achieved by accessing the message list using LRANGE, which has an acceptable time complexity of O(S+N) (LRANGE).

However, langchain's langchain_community.chat_message_histories.redis.RedisChatMessageHistory currently stores messages in reverse order using LPUSH (placing the latest message at the beginning of the list). This behaviour needs to be changed, because with LPUSH each new message shifts the index of every existing message.
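
Assuming that adjustment (RPUSH, oldest message first) and the default "message_store:" key prefix, fetching a matched message with its surrounding context might look like:

```python
import json

import redis

r = redis.Redis()


def fetch_with_context(session_id: str, msg_idx: int, window: int = 2) -> list:
    """Load the message at msg_idx plus `window` neighbours on each side.

    Assumes messages are appended with RPUSH (oldest first) under the
    default "message_store:" key prefix.
    """
    start = max(msg_idx - window, 0)
    stop = msg_idx + window
    # LRANGE is O(S+N), which is acceptable per the Redis docs.
    raw = r.lrange(f"message_store:{session_id}", start, stop)
    return [json.loads(m) for m in raw]
```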

@edwardzjl edwardzjl added the python Pull requests that update Python code label Jan 11, 2024
@edwardzjl
Owner Author

langchain's memory system is under refactoring; maybe I will wait a few more weeks until it stabilizes.
