Integrating MemGPT-like Functionality #2937
Conversation
```python
return self.action_parser.parse(response)

def condense(
    self,
    state: State,
```
If you move this `.condense` function to the `LLM` class, you can make `messages` the input argument. We might need to make each `Message` a Pydantic class with an additional attribute (e.g., `condensable`). This may also benefit the ongoing effort that tries to add vision capability in #2848 (comment) (cc @Kaushikdkrikhanu)
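For illustration, a minimal sketch of that suggestion, assuming hypothetical field names (the `condensable` flag and the shape of `Message` here are not the PR's actual code):

```python
from pydantic import BaseModel


class Message(BaseModel):
    """Sketch of a Pydantic message per the suggestion above."""

    role: str        # e.g. 'system' | 'user' | 'assistant'
    content: str
    # Marks whether this message may be folded into a summary;
    # system/example messages would set this to False.
    condensable: bool = True


# Hypothetical usage: the condenser only looks at condensable messages.
messages = [
    Message(role='system', content='You are a coding agent...', condensable=False),
    Message(role='user', content='Fix the failing test.'),
]
to_condense = [m for m in messages if m.condensable]
```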
This can be done. But on moving `.condense` to the `LLM` class, it won't be able to add the summary to `state.history` directly. I might also need to pass `state` as an arg to `llm.completion` along with the messages.

If we are trying to make it cleaner (#2937 (comment)), I was instead thinking of moving `.condense` to the `MemoryCondenser` class, after making each `Message` a Pydantic class. That's where I think `.condenser` should belong 😅.

WDYT? Which approach seems better?
Huh, good point... maybe the caller of `LLM` should make sure the summary gets added to the `State` - like what we are doing now?
On second thought: maybe `State` is not the place to add these new summarizing actions - `EventStream` should be: https://github.com/OpenDevin/OpenDevin/blob/c555fb68400bb117011cda3e5d3d95beb5169000/opendevin/controller/agent_controller.py#L395

Maybe you could add an optional argument to the `LLM` class, like `condense_callback_fn`, which will be called with the condensed `Summarize` action whenever a condensation happens. And that callback will add this `Summarize` action to the event stream?
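Roughly, the callback idea could look like this; the class surface, names, and the event-stream call in the final comment are assumptions for illustration, not the actual OpenDevin API:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AgentSummarizeAction:
    # Hypothetical stand-in for the real summarize action.
    summary: str


class LLM:
    def __init__(
        self,
        condense_callback_fn: Optional[Callable[[AgentSummarizeAction], None]] = None,
    ) -> None:
        # Called whenever a condensation happens, e.g. so the caller can
        # push the summarize action into the event stream.
        self.condense_callback_fn = condense_callback_fn

    def condense(self, messages: list) -> list:
        summary = AgentSummarizeAction(summary='...condensed history...')
        if self.condense_callback_fn is not None:
            self.condense_callback_fn(summary)
        # Return the condensed message list (details omitted in this sketch).
        return messages


# Hypothetical wiring by the caller (e.g. the agent controller):
# llm = LLM(condense_callback_fn=lambda action: event_stream.add_event(action, 'agent'))
```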
Can you elaborate on this? How does adding summarizing actions to the `EventStream` help? MemGPT stores only the summary of events from the start to some `event.id`. We only need to store one summary in the history. A summary event remains useful until we create a new summary, after which it becomes useless.

Btw, I've moved the `.condenser` functions to the `LLM` class. The `MemoryCondenser` class used methods of the `LLM` class, so I completely removed it and moved one of its methods to the `LLM` class. This also achieves the target (#2937 (comment)).
Basically, going forward, we hope to make `EventStream` a single source of truth for all our past messages / history (i.e., you get all the history messages from the event stream), so ideally all your condensation actions should also go into the event stream.

But based on my understanding, your current implementation already emits `AgentSummarizeAction`? I think we just need to make sure that action gets stored inside the event stream, and we should be good to go on this bullet.
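As a sketch of the MemGPT-style "one summary at a time" idea described above (names and event shapes are made up for illustration, not the PR's code):

```python
def build_history(events: list[dict], summary: dict | None, last_summarized_id: int) -> list[dict]:
    """Rebuild the prompt history from stream events ordered by id, replacing
    everything up to and including `last_summarized_id` with the single summary."""
    if summary is None:
        return events
    tail = [e for e in events if e['id'] > last_summarized_id]
    return [summary] + tail


# Hypothetical usage:
events = [{'id': i, 'content': f'event {i}'} for i in range(10)]
summary = {'id': -1, 'content': 'summary of events 0-6'}
history = build_history(events, summary, last_summarized_id=6)  # summary + events 7-9
```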
@xingyaoww Oh interesting, I played with summaries in the event stream at some point. A couple of thoughts:
- are they 'real events'? That is, if emitted with the same mechanism as the others (which they currently were not), would they be propagated downstream to components listening?
- or, since they replace at runtime the real events they summarize, are they more like 'shadow events'? In which case they're not the truth, but a fleeting version of it, just necessary due to some runtime constraints; the source of truth holds the real events and is able to retrieve them when needed.
- they need to be saved, I assume, even if it's one at a time, to retrieve on a disconnect or restart/restore etc. Do we save them along with other events, or do we save them in State? If we save them in the event stream, then we may want to consider moving some filtering logic too.
On a side note, making it only one in a run simplifies the logic either way.
Looks better overall! But I think there are still a few places that we can clean up :)
@khushvind This is amazing work, thank you! We definitely need something like this, and it's a smart algo. I'm sorry I didn't get to look into it very well yet, but I did give it a try, and it seems there are a few issues.

If you want, for replicating, I can give you the event stream I used, or, my scenario was something like: use a model with a relatively large context window (I used gpt-4o with 128k), make sure sessions are enabled [1], do stuff, then when the history gets to some 11k, stop it, config 4k in

This will provoke it not only to apply summarize, but also to do it consecutively before getting to a regular prompt, and it seems to behave unexpectedly in part: it loses information. Also, it doesn't seem to save the summary (my run ended with an error, but I'm not sure I see in the code where it saves them, even if it ends well?). I can play with it more tomorrow I hope. Thank you for this!

[1] If you use opendevin via the UI, save
Some thoughts, since we're at it: I think there are two major areas which have negatively affected performance in previous work.

FWIW, in the swe-bench evals on PR 2021, I saw some tasks where the LLM was just restarting, as if it knew nothing. It wasn't using the summary! I think this was due to both prompt and recall. Since this PR is condensing 75% of the messages, leaving a few at the end, I think we won't see that anymore. Nevertheless, TBH, if we still don't include recall, we do need to fix the prompt very well... I found some decent tasks on swe-bench for this; I'm going to retrieve them and run them on this PR. I think I'll also add some recall and see how it goes then.
@khushvind Thanks for your amazing work -- Let me know if this is ready for review again! I have a couple of ideas regarding how we can develop this:
```
}
}
Make sure to include in observations any relevant information that the agent needs to remember.
%(events)s
```
This is a placeholder in the template string, for a key `events` to replace it. The code can use it like:

```python
events_dict = {'events': 'the data in summary input'}
formatted_string = SUMMARY_PROMPT_SYSTEM % events_dict
```

We can do something similar here, or we can remove this placeholder.
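For a self-contained toy version of the same mechanics (the prompt text and dict contents below are made up; the real `SUMMARY_PROMPT_SYSTEM` lives in the PR's prompt file):

```python
# Toy template with the %(events)s placeholder, just to show the substitution.
SUMMARY_PROMPT_SYSTEM = (
    'Summarize the following events into a single summarize action.\n'
    'Make sure to include in observations any relevant information that the '
    'agent needs to remember.\n'
    '%(events)s'
)

events_dict = {'events': '[{"action": "run", "args": {"command": "ls"}}]'}
formatted_string = SUMMARY_PROMPT_SYSTEM % events_dict
print(formatted_string)
```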
@xingyaoww I still have to run some evaluations on swe-bench to check if it is working fine on different examples, and I also need to incorporate some changes that @enyst suggested. I'll let you know once it's done. I'll take a look at these benchmarks by the weekend.
MemGPT's prompt also just asks the LLM to summarize the messages, without specifically asking it to keep the info about what files shouldn't be checked. Maybe we can refine ours. I already included the parts of MemGPT's prompt that seemed useful in the current implementation.
Just to note, I played a little with it on some tasks on SWE-bench, and so far:
I'm traveling today and tomorrow, we'll see soon though.
Maybe, but afaik the version in PR 2021 was almost identical without a word limit, and it didn't seem enough on SWE-bench tasks (otherwise in interactive sessions it works, since the user will nudge it). Worth trying though.
That's an interestingly simple prompt! To clarify, the reason I say files is that on SWE-bench, the LLM appeared to repeat the attempts to work with the wrong files, like it had done before, after it figured out the right files. 🤔 Maybe it was a fluke...
- "action": "summarize" | ||
- args: | ||
- "summarized_actions": A precise sentence summarizing all the provided actions, written in the first person. | ||
- "summarized_observations": A few precise sentences summarizing all the provided observations, written in the third person. |
Just curious: what's the motivation behind "first person" for action and "third person" for observations?
We still need to do some work on improving the prompt. This was adapted from #2021. From what I felt, actions are performed by the agent, so it would make more sense to have "first person" for the action summary, as this gets passed to the agent in the prompt. Similarly, "third person" for observations.

I didn't pay much attention to it at the time, but now that you've pointed it out, maybe we can avoid specifying the person in the new prompt, as this summary message has `role = 'user'` when passed to the llm.
My intuition is it doesn't matter at all, thus the ask.
The CodeAct prompt is structured as a dialogue, in a one-message-from-user, one-message-from-agent pattern. The former version split this result into 2 messages sent to the LLM, following the same pattern. It went like this:

```
role='user'
...The task is ...

role='agent'
For this task, I did this and I did that and the other...
# action-like

role='user'
The code was ... and the assistant had...
# obs-like
```

In this PR, we're sending them both in the same role='user' message, so it should probably all read like an obs. It will also be merged into the user task, e.g. it will end up like:

```
role='user'
The task is...
The assistant did this and that...
The code was... and...
```
This PR is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

This PR was closed because it has been stalled for over 30 days with no activity.
What is the problem that this fixes or functionality that this introduces? Does it fix any open issues?
Fix #1748
This PR proposes an implementation integrating MemGPT-like functionality into OpenDevin's ShortTermMemory, inspired by issue #2487. It also uses ideas from #2021.
Give a brief summary of what the PR does, explaining any non-trivial design decisions
The memory is summarized (condensed) when the token count approaches the context window limit but hasn't yet hit it. All the events in the event stream are considered summarizable, except the system and example messages. When the condenser function is called, it tries to summarize events such that at least the first 0.75 fraction (75%) of the total tokens gets summarized. After this, the summary is added to the history. Events are summarized in the order of their occurrence, i.e., the earliest events are summarized first. The word limit for the summary is <= 200 words by default.
Messages are passed to the LLM in this order → (system message + example message + summary + all events after the last summarized event).
Currently, this functionality is only implemented for the CodeAct agent.
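To make the description above concrete, here is a rough sketch of the selection and assembly logic as described; token counting, the summarization call, and all names are simplified assumptions rather than the PR's actual code:

```python
from typing import Callable


def pick_events_to_condense(events: list[str], count_tokens: Callable[[str], int],
                            fraction: float = 0.75) -> int:
    """Return how many of the earliest events to summarize so that at least
    `fraction` (0.75 by default) of the total event tokens gets covered."""
    total = sum(count_tokens(e) for e in events)
    covered, cutoff = 0, 0
    for event in events:
        if covered >= fraction * total:
            break
        covered += count_tokens(event)
        cutoff += 1
    return cutoff


def maybe_condense(events: list[str], count_tokens: Callable[[str], int],
                   summarize: Callable[[list[str]], str], context_limit: int,
                   threshold: float = 0.9) -> tuple[str | None, list[str]]:
    """Condense only when the history approaches the context window limit."""
    if sum(count_tokens(e) for e in events) < threshold * context_limit:
        return None, events
    cutoff = pick_events_to_condense(events, count_tokens)
    summary = summarize(events[:cutoff])  # kept to <= ~200 words by the prompt
    return summary, events[cutoff:]


def build_prompt(system_msg: str, example_msg: str, summary: str | None,
                 remaining_events: list[str]) -> list[str]:
    """Messages go to the LLM as: system + example + summary + events after
    the last summarized event."""
    prompt = [system_msg, example_msg]
    if summary is not None:
        prompt.append(summary)
    return prompt + remaining_events
```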