
Finding solution for select_speaker and agent_by_name functions: Sterilization or ?? #489

Closed
robzsaunders opened this issue Oct 30, 2023 · 11 comments
Labels: group chat/teams (group-chat-related issues)

@robzsaunders

Hey everyone,

I was doing some probing into why the group chat manager just fails to do its job, and I have some questions.

  1. On lines 153 to 156, why are we broadcasting the message to all agents?

def run_chat(
    self,
    messages: Optional[List[Dict]] = None,
    sender: Optional[Agent] = None,
    config: Optional[GroupChat] = None,
) -> Union[str, Dict, None]:
    """Run a group chat."""
    if messages is None:
        messages = self._oai_messages[sender]
    message = messages[-1]
    speaker = sender
    groupchat = config
    for i in range(groupchat.max_round):
        # set the name to speaker's name if the role is not function
        if message["role"] != "function":
            message["name"] = speaker.name
        groupchat.messages.append(message)
        # broadcast the message to all agents except the speaker
        for agent in groupchat.agents:
            if agent != speaker:
                self.send(message, agent, request_reply=False, silent=True)
        if i == groupchat.max_round - 1:
            # the last round
            break
        try:
            # select the next speaker
            speaker = groupchat.select_speaker(speaker, self)
            # let the speaker speak
            reply = speaker.generate_reply(sender=self)
        except KeyboardInterrupt:
            # let the admin agent speak if interrupted
            if groupchat.admin_name in groupchat.agent_names:
                # admin agent is one of the participants
                speaker = groupchat.agent_by_name(groupchat.admin_name)
                reply = speaker.generate_reply(sender=self)
            else:
                # admin agent is not found in the participants
                raise
        if reply is None:
            break
        # The speaker sends the message without requesting a reply
        speaker.send(reply, self, request_reply=False)
        message = self.last_message(speaker)
    return True, None

Instead of just sending the message to the selected speaker?

Many servers queue incoming messages. Personally, I've been using LM Studio and noticed that it notes "running queued message" and runs them one by one, which may be causing some of these local LLM group chat complications despite hacks to force chat order.

@robzsaunders
Author

robzsaunders commented Oct 30, 2023

I'm currently checking to see if there's better performance by commenting out lines 154, 155, and 156

# broadcast the message to all agents except the speaker
for agent in groupchat.agents:
    if agent != speaker:
        self.send(message, agent, request_reply=False, silent=True)

and adding this between lines 162 and 164:
self.send(message, speaker, request_reply=False, silent=True)
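Put together, a minimal sketch of how the relevant lines inside run_chat's try block would read with this change (my illustration of the proposal, not the shipped code):

# select the next speaker
speaker = groupchat.select_speaker(speaker, self)
# send the latest message only to the selected speaker (replacing the broadcast above)
self.send(message, speaker, request_reply=False, silent=True)
# let the speaker speak
reply = speaker.generate_reply(sender=self)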

@afourney
Member

afourney commented Oct 30, 2023

Group Chat is designed to mimic ... well, a group chat (e.g., on your phone or in Slack). We expect every agent to be aware of the shared conversation up to that point. Other workflows are certainly possible (like hub-and-spoke delegation), but they wouldn't be called group chat. Commenting out the broadcast will likely break many of the Group Chat demo scenarios.

When you say that you were investigating why the Group Chat Manager "fails to do its job", can you be more specific? What failures were you observing?

afourney added the group chat/teams (group-chat-related issues) label on Oct 30, 2023
@robzsaunders
Author

Hey @afourney, thanks for the explanation. Makes sense!

Apologies for the long write-up, but I think I found the source of a lot of local LLMs' group chat problems. Since there are no error outputs when failures occur, no one knew that the responses from the manager were incorrect.

Taking a look through the issue board and Discord, there seems to be a common theme over the last week or so of the group chat manager not working in the "correct order" or not "selecting agents correctly".

After doing some debugging with a few local LLMs, I think I found the issue. It is a pair of related problems.


Problem 1

Example: Agent being called is named "Coder"

Similar to my ticket #399, where we needed to sterilize raw code input, the raw output of the manager's role message is not in the correct format for the agent_by_name function in groupchat.py.

The function below is called at the end of the select_speaker function.

def agent_by_name(self, name: str) -> Agent:
    """Find the next speaker based on the message."""
    return self.agents[self.agent_names.index(name)]

Example outputs from the manager:

  • "The role I select is Coder"
  • "Coder:"
  • "```Coder"

The major offense, though, which I keep seeing repeatedly, is the manager responding with:

  • "Coder: >>Writes out all the code<<"

What that function expects:

  • Coder

This causes the ValueError on line 106 to trigger, producing pseudo-correct-looking functionality. My belief is that because the ValueError fallback moves on to the next agent, the manager seems to be working, since I suspect most people order their agents in the group chat in the logical order of operations.
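For illustration, here is a rough sketch of the kind of sterilization helper that could sit in front of agent_by_name (extract_agent_name and the names list are hypothetical, not existing autogen code):

from typing import List, Optional

def extract_agent_name(raw_reply: str, agent_names: List[str]) -> Optional[str]:
    """Return the first known agent name mentioned in the reply, or None."""
    cleaned = raw_reply.strip().strip("`").strip()
    # Exact match is what agent_by_name already expects.
    if cleaned in agent_names:
        return cleaned
    # Otherwise scan the reply for any known name and take the earliest mention.
    mentions = [(cleaned.find(name), name) for name in agent_names if name in cleaned]
    return min(mentions)[1] if mentions else None

names = ["User_Proxy", "Coder", "QA"]
print(extract_agent_name("The role I select is Coder", names))          # Coder
print(extract_agent_name("Coder: >>Writes out all the code<<", names))  # Coder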


Rolling into problem 2

For some reason (I'm not quite sure why), the manager is semi-ignoring the system prompts fed to it from select_speaker_msg and line 96 in groupchat.py.

I don't know 100% how it works yet, but my intuition is that self.messages on line 96 needs to be a user message, and the prompt from the user needs to be a system message for the manager.

Before I finished up today, the last thing I did was swap line 95 from being a system tag to a user tag, and the manager stopped writing out full blocks of code.

It didn't output the correct single-word response of "Coder", but it said "I will choose the coder" or something like that. So there is something weird going on with how local LLMs interact with the group chat manager prompts.
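To make the role-swap experiment concrete, here is roughly what the two message layouts look like (a simplified sketch; the wording of select_speaker_msg is illustrative, not the exact prompt in groupchat.py):

select_speaker_msg = (
    "You are in a role play game. The available roles are: User_Proxy, Coder, QA. "
    "Read the conversation, then select the next role to play. Only return the role."
)
conversation = [{"role": "user", "content": "Hey write me a basic hello world in python"}]

# Default-style layout: the selection instruction is sent as a system message.
messages_system = [{"role": "system", "content": select_speaker_msg}] + conversation

# The experiment described above: the same instruction sent as a user message instead,
# which some local models appear to follow more reliably.
messages_user = conversation + [{"role": "user", "content": select_speaker_msg}]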

I'm not sure of the best way to approach this, but I think this is another roadblock for most local LLM users, and fixing it will help with the group chat problems they're facing.


Current behavior

4 agents, Manager, User_Proxy, Coder, QA

  1. User proxy
    "Hey write me a basic hello world in python" (to manager)

  2. System
    [ "Choose a role, only choose one role" ] (to manager)

  3. Manager
    "I choose Coder: '''Python (insert python code here)"

  4. autogen logic
    [ Manager has finished and I got its response. Checking message. Result: ValueError fail. Last agent: Coder. Next agent is... Coder ]

  5. Coder
    " '''Python (insert python code with incorrect code here) " (replies to autogen logic)

  6. autogen logic
    [ coder is done, manager picks another speaker]

  7. System
    [ "Choose a role, only choose one role" ] (to manager)

  8. Manager
    "I choose QA: "(Does full QA assessment that detects the incorrect code)

  9. autogen logic
    [ Manager has finished and I got its response. Checking message. Result: ValueError fail. Last agent: Coder. Next agent is... QA ]

  10. QA
    [ Does full assessment that detects the incorrect code ] (replies to autogen logic)

  11. autogen logic
    [ QA is done, manager picks another speaker]

  12. System
    [ "Choose a role, only choose one role" ] (to manager)

  13. Manager
    "I choose Coder: '''Python (insert python code here)"

  14. autogen logic
    [ Manager has finished and I got its response. Checking message. Result: ValueError fail. Last agent: QA. Next agent is... User_Proxy ]

  15. User Proxy
    >>>Executes Code

Disclosure: I haven't done any testing using OpenAI's GPT models. This may or may not be a problem for GPT-3.5/4.

@afourney
Member

afourney commented Oct 31, 2023

Thanks for the awesome deep dive. Keep them coming!

We've not done a lot of testing with local models, and I actually have no idea how well we should expect a GroupChatManager to function if backed by such models. One thing to try is to leave the Chat Manager as GPT-4 but use local LLMs everywhere else, and just compare performance.

Selection of the next agent is non-trivial, and frankly I'm surprised it works even with GPT-4. Here's one possible source of confusion: #319

@dogukanustun

Hello,

I am facing a similar issue. When I tried agent_by_name(), I also got a ValueError, but interestingly my example outputs are not like those given below (taken from robzsaunders):

Example outputs from the manager:

"The role I select is Coder"
"Coder:"
"```Coder"

What I get for the name parameter is the whole output of the LLM.

I am in a desperate situation and open to any solution.

Thanks.

@robzsaunders
Author

Yea that's what I was trying to communicate with:

The major offense, though, which I keep seeing repeatedly, is the manager responding with:

"Coder: >>Writes out all the code<<"

The manager does the work first, but it is silent in the background and throws the role prompt for a loop.

robzsaunders changed the title from "Questions about groupchat.py (Local LLM Related)" to "Finding solution for select_speaker and agent_by_name functions: Sterilization or ??" on Nov 1, 2023
@robzsaunders
Author

This PR is related and offers a partial solution:

#500

@afourney
Member

afourney commented Nov 1, 2023

I think there are a few things going on here, and I think we need a piecemeal approach to solving it. The PR #500 would solve the problem if you don't need dynamic orchestration. In other words, if you already know which agents should speak, and in what order, then Group Chat -- as it currently stands -- is not a great solution, and some deterministic alternative would be better.

However, if you still want dynamic orchestration, then what we need to do is improve the GroupChatManager's performance on local models. We can do this in a few ways. First, we can try improving the prompt that it uses, or perhaps use a different prompt altogether, tuned to the local model. Alternatively, we can improve parsing (or recognize the failure and remind the model to output the correct format, similar to TypeChat). This would add some robustness to the selection.
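As a sketch of the "recognize the failure and remind the model" idea (hypothetical helper, not an existing autogen API; ask_llm stands in for the manager's model call):

from typing import Callable, List, Optional

def select_speaker_with_retry(ask_llm: Callable[[str], str], agent_names: List[str], max_retries: int = 2) -> Optional[str]:
    """Ask the model for a role; if the reply is not a valid name, re-ask with a stricter reminder."""
    reminder = ""
    for _ in range(max_retries + 1):
        reply = ask_llm(reminder).strip()
        if reply in agent_names:
            return reply
        reminder = (
            f"Your previous answer {reply!r} is not a valid role. "
            f"Respond with exactly one of: {', '.join(agent_names)} and nothing else."
        )
    return None  # caller can fall back to round-robin or the next agent in the list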

I would, however, openly wonder about effectiveness. Orchestration is a super complex problem, and it resembles planning. If the underlying LLM can't handle the instructions to output the correct format, I naturally wonder how carefully considered its plans are.

@robzsaunders
Author

Yeah, it's why I opened this as an issue instead of knee-jerking a solution with a PR.

The orchestration needs some discussion.

I'm still convinced that something isn't being sent through properly to the local model (I'm using LM Studio), since editing the manager's system prompts doesn't do anything.

@SoheylM

SoheylM commented Nov 3, 2023

Hi there,

I am also working on this, serving Mistral 7B with vLLM using the OpenAI API endpoints.

**** EDIT ****
llm_config gets passed along via kwargs, so the first problem is already addressed. The second problem seems linked to serving Mistral 7B Instruct with vLLM. It may be related to the required prompt template.


[FIXED] The first problem I noticed with respect to running a local LLM using the llm_config dictionary (correct me if I am wrong) is its absence from the GroupChatManager constructor. Unless you overwrite DEFAULT_MODEL to be the local LLM, or point the 'gpt-4', 'gpt-3.5-turbo', etc. model names to the local LLM, I believe two lines of code need to be added: one to accept the llm_config dictionary and one to pass it along to the parent class, ConversableAgent:

class GroupChatManager(ConversableAgent):
    def __init__(
        self,
        groupchat: GroupChat,
        name: Optional[str] = "chat_manager",
        # unlimited consecutive auto reply by default
        max_consecutive_auto_reply: Optional[int] = sys.maxsize,
        human_input_mode: Optional[str] = "NEVER",
        system_message: Optional[str] = "Group chat manager.",
        llm_config: Optional[Union[Dict, bool]] = None,
        # seed: Optional[int] = 4,
        **kwargs,
    ):
        super().__init__(
            name=name,
            max_consecutive_auto_reply=max_consecutive_auto_reply,
            human_input_mode=human_input_mode,
            system_message=system_message,
            llm_config=llm_config,
            **kwargs,
        )
        self.register_reply(Agent, GroupChatManager.run_chat, config=groupchat, reset_config=GroupChat.reset)
        # self._random = random.Random(seed)
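For reference, a rough usage sketch with the manager itself pointed at a local model via llm_config (endpoint URL, model name, and agent setup are placeholders; the exact config keys, e.g. "base_url" vs "api_base", depend on the autogen/openai versions in use):

# Rough usage sketch (not from the thread): passing an explicit llm_config so the
# manager itself talks to the local model. All values below are placeholders.
import autogen

local_llm_config = {
    "config_list": [
        {
            "model": "local-model",                   # placeholder model name
            "base_url": "http://localhost:1234/v1",   # e.g. an LM Studio / vLLM OpenAI-compatible server
            "api_key": "not-needed",
        }
    ],
}

coder = autogen.AssistantAgent("Coder", llm_config=local_llm_config)
qa = autogen.AssistantAgent("QA", llm_config=local_llm_config)
user_proxy = autogen.UserProxyAgent("User_Proxy", human_input_mode="NEVER", code_execution_config=False)

groupchat = autogen.GroupChat(agents=[user_proxy, coder, qa], messages=[], max_round=12)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=local_llm_config)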

The second problem, specific to my implementation, is in line with @dogukanustun's comment. The GroupChatManager never outputs anything related to a role. Instead, it seems to answer the question directly, even though the role mentioned is Planner, Engineer, etc. The first agent that answers is always the first in the list, followed by the second one, and so on through to the last. Then I swapped the positions of the agents in the group chat. The answers provided were exactly the same, meaning they are position-dependent and not agent-dependent. The first answer, second answer, etc. are always the same; only the name of the agent shown in the terminal changes, and it matches the ordering of the groupchat list.

If my second problem is solved and I reach @robzsaunders' problem, I think I could offer some solutions. One would be prompt engineering to force the local LLM to spit out the correct role based on the groupchat list, along the lines sketched below.
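For example, a stricter selection prompt could look something like this (the wording is hypothetical, not taken from groupchat.py):

AGENT_NAMES = ["User_Proxy", "Coder", "QA"]
STRICT_SELECT_PROMPT = (
    "You are coordinating a group chat. The available roles are: "
    + ", ".join(AGENT_NAMES)
    + ". Reply with exactly one role name from that list, with no punctuation, no explanation, and no code."
)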

@tevslin

tevslin commented Jan 29, 2024

I would like to see whatever is implemented for GroupChatManager be exposed in AutoGen Studio.

gagb closed this as completed Aug 27, 2024
jackgerrits added a commit that referenced this issue Oct 2, 2024
…nc (#489)

* Port docker code executor, make async, make code executor restart async

* add export

* fmt

* fix async file