Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactorization of ConversableAgent to unify async and sync code and better extensibility #1240

Closed
wants to merge 99 commits into from

Conversation

davorrunje
Copy link
Collaborator

@davorrunje davorrunje commented Jan 14, 2024

Note: this is a refactorization study and will change a lot as we explore different design patterns and how they fit together in the framework. Feedback and suggestions are more than welcome.

Why are these changes needed?

The current state.

Several issues and PRs involve how to extend the ConversableAgent class. Effort like Teachability for every agent (#534) and the concept of modularized agent capability is a big step toward solving this from a higher level. There are still low-level extension issues such as:

While some patches have been made to address those issues but not the root cause, which is that the ConversableAgent was mostly designed to work in a console environment but not yet a server-side library, and the code duplication in async and sync public methods.

What is this PR for.

The main goal of this PR is to address the above and at the same time introducing a design pattern that would make it much easier to add low-level functionalities like logging, content filtering, RAG-style context expansion, and custom termination mechanism and many others. We also want to demonstrate that high-level capabilities like Teachability can be composed of re-usable low-level components.

What is this PR NOT for:

This PR is not for breaking the existing ConversableAgent methods. The ConversableAgent has many great methods like the register_for_llm and initiate_chat that are loved by users.

What can you do to help:

This is not a small change, so we want to have as many feedback AND HELP as possible. We will post some work items on this page as we move forward, and you are welcome to contribute!

Update 01/16/2024

@ekzhu and I came up with the new scheme which introduces a single new function/decorator hookable and implements Middleware pattern. An example of how to use it is as follows:

class A:
    def __init__(self, name: str) -> None:
        self.name = name

    @hookable
    def go(self, *args: Any, **kwargs: Any) -> str:
        return f"{self.name}.{format_function(self.go, *args, **kwargs)}"

class MyMiddleware:
    def __init__(self, name: str) -> None:
        self.name = name

    def call(self, *args: Any, next: Callable[..., Any], **kwargs: Any) -> str:
        retval = next(*args, **kwargs)
        return f"{self.name}.{format_function(self.call, retval)}"

    def trigger(self, *args: Any, **kwargs: Any) -> bool:
        return not ("skip_middleware" in kwargs and kwargs["skip_middleware"])

a = A("a")
add_middleware(A.go, MyMiddleware("mw"))

assert a.go(1, 2, 3, a=4, b=5) == "mw.call(a.go(1, 2, 3, a=4, b=5))"
assert a.go(1, 2, 3, a=4, b=5, skip_middleware=False) == "mw.call(a.go(1, 2, 3, a=4, b=5, skip_middleware=False))"
assert a.go(1, 2, 3, a=4, b=5, skip_middleware=True) == "a.go(1, 2, 3, a=4, b=5, skip_middleware=True)"

add_middleware(A.go, MyMiddleware("MW"))
assert a.go(1, 2, 3, a=4, b=5) == "mw.call(MW.call(a.go(1, 2, 3, a=4, b=5)))"

There can be more than one hookable method in each class. We can use this to implement reply and hook functions and probably many other things.

Update 01/17/2024

@tyler-suard-parker @joshkyh @bitnom @jackgerrits @rickyloynd-microsoft

You are welcome to try out this branch. We are currently working on replacing register_hook for now then we will move on to refactor existing generate_***_reply functions in the ConversableAgent class into a middleware and upgrade the generate_reply method to use the middleware -- so the current functionalities stay the same.

But we need someone to think about how to implement some of these new features using middleware, and add them to the generate_reply to enable new functionalities.

  1. Incoming and outgoing message streaming to web socket
  2. Incoming and outgoing message logging
  3. RAG-style message context expansion, e.g., retrieve relevant context from a vector database and expand the incoming message's content.
  4. Human middleware to short-circuit the rest of the middleware pipeline -- think a static file middleware in a web framework.
  5. Out-going message filtering. e.g., filtering api keys, passwords, etc.

Here is a simple example of logging middleware that logs incoming and outgoing messages.

class LoggingMiddleware:
  def call(self, message: Dict, next: Callable[[Dict], Dict]) -> Dict:
    logging.info(f"Incoming: {retval}")
    retval = next(message)
    logging.info(f"Outgoing: {retval}")
    return retval

Here is another for filtering out OpenAI API keys.

class FilterAPIKeyMiddleware:
  def call(self, message: Dict, next: Callable[[Dict], Dict]) -> Dict:
    retval = next(message)
    if retval.get("content", False):
      re.sub(r'(sk-\w{4})\w+', r'\1***', retval["content"])
    return retval

Another one for simple RAG-style context expansion:

class RAGMiddleware:
  def call(self, message: Dict, next: Callable[[Dict], Dict]) -> Dict:
    if message.get("content", False):
      # Expand the message content with some text retrieved from vector db.
      expansion = vectordb.search(message["content"], k=1)[0]
      message["content"] += expansion
    retval = next(message)
    return retval

We are also adding a decorator that would convert a function into a middleware, saving user the effort to write a class.

More updates 01/17/2024

Teachability is refactored using the Middleware pattern instead of hooks. This is the actual implementation right now:

def add_to_agent(self, agent: ConversableAgent):
    """Adds teachability to the given agent."""
    self.teachable_agent = agent

    # Register a middleware for processing the last message.
    class ProcessLastMessageMiddleware:
        def __init__(self, *, agent: ConversableAgent, teachability: Teachability):
            self.teachability = teachability
            self.agent = agent

        def call(self, agent: ConversableAgent, user_text: str, *, next: Callable[[str], str]):
            user_text = next(agent, user_text)
            return self.teachability.process_last_message(user_text)

        def trigger(self, agent: ConversableAgent, user_text: str):
            return self.agent == agent

    add_middleware(
        ConversableAgent.process_last_message_user_text,
        ProcessLastMessageMiddleware(agent=agent, teachability=self),
    )

Whenever ConversableAgent.process_last_message_user_text is called, the ProcessLastMessageMiddleware.call is invoked and a wrapper to the original ConversableAgent.process_last_message_user_text is passed as the next parameter. All typing hints here are optional, they are present only to help understand what the expected parameters are.

There is some cleanup and error handling remaining, but this is basically it. As Erik mentioned above, it is easy to write a set of standard MIddleware that covers the most common use cases.

Update 01/18/2024

Middleware registration methods add_middleware and set_middleware are refactored to be attached to bounded methods as suggested by @LittleLittleCloud and @ekzhu.

class A:
    def __init__(self, name: str) -> None:
        self.name = name

    @register_for_middleware
    def process_message(self, msg: str, skip_middleware: Optional[bool] = None) -> str:
        return f"{self.name}.process_message({msg=})"

class MyMiddleware:
    def __init__(self, name: str) -> None:
        self.name = name

    def call(self, *args: Any, next: Callable[..., Any], **kwargs: Any) -> str:
        retval = next(*args, **kwargs)
        return f"{self.name}.{format_function(self.call, retval)}"

    def trigger(self, *args: Any, **kwargs: Any) -> bool:
        return not ("skip_middleware" in kwargs and kwargs["skip_middleware"])

a = A("a")
mw = MyMiddleware("mw")

# middleware attached to a bounded method a.process_message
add_middleware(a.process_message, mw)

assert a.process_message("hello") == "mw.call(a.process_message(msg='hello'))"

mw2 = MyMiddleware("mw2")
add_middleware(a.process_message, mw2)
assert a.process_message("hello") == "mw.call(mw2.call(a.process_message(msg='hello')))"

b = A("b")
with pytest.raises(ValueError):
    # mw is already added to a.process_message
    add_middleware(b.process_message, mw)

mwb = MyMiddleware("mwb")
add_middleware(b.process_message, mwb)

# only mwb middleware is called on calling b.process_message
assert b.process_message("hello") ==  "mwb.call(b.process_message(msg='hello'))"

# only mw and mw2 are called on a.process_message
assert a.process_message("hello") ==  "mw.call(mw2.call(a.process_message(msg='hello')))"

Update 01/19/2024

The code was internally refactored so it accurately uses the signature of a decorated function in call() methods of a MIddleware class. Another change is adding a_call method to Middleware classes and removing of trigger method. Having both call() and a_call allows for the most efficient implementation. Decorators for automatically generating call from a_call and vice versa will be added shortly so we'll be still able to mix sync/async styles if we are willing to pay the price in reduced performance. Examples of Middleware classes were added to the ConversableAgent. Here is a simple one performing logging:

# notice that the `call`` signature must match the function decorated with `register_for_middleware`:
# passing arguments to call() functions must the the same as passing arguments
# to generate_reply() apart from next being passed as a keyword argument
# default values must also be the same
class _PrintReplyMiddleware:
    def __init__(self, agent: Agent):
        self._agent = agent

    def call(
        self,
        messages: Optional[List[Dict]] = None,
        sender: Optional[Agent] = None,
        # next will be passed as a keyword argument
        next: Optional[Callable[..., Any]] = None,
    ) -> Tuple[bool, Optional[str]]:
        print(f"generate_reply() called: {sender} sending {messages[-1] if messages else messages}'")
        retval = next(messages, sender)
        return retval
    
    async def a_call(
        self,
        messages: Optional[List[Dict]] = None,
        sender: Optional[Agent] = None,
        next: Optional[Callable[..., Any]] = None,
    ) -> Tuple[bool, Optional[str]]:
        print(f"a_generate_reply() called: {sender} sending {messages[-1] if messages else messages}'")
        retval = await next(messages, sender)
        return retval

class ConversableAgent(Agent):
    def __init__(self, *args, **kwargs):
        ...
        # attaching middleware to a registered method 
        add_middleware(self.generate_reply, _PrintReplyMiddleware(self))
        add_middleware(self.a_generate_reply, _PrintReplyMiddleware(self))

    @register_for_middleware
    def generate_reply(
        self,
        messages: Optional[List[Dict]] = None,
        sender: Optional[Agent] = None,
        exclude: Optional[List[Callable]] = None,
    ) -> Union[str, Dict, None]:
        ...

    @register_for_middleware
    async def a_generate_reply(
        self,
        messages: Optional[List[Dict]] = None,
        sender: Optional[Agent] = None,
        exclude: Optional[List[Callable]] = None,
    ) -> Union[str, Dict, None]:
        ...

All the tests are passing and we are ready to start refactoring the ConversableAgent class.

Update 01/20/2024

Created the following middleware:

  1. ToolUseMiddleware
  2. LLMMiddleware
  3. CodeExecutionMiddleware
  4. TerminationAndHumanReplyMiddleware
  5. MessageStoreMiddleware
  6. TeachabilityMiddleware

See autogen/middleware and contrib/capability/teachability.py

Refactored ConversableAgent by composing it using the middleware above. All public methods are backward-compatible.

Fixed some tests. The failing tests should be easy to fix.

Next step:

  1. Use wrapper to unify sync and async code path.
  2. Utilities for building middleware chain and validating call(...) method signatures.
  3. Fix all tests.
  4. Update code-level documentation, remove recommendation for subclassing.

Update 01/26/2024

Async/sync mixing works now in all cases

  • Function and tool calling is working in all combinations of async/sync calls.
  • Code execution works in a_initialize_chat now.

Quality improvements

All tests are passing now. Code coverage was significantly improved with the goal of having over 90% code covered by tests. Type annotations are fixed and mypy reports no errors in autogen/agentchat/middleware and test/agentchat/middleware folders.

autogen/agentchat/middleware/base.py                                6      0      0      0   100%
autogen/agentchat/middleware/code_execution.py                    108      0     36      0   100%
autogen/agentchat/middleware/llm.py                               143      4     62      4    95%
autogen/agentchat/middleware/message_store.py                     113      5     60      9    92%
autogen/agentchat/middleware/termination.py                       127     22     66     14    79%
autogen/agentchat/middleware/tool_use.py                          143      0     54      0   100%

Next steps:

  1. Utilities for building middleware chain and validating call(...) method signatures.
  2. Update code-level documentation, remove recommendation for subclassing.
  3. Write a tutorial on extending ConversibleAgent using middleware instead of subclassing.

Related issue number

Checks

@codecov-commenter
Copy link

codecov-commenter commented Jan 14, 2024

Codecov Report

Attention: 209 lines in your changes are missing coverage. Please review.

Comparison is base (1ab2354) 32.48% compared to head (bbdc8dd) 49.48%.

Files Patch % Lines
...gen/agentchat/contrib/capabilities/teachability.py 5.08% 56 Missing ⚠️
autogen/agentchat/conversable_agent.py 71.79% 45 Missing and 10 partials ⚠️
autogen/agentchat/middleware/termination.py 71.65% 22 Missing and 14 partials ⚠️
autogen/agentchat/middleware/llm.py 79.72% 24 Missing and 5 partials ⚠️
autogen/agentchat/middleware/message_store.py 79.64% 15 Missing and 8 partials ⚠️
autogen/agentchat/contrib/gpt_assistant_agent.py 0.00% 2 Missing ⚠️
autogen/agentchat/contrib/compressible_agent.py 50.00% 1 Missing ⚠️
autogen/agentchat/contrib/llava_agent.py 50.00% 1 Missing ⚠️
autogen/agentchat/contrib/math_user_proxy_agent.py 75.00% 1 Missing ⚠️
...ogen/agentchat/contrib/retrieve_assistant_agent.py 0.00% 1 Missing ⚠️
... and 4 more
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1240       +/-   ##
===========================================
+ Coverage   32.48%   49.48%   +17.00%     
===========================================
  Files          41       49        +8     
  Lines        4907     5252      +345     
  Branches     1120     1238      +118     
===========================================
+ Hits         1594     2599     +1005     
+ Misses       3187     2485      -702     
- Partials      126      168       +42     
Flag Coverage Δ
unittests 49.42% <78.25%> (+16.98%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Tylersuard
Copy link
Collaborator

I'm going to be honest, even as a mid-level developer, I can't understand a word of the example code. Aren't we making AutoGen easy for everyone to use, regardless of their skill level? The basic Autogen code is fairly simple, set up a model, instantiate an agent, initiate that agent chat. Where do these middlewares fit into that process?

@Tylersuard
Copy link
Collaborator

Tylersuard commented Jan 28, 2024

Thank you for the valuable feedback! We need to recognize the fact that most autogen users who want to extend agent capabilities will not have strong python backgrounds. This should not prevent us from adopting clean and powerful extension mechanisms, but the machinery needs to be readable, carefully documented, and easy for almost anyone to use, as well as easy to maintain after all of us have moved on. If we require users to become versed in arcane libraries and abstruse programming patterns, those barriers will severely limit autogen's adoption.

@ekzhu
Copy link
Collaborator

ekzhu commented Jan 28, 2024

I'm going to be honest, even as a mid-level developer, I can't understand a word of the example code. Aren't we making AutoGen easy for everyone to use, regardless of their skill level? The basic Autogen code is fairly simple, set up a model, instantiate an agent, initiate that agent chat. Where do these middlewares fit into that process?

Middleware is meant for the framework developer, not for application developer. It is not changing the interface of AutoGen, rather it is changing the backend and how the "under-the-hood" stuff is written.

You can read the PR description about the motivation.

@ekzhu
Copy link
Collaborator

ekzhu commented Jan 28, 2024

Currently in AutoGen, each incoming message is handled by a pipeline of registered reply functions. Each reply function is triggered by a trigger function. If a reply function is triggered and signaled it is a final reply, it short-circuits the pipeline and returns the generated reply back to the sender.

This design pattern is described in the AutoGen paper (https://arxiv.org/pdf/2308.08155.pdf, Section 2). This PR is a refactor, we convert reply function into middleware class, so it can better handle states like code executors, message history, etc.

@ekzhu
Copy link
Collaborator

ekzhu commented Jan 28, 2024

@Tylersuard Apologies if this PR confused you. Shouldn't have routed you here 😄 . If you are interested in contribute new code, don't worry about this PR.

@ekzhu
Copy link
Collaborator

ekzhu commented Jan 28, 2024

Closing this PR. We have learned a lot working on this branch. Time to use what we have learned toward a series of PRs that will incrementally move the code base toward extensibility.

@ekzhu ekzhu closed this Jan 28, 2024
@ekzhu ekzhu deleted the hooks branch January 28, 2024 07:27
@davorrunje
Copy link
Collaborator Author

davorrunje commented Jan 28, 2024

@Tylersuard as @ekzhu said, the idea was to refactor the code using a well-known Middleware pattern instead of a currently used ad-hoc mechanism with reply functions. The Middleware pattern was explored here because of its popularity in frameworks such as Starlette (https://www.starlette.io/middleware/) and FastAPI (https://fastapi.tiangolo.com/tutorial/middleware/) where it is the standard way of extending the framework by non-experts. The main idea is to have a self-contained Middleware class that can be independently tested and then integrated with other middleware functions to implement a required combination of functionalities. E.g. if you wish to add observability to your agents using OpenTelemetry, you would only need to implement a Middleware subclass and attach it to an Agent. As I said, that is a well-known and battle-proven mechanism in the most popular frameworks used by millions of junior and mid-level developers. Implementation of the mechanism is not the simplest one, but using it is easier than using the mechanism with reply functions available today.

@tyler-suard-parker
Copy link
Contributor

tyler-suard-parker commented Jan 29, 2024

@davorrunje Thank you for your response, I understand, and I am glad you are interested in simplifying the experience for developers. Could you give some simple example code for a real-world application for the middleware pattern you are proposing? For example, how would I make an agent save its output to a text document instead of printing it to the terminal? How would I modify the below code to incorporate the text-saving middleware? Thank you.


config_list = ***

assistant = autogen.AssistantAgent(***)
user_proxy = autogen.UserProxyAgent(***)

user_proxy.initiate_chat(
    assistant,
    message="""What date is today? Compare the year-to-date gain for META and TESLA.""",
) ```

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.