
[Usage]: Confirm tool calling is not supported and this is the closest thing that can be done #7912

summersonnn opened this issue Aug 27, 2024 · 6 comments

@summersonnn

summersonnn commented Aug 27, 2024

Hi.

LLM -> Llama-3.1-8B-Instruct

In the vLLM docs, it is said that:

Tool calling in the chat completion API

vLLM supports only named function calling in the chat completion API. The tool_choice options auto and required are not yet supported but on the roadmap.

To use a named function you need to define the function in the tools parameter and call it in the tool_choice parameter.

It is the callers responsibility to prompt the model with the tool information, vLLM will not automatically manipulate the prompt. This may change in the future.

vLLM will use guided decoding to ensure the response matches the tool parameter object defined by the JSON schema in the tools parameter.

Please refer to the OpenAI API reference documentation for more information.
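
For context, a minimal sketch of that named-function flow with the OpenAI Python client is shown below; the base URL, model path, and the get_weather tool are illustrative assumptions on my part, not something from the docs:

from openai import OpenAI

# Assumed local vLLM OpenAI-compatible server; adjust base_url/model to your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    # Named function calling: the caller explicitly selects the tool via tool_choice.
    tool_choice={"type": "function", "function": {"name": "get_weather"}},
)
print(response.choices[0].message.tool_calls)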

  1. Can we confirm that this still holds? I see a bunch of related PRs and good progress, so I'd like to be sure.
  2. Since tool calling without named functions does not work, we can't use agentic-AI libraries/frameworks such as AutoGen. Correct?

For example, when this code is run (from AutoGen docs):

import os
from autogen import UserProxyAgent, ConversableAgent
from typing import Annotated, Literal

Operator = Literal["+", "-", "*", "/"]

def calculator(a: int, b: int, operator: Annotated[Operator, "operator"]) -> int:
    if operator == "+":
        return a + b
    elif operator == "-":
        return a - b
    elif operator == "*":
        return a * b
    elif operator == "/":
        return int(a / b)
    else:
        raise ValueError("Invalid operator")

# Let's first define the assistant agent that suggests tool calls.
assistant = ConversableAgent(
    name="Assistant",
    system_message="You are a helpful AI assistant. "
    "You can help with simple calculations. "
    "Return 'TERMINATE' when the task is done.",
    llm_config={
        "config_list": [
            {
                "model": "<YOUR MODEL NAME>",
                "api_key": "<APİ_KEY>",
                "base_url": "<BASE_URL_FOR_LOCAL_LLM>"
            }
        ]
    }
)


# The user proxy agent is used for interacting with the assistant agent
# and executes tool calls.
user_proxy = ConversableAgent(
    name="User",
    llm_config=False,
    is_termination_msg=lambda msg: msg.get("content") is not None and "TERMINATE" in msg["content"],
    human_input_mode="NEVER",
)

# Register the tool signature with the assistant agent.
assistant.register_for_llm(name="calculator", description="A simple calculator")(calculator)

# Register the tool function with the user proxy agent.
user_proxy.register_for_execution(name="calculator")(calculator)

chat_result = user_proxy.initiate_chat(assistant, message="What is (44232 + 13312 / (232 - 32)) * 5?")

it is supposed to produce output like the following, which actually includes executing the function (I'm showing just part of it):

>>>>>>>> USING AUTO REPLY...
Assistant (to User):

I apologize for the confusion, I seem to have made a mistake. Let me recalculate the expression properly.

First, we need to do the calculations within the brackets. So, calculating (1423 - 123), (32 + 23), and then performing remaining operations.
***** Suggested tool call (call_mx3M3fNOwikFNoqSojDH1jIr): calculator *****
Arguments: 
{
    "input": {
        "a": 1423,
        "b": 123,
        "operator": "-"
    }
}
***************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
User (to Assistant):

User (to Assistant):

***** Response from calling tool (call_mx3M3fNOwikFNoqSojDH1jIr) *****
1300
**********************************************************************

But when I run it against my local LLM with the vLLM backend, it does not execute the function; it just replies normally instead (again, just part of it):


>>>>>>>> USING AUTO REPLY...
Assistant (to User):

<|python_tag|>{"name": "calculator", "parameters": {"a": 43998.56, "b": 5, "operator": "*"}}

--------------------------------------------------------------------------------
User (to Assistant):



--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
Assistant (to User):

{"name": "calculator", "parameters": {"a": 219994, "b": 5, "operator": "*"}}
  3. As you can see, the local LLM's response sometimes starts with "<|python_tag|>" (actually, most of the time). This is not specific to AutoGen; I ran into the same behaviour without using any third-party framework/library. Even though I tried my best to hide this token by editing some lines in the config JSON files (special_tokens etc.), I failed. Is there any solution to this?

  4. My best attempt at integrating automatic tool calling with vLLM is the following.

I added a "default function" to the tools made available to Llama. The model is supposed to call it whenever none of the other tools is appropriate (a sketch of the full TOOLS list follows the schema below):

    {
        "type": "function",
        "function": {
            "name": "default_function",
            "description": "If none of the other functions is needed, simply call this.",
            "parameters": {
                "type": "object",
                "properties": {
                    "normal_prompt": {
                        "type": "string",
                        "description": "The prompt user has typed.",
                    }
                },
                "required": ["normal_prompt"],
            },
        }
    },
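
For context, here is a sketch of how the TOOLS list used in the code below might be put together; the multiply tool is a made-up example, and only default_function comes from the schema above:

# The catch-all schema from above, kept as a constant.
DEFAULT_FUNCTION = {
    "type": "function",
    "function": {
        "name": "default_function",
        "description": "If none of the other functions is needed, simply call this.",
        "parameters": {
            "type": "object",
            "properties": {
                "normal_prompt": {
                    "type": "string",
                    "description": "The prompt the user has typed.",
                }
            },
            "required": ["normal_prompt"],
        },
    },
}

# Hypothetical TOOLS list: one real tool plus the catch-all default_function.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "multiply",
            "description": "Multiply two integers.",
            "parameters": {
                "type": "object",
                "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
                "required": ["a", "b"],
            },
        },
    },
    DEFAULT_FUNCTION,
]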

And here is the heart of the code that does what I want. For now, I don't actually call the function; the response is just the function call with its full signature, so only the invocation itself is missing. I just wanted to be sure whether this is the best we can do with vLLM right now:

import json

from openai import OpenAI

# `client` is an OpenAI client pointed at the local vLLM server;
# TOOLS is the tool list (including default_function) defined above.
client = OpenAI(base_url="<BASE_URL_FOR_LOCAL_LLM>", api_key="EMPTY")


def send_request_to_llm(chat_history, use_tools=True):
    extra_body = {
        "stop_token_ids": [128001, 128008, 128009],  # Ensure this is included
        "temperature": 0.2,
        "top_p": 0.95,
    }

    if use_tools:
        extra_body["guided_decoding_backend"] = "outlines"

    # Prepare the arguments for the streaming call
    streamer_args = {
        "model": "<my_local_model_path>",
        "messages": chat_history,
        "extra_body": extra_body,
        "stream": True,
    }

    if use_tools:
        streamer_args["tools"] = TOOLS

    streamer = client.chat.completions.create(**streamer_args)

    # If use_tools is True, we expect a function call at the end, so nothing is printed while streaming.
    # Otherwise the response is a normal model reply and we want to see the text as it streams.
    assistant_response = ""
    for chunk in streamer:
        delta = chunk.choices[0].delta
        if delta.content:
            for token in delta.content:
                if not use_tools:
                    print(token, end="", flush=True)
                assistant_response += token

    if use_tools:
        if assistant_response.startswith("{"):
            json_object = json.loads(assistant_response)
        elif assistant_response.startswith("<|python_tag|>"):
            json_object = json.loads(assistant_response[len("<|python_tag|>"):])
        # Occasionally, even the default function is not called. (A bug?) Handle it this way:
        else:
            print("-----------------------")
            chat_history.append({"role": "assistant", "content": assistant_response})
            send_request_to_llm(chat_history, use_tools=False)
            print()
            return

        # Fetch all parameters
        params = json_object["parameters"]

        # Format the parameters into a single string
        formatted_params = ", ".join([f"{key}='{value}'" for key, value in params.items()])

        # This is the function call to make, such as multiply(5, 6)
        call_this = f"{json_object['name']}({formatted_params})"

        # This block only runs when none of the tools is needed (so the default function was used)
        if "default_function" in call_this:
            chat_history.append({"role": "assistant", "content": assistant_response})
            send_request_to_llm(chat_history, use_tools=False)
            print()
            return
        else:
            print(assistant_response)
            print(call_this)
            print()

    chat_history.append({"role": "assistant", "content": assistant_response})

Here is an example output. The very last line is the function call to be made after manipulating the model's response:

(screenshot of the example output; the last line shows the constructed function call)
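
To fill in the missing step (actually invoking the function), here is a minimal dispatch sketch under the same assumptions; the multiply implementation and the FUNCTION_REGISTRY name are hypothetical, and json_object is the parsed response from the code above:

# Hypothetical registry mapping tool names to plain Python implementations.
def multiply(a, b):
    return a * b

FUNCTION_REGISTRY = {
    "multiply": multiply,
}

def execute_tool_call(json_object):
    """Look up the function named in the model's JSON output and call it with its parameters."""
    name = json_object["name"]
    params = json_object["parameters"]
    func = FUNCTION_REGISTRY.get(name)
    if func is None:
        raise ValueError(f"Model requested an unknown tool: {name}")
    # Calling with keyword arguments avoids building a string like "multiply(a=5, b=6)"
    # and eval()-ing it.
    return func(**params)

# Example: execute_tool_call({"name": "multiply", "parameters": {"a": 5, "b": 6}}) returns 30.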

  5. And lastly, for incorporating agentic AI workflows when using vLLM, do I have to write everything from scratch? Maybe start from the code above and work my way up? I'd be glad if you could steer me in the right direction.

Many thanks.

@summersonnn summersonnn added the usage How to use vllm label Aug 27, 2024
@Playerrrrr

+1

@lucasalvarezlacasa

Any updates on this topic?

@summersonnn
Author

summersonnn commented Oct 15, 2024

Any updates on this topic?

Supposedly, #8343 solved it. I haven't tried it yet; I'll give it a go in the next release.
I've switched to Qwen2.5-72B by the way, which vLLM currently supports: just add --enable-auto-tool-choice --tool-call-parser hermes to the end of the vllm serve command.

@lucasalvarezlacasa

lucasalvarezlacasa commented Oct 16, 2024

Thank you @summersonnn. I'll give it a try with the latest vLLM version and see whether it works.

Based on this comment, I understand I will have to use the following flags in the serve command:

--enable-auto-tool-choice --tool-call-parser llama3_json

Is this correct? The llama3_json parser doesn't even appear yet in the official documentation.
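
(For reference, combined with a model name — taken here from the request in the next comment, so an assumption on my part — the full serve invocation would presumably be:)

vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --enable-auto-tool-choice --tool-call-parser llama3_json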

@lucasalvarezlacasa

lucasalvarezlacasa commented Oct 17, 2024

I can confirm that tool calling works with the configuration given above and vllm/vllm-openai:v0.6.3.

I will provide an example of the request and the answer given by the model:

Request

{
  "messages": [
    {
      "content": "You are a helpful assistant tasked with performing arithmetic on a set of inputs.",
      "role": "system"
    },
    {
      "content": "Add 3 and 4. Multiply the output by 2. Divide the output by 5",
      "role": "user"
    }
  ],
  "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
  "stream": false,
  "n": 1,
  "temperature": 0.0,
  "max_tokens": 256,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "add",
        "description": "Adds a and b.",
        "parameters": {
          "properties": {
            "a": {
              "description": "first int",
              "type": "integer"
            },
            "b": {
              "description": "second int",
              "type": "integer"
            }
          },
          "required": ["a", "b"],
          "type": "object"
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "multiply",
        "description": "Multiply a and b.",
        "parameters": {
          "properties": {
            "a": {
              "description": "first int",
              "type": "integer"
            },
            "b": {
              "description": "second int",
              "type": "integer"
            }
          },
          "required": ["a", "b"],
          "type": "object"
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "divide",
        "description": "Divide a and b.",
        "parameters": {
          "properties": {
            "a": {
              "description": "first int",
              "type": "integer"
            },
            "b": {
              "description": "second int",
              "type": "integer"
            }
          },
          "required": ["a", "b"],
          "type": "object"
        }
      }
    }
  ],
  "parallel_tool_calls": false
}

Response

{
  "ChatCompletion": {
    "id": "chat-32cb47446c5b471eba5c91be1755811e",
    "choices": [
      {
        "finish_reason": "tool_calls",
        "index": 0,
        "logprobs": null,
        "message": {
          "content": null,
          "refusal": null,
          "role": "assistant",
          "function_call": null,
          "tool_calls": [
            {
              "id": "chatcmpl-tool-f8c832f4a42445f899a229063004cae9",
              "function": {
                "arguments": '{"a": 3, "b": 4}',
                "name": "add"
              },
              "type": "function"
            },
            {
              "id": "chatcmpl-tool-4b44f70f7dde47d0820f8a3b9018b897",
              "function": {
                "arguments": '{"a": 7, "b": 2}',
                "name": "multiply"
              },
              "type": "function"
            },
            {
              "id": "chatcmpl-tool-d897bd7ecb4b44e59eb718aff21cbfa8",
              "function": {
                "arguments": '{"a": 14, "b": 5}',
                "name": "divide"
              },
              "type": "function"
            }
          ]
        },
        "stop_reason": 128008
      }
    ],
    "created": 1729149431,
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "object": "chat.completion",
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
      "completion_tokens": 67,
      "prompt_tokens": 466,
      "total_tokens": 533,
      "completion_tokens_details": null,
      "prompt_tokens_details": null
    },
    "prompt_logprobs": null
  }
}
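
Building on this, here is a hedged sketch of how those tool_calls could be executed and fed back to the model with the OpenAI Python client; the base URL and the add/multiply/divide implementations are assumptions, and the tool schemas simply mirror the request above:

import json

from openai import OpenAI

# Assumed local vLLM server started with --enable-auto-tool-choice --tool-call-parser llama3_json.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "meta-llama/Meta-Llama-3.1-8B-Instruct"

def arithmetic_tool(name, description):
    # All three tools in the request above share the same (a, b) integer signature.
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "integer", "description": "first int"},
                    "b": {"type": "integer", "description": "second int"},
                },
                "required": ["a", "b"],
            },
        },
    }

TOOLS = [
    arithmetic_tool("add", "Adds a and b."),
    arithmetic_tool("multiply", "Multiply a and b."),
    arithmetic_tool("divide", "Divide a and b."),
]

# Plain Python implementations for the tools (assumed, not part of the original request).
IMPLEMENTATIONS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

messages = [
    {"role": "system", "content": "You are a helpful assistant tasked with performing arithmetic on a set of inputs."},
    {"role": "user", "content": "Add 3 and 4. Multiply the output by 2. Divide the output by 5"},
]

response = client.chat.completions.create(model=MODEL, messages=messages, tools=TOOLS)
message = response.choices[0].message

if message.tool_calls:
    # Record the assistant turn that requested the tools, then append one tool result per call.
    messages.append({
        "role": "assistant",
        "content": message.content,
        "tool_calls": [tc.model_dump() for tc in message.tool_calls],
    })
    for tc in message.tool_calls:
        args = json.loads(tc.function.arguments)
        result = IMPLEMENTATIONS[tc.function.name](**args)
        messages.append({"role": "tool", "tool_call_id": tc.id, "content": str(result)})

    # Let the model continue with the tool results in context.
    follow_up = client.chat.completions.create(model=MODEL, messages=messages, tools=TOOLS)
    print(follow_up.choices[0].message.content)
else:
    print(message.content)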


This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

@github-actions github-actions bot added the stale label Jan 16, 2025