[Tool parsing] Improve / correct mistral tool parsing #10333

patrickvonplaten · 2024-11-14T15:39:26Z

This PR is heavily inspired / copied from what @gcalmettes nicely summarized here: #9059 (comment) and in following messages. Thanks a ton for the nice investigation and great ideas of how to improve Mistral function calling.

Based on @gcalmettes's idea here #9059 (comment), both tekken models (mistral-nemo) and spm models (mistral-8b) now output the [TOOL_CALLS] token so that the tool parser can consume it, which makes function-call parsing more robust. For example, start the server with:

vllm serve mistralai/Ministral-8B-Instruct-2410 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice

and then ping the model e.g. via:

import requests
import json

url = 'http://<your-node>:8000/v1/chat/completions'
headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer token'
}

model = "mistralai/Ministral-8B-Instruct-2410"

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city to find the weather for, e.g. 'San Francisco'"
                    },
                    "state": {
                        "type": "string",
                        "description": "The state abbreviation, e.g. 'CA' for California"
                    },
                    "unit": {
                        "type": "string",
                        "description": "The unit for temperature",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["city", "state", "unit"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "rewrite",
            "description": "Rewrite a given text for improved clarity",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "The input text to rewrite"
                    }
                }
            }
        }
    }
]

messages = [
    {"role": "system", "content": "You are an assistant."},
    {
        "role": "user",
        "content": "Could you please rewrite the below article?\n\nMy English needs improvving, maybe I make erors."
    },
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "id": "bbc5b7ede",
                "type": "function",
                "function": {
                    "name": "rewrite",
                    "arguments": '{"text": "My English needs improvving, maybe I make erors."}'
                }
            }
        ]
    },
    {
        "role": "tool",
        "content": '{"action":"rewrite","outcome":"My English needs improving, maybe I make errors."}',
        "tool_call_id": "bbc5b7ede",
        "name": "rewrite"
    },
    {
        "role": "assistant",
        "content": "---\n\nMy English needs improving, maybe I make errors."
    },
    {
        "role": "user",
        "content": "Can you tell me what the temperature will be in Dallas, in Fahrenheit?"
    },
]

data = {
    "model": model,
    "messages": messages,
    "tools": tools
}

response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json())

This PR should then also finally close: #9059.
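As a rough illustration of how a client could consume the result of the request above, here is a minimal sketch that pulls the function name and decoded arguments out of an OpenAI-style chat completion payload. The helper name `extract_tool_calls` and the sample payload are assumptions written by hand for this example; they are not captured from a real server run.

```python
import json


def extract_tool_calls(response_json: dict) -> list[tuple[str, dict]]:
    """Return (function_name, arguments) pairs from a chat completion payload."""
    calls = []
    for choice in response_json.get("choices", []):
        message = choice.get("message", {})
        # "tool_calls" may be missing or None when the model answers in plain text
        for call in message.get("tool_calls") or []:
            function = call["function"]
            # "arguments" is a JSON-encoded string, so decode it into a dict
            calls.append((function["name"], json.loads(function["arguments"])))
    return calls


# Hand-written sample payload (hypothetical, for illustration only)
sample_response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "",
                "tool_calls": [
                    {
                        "id": "abc123def",
                        "type": "function",
                        "function": {
                            "name": "get_current_weather",
                            "arguments": '{"city": "Dallas", "state": "TX", "unit": "fahrenheit"}',
                        },
                    }
                ],
            }
        }
    ]
}

print(extract_tool_calls(sample_response))
```

Decoding the arguments on the client side keeps the check simple: if `json.loads` fails, the tool call was malformed.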

github-actions · 2024-11-14T15:39:41Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

patrickvonplaten · 2024-11-14T19:31:15Z

tests/models/decoder_only/language/test_mistral.py

@@ -58,17 +61,62 @@
            },
            "required": ["city", "state", "unit"]
        }
+    },


Make the test much more difficult and complex, to show the community to what extent function calling can be used with Mistral models.

patrickvonplaten · 2024-11-14T19:31:30Z

tests/models/decoder_only/language/test_mistral.py

+
+        model_output = outputs[0].outputs[0].text.strip()
+        assert model_output.startswith(tool_parser.bot_token), model_output
+        parsed_message = tool_parser.extract_tool_calls(model_output, None)


Cleaner to let the parser take care of correctly extracting the dict
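To make the idea concrete, here is a minimal sketch of what extracting tool calls from raw output could look like once the output reliably starts with the tool-call marker. This is not vLLM's parser: `parse_tool_calls` and `BOT_TOKEN` are hypothetical names, and the sketch assumes the marker is followed by a JSON list (or a single JSON object) of calls.

```python
import json

BOT_TOKEN = "[TOOL_CALLS]"  # assumed literal text of the tool-call marker


def parse_tool_calls(model_output: str) -> list[dict]:
    """Return the tool calls encoded after the marker, or [] if there is none."""
    text = model_output.strip()
    if not text.startswith(BOT_TOKEN):
        # No marker: treat the output as ordinary assistant text
        return []
    payload = text[len(BOT_TOKEN):].strip()
    parsed = json.loads(payload)
    # Normalise a single call object into a one-element list
    return parsed if isinstance(parsed, list) else [parsed]


raw_output = (
    '[TOOL_CALLS] [{"name": "get_current_weather", '
    '"arguments": {"city": "Dallas", "state": "TX", "unit": "fahrenheit"}}]'
)
print(parse_tool_calls(raw_output))
```

Because the marker is an unambiguous prefix, the parser does not have to guess whether arbitrary text happens to contain JSON.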

patrickvonplaten · 2024-11-14T19:32:09Z

vllm/entrypoints/openai/serving_chat.py

-                                break
-                        request.messages[i][
-                            "tool_calls"] = validated_tool_calls
+                maybe_serialize_tool_calls(request)


Moving this out of serving_chat.py just to clean up the method a bit. This is a very general method and the error correction here is very Mistral-specific, so it is probably better placed in tokenizers/mistral.py.


patrickvonplaten · 2024-11-14T19:33:32Z

vllm/transformers_utils/tokenizers/mistral.py

+
+            request.messages[i]["tool_calls"] = validated_tool_calls
+
+
 def list_local_repo_files(repo_id: str, revision: Optional[str]) -> List[str]:


As proposed by @gcalmettes here: #9059 (comment)

We no longer strip the [TOOL_CALLS] token for either tekken or spm, so that function calls can be parsed correctly.
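A minimal sketch of the filtering behaviour described here, written as a standalone function rather than as the tokenizer method itself. The function name, the `keep` parameter, and the example special-token set are assumptions for illustration; the real change lives in `convert_tokens_to_string`.

```python
TOOL_CALLS_TOKEN = "[TOOL_CALLS]"


def filter_special_tokens(tokens, special_tokens, keep=(TOOL_CALLS_TOKEN,)):
    """Drop special tokens from `tokens`, except those listed in `keep`."""
    return [t for t in tokens if t not in special_tokens or t in keep]


# Hypothetical special-token set, for illustration only
special = {"<s>", "</s>", "[INST]", "[/INST]", TOOL_CALLS_TOKEN}
tokens = ["<s>", TOOL_CALLS_TOKEN, "[{", "}]", "</s>"]

# Default: the tool-call marker survives so a downstream parser can see it
print(filter_special_tokens(tokens, special))
# With an empty `keep`, every special token is stripped, marker included
print(filter_special_tokens(tokens, special, keep=()))
```

Making the kept tokens a parameter is one possible way to switch the behaviour per request, at the cost of a wider function signature.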

ywang96

Thanks for making this PR! I think it's a lot cleaner now.

gcalmettes · 2024-11-14T22:01:04Z

vllm/entrypoints/openai/serving_chat.py

-                                break
-                        request.messages[i][
-                            "tool_calls"] = validated_tool_calls
+                maybe_serialize_tool_calls(request)


Good point!

I had originally thought about putting it directly in the Mistral tokenizer, but in the end did not, because the same problem would occur for any future model whose tokenizer does not rely on Jinja chat templates (there are none right now, so this was highly hypothetical).
Factoring the logic in the function like you did is a good solution that would still work with other non-chat-template models 👍
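For readers unfamiliar with the refactor, the following is a rough sketch of the kind of normalisation such a helper could perform. It assumes the underlying problem is that `tool_calls` on assistant messages may arrive as a lazy, one-shot iterator instead of a list; that assumption, the function name `materialize_tool_calls`, and the plain-dict message shape are all illustrative and not taken from the vLLM source.

```python
from typing import Any


def materialize_tool_calls(messages: list[dict[str, Any]]) -> None:
    """Replace lazy `tool_calls` iterables on assistant messages with lists, in place."""
    for message in messages:
        if message.get("role") != "assistant":
            continue
        tool_calls = message.get("tool_calls")
        if tool_calls is None:
            continue
        # list() drains a one-shot iterator once, so later readers see real data
        message["tool_calls"] = list(tool_calls)


messages = [
    {"role": "user", "content": "Rewrite this text."},
    {
        "role": "assistant",
        "content": "",
        # Simulate a lazily validated field with a plain iterator
        "tool_calls": iter([{"id": "bbc5b7ede", "type": "function"}]),
    },
]
materialize_tool_calls(messages)
print(messages[1]["tool_calls"])
```

Keeping this in a free function, rather than inside one tokenizer class, means any code path that bypasses chat templates can call it.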

gcalmettes · 2024-11-14T22:26:16Z

vllm/transformers_utils/tokenizers/mistral.py

@@ -222,7 +260,8 @@ def convert_tokens_to_string(self, tokens: List[str]) -> str:
        if self.is_tekken:
            tokens = [
                t for t in tokens
-                if t not in self.tokenizer._all_special_tokens
+                if (t is SpecialTokens.tool_calls


Note that after further testing on my end, I found an edge case where not skipping the [TOOL_CALLS] token here can mess up the intended output:

When requiring structured output by specifying response_format=json_object or response_format=json_schema, the [TOOL_CALLS] token is still emitted in some cases even though no tools are provided to the model, so the generated output is no longer valid JSON. I have tested and observed this with all the structured output backends supported by vLLM (lm-format-enforcer / outlines). Note that this only happens if the system prompt does not mention that we expect JSON responses from the model.

If we can find a way to keep the SpecialTokens.tool_calls token only when function calling is required (based on the presence of tools in the request, for example), that would be best. However, I haven't yet found a clean way to pass this information to the convert_tokens_to_string method without changing the signature of the method ...

I have an easily reproducible example of this problem that I can share with you.

Thanks for the note! Would be great if you could share an easy repro

@patrickvonplaten please find below a scenario where it will break (and, further below, the small change in the prompt that makes the code work because of the added guidance to the model). Note that the code requires lm-format-enforcer version 0.10.9 so that it is compatible with the MistralTokenizer.

However, after further investigation, I now know how to fix it (I'm preparing a PR, and I'll tag you for review)! In fact, the problem was present before but "masked" by the fact that the [TOOL_CALLS] token was skipped in the convert_tokens_to_string method, so your PR made it possible to expose the problem 😉. The root cause is that all the structured output libraries filter out the special tokens to build their tree of possible tokens (e.g. this check in lm-format-enforcer), but the current vLLM MistralTokenizer does not correctly populate the methods that the libraries use for that. The fix is easy, and I have tested it successfully.

"""
vllm server started with the following arguments:
    --guided-decoding-backend=lm-format-enforcer --enable-auto-tool-choice --tool-call-parser=mistral --tokenizer-mode=mistral
"""
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="none",
)

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

completion = client.beta.chat.completions.parse(
    model="mistralai/Pixtral-12B-2409",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
)

# the response will break as `[TOOL_CALLS]` is present at the beginning of the response
event = completion.choices[0].message.parsed
print(event.__dict__)

Guiding the model to output JSON by changing the system prompt as below is enough to stop the model from producing the [TOOL_CALLS] token:

{"role": "system", "content": "Extract the event information. Respond as JSON."},
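The root cause described above can be pictured with a toy model of how a structured-output backend might build its set of allowed tokens. This is a deliberately simplified sketch: the vocabulary, the token ids, and the `allowed_token_ids` helper are invented for illustration and do not reflect the internals of lm-format-enforcer, outlines, or the MistralTokenizer.

```python
def allowed_token_ids(vocab: dict[str, int], special_ids: set[int]) -> set[int]:
    """Token ids a constrained decoder may pick: everything the tokenizer
    does not report as special."""
    return {token_id for token_id in vocab.values() if token_id not in special_ids}


# Toy vocabulary with made-up ids
vocab = {"<s>": 0, "</s>": 1, "[TOOL_CALLS]": 5, "{": 10, "}": 11, '"name"': 12}

# A tokenizer that reports no special ids lets the marker leak into the candidates
leaky = allowed_token_ids(vocab, special_ids=set())

# A tokenizer that reports its special ids keeps the marker out
fixed = allowed_token_ids(vocab, special_ids={0, 1, 5})

print(vocab["[TOOL_CALLS]"] in leaky, vocab["[TOOL_CALLS]"] in fixed)
```

Under this reading, the fix belongs in what the tokenizer reports as special rather than in the string conversion step.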

WIP

dd2df17

mergify bot added the frontend label Nov 14, 2024

patrickvonplaten added 5 commits November 14, 2024 16:40

Merge branch 'vllm-project:main' into improve_vllm_tool_parsing

7a72ccc

WIP

0b551ba

Up

800b376

Up

fe39e84

WIP

f1d3cf2

patrickvonplaten requested review from DarkLight1337 and ywang96 as code owners November 14, 2024 19:29

up

ab8c7e2

patrickvonplaten added 4 commits November 14, 2024 20:34

up

5cbbff1

Up

b694ba5

up

fac07af

Up

7e9ae4c

ywang96 approved these changes Nov 14, 2024

ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 14, 2024

gcalmettes reviewed Nov 14, 2024

DarkLight1337 enabled auto-merge (squash) November 15, 2024 00:25

DarkLight1337 merged commit 11cd1ae into vllm-project:main Nov 15, 2024
62 checks passed

gcalmettes mentioned this pull request Nov 15, 2024

[Bugfix] Ensure special tokens are properly filtered out for guided structured output with MistralTokenizer #10363

Merged
