
[Frontend] Tool calling parser for Granite 3.0 models #9027

Merged: 10 commits into vllm-project:main on Nov 7, 2024

Conversation

@maxdebayser (Contributor) commented Oct 2, 2024

This PR adds a tool calling parser for ibm-granite/granite-3.0-8b-instruct. The smaller models in the Granite 3.0 Language Models collection pass some of the tests.
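
For reference, a minimal client-side sketch of exercising the parser through vLLM's OpenAI-compatible server (assuming the server is launched with `--enable-auto-tool-choice` and a `--tool-call-parser` value matching the new parser; the `get_weather` tool below is hypothetical):

```python
# Minimal sketch, not part of this PR. Assumes the server was started
# with something like:
#   vllm serve ibm-granite/granite-3.0-8b-instruct \
#       --enable-auto-tool-choice --tool-call-parser granite
# ("granite" as the parser name is an assumption based on this PR.)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical tool definition for illustration only.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="ibm-granite/granite-3.0-8b-instruct",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    tools=tools,
)

# With a working parser, the call comes back as structured tool_calls
# instead of raw text in message.content.
print(resp.choices[0].message.tool_calls)
```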

cc: @njhill

github-actions bot commented Oct 2, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@njhill changed the title from "[Frontend] Tool calling parser for granite-8b-instruct" to "[Frontend] Tool calling parser for Granite 3.0 models" on Oct 24, 2024
@njhill (Member) left a comment

Thanks @maxdebayser!

5 review comments on docs/source/serving/openai_compatible_server.md (outdated, resolved)
This model supports all the cases in our unit tests

I had to rebase this due to DCO problems in several commits
that have now been merged in main.

Signed-off-by: Max de Bayser <[email protected]>
@maxdebayser marked this pull request as ready for review on October 31, 2024 18:50
@maxdebayser (Contributor, Author)

This is ready for review.

cc: @njhill @wseaton

@njhill (Member) left a comment

Thanks @maxdebayser! I did a first pass and left some comments

@njhill (Member) commented Nov 1, 2024

cc @K-Mistele in case you'd like to take a look too

Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
@K-Mistele (Contributor)

will take another look!

@njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Nov 6, 2024
@njhill (Member) left a comment

Thanks again @maxdebayser. I'll hold off for any comments from @K-Mistele before merging.

@K-Mistele (Contributor)

Giving it a final once-over :)

Out of curiosity, it looks like the model's default context is 4096; is there any way to scale this if I want to test longer-context tool calls? No worries if there's not a good way to do this with vLLM.

@K-Mistele (Contributor)

Looks great to me! Seems pretty robust, and tests are passing on my machine :)

@K-Mistele (Contributor)

Looks like the only failing tests were AMD-related, which was expected last I checked. cc @mgoin @DarkLight1337

@DarkLight1337 (Member)

> Looks like the only failing tests were AMD-related, which was expected last I checked. cc @mgoin @DarkLight1337

The test is not failing on the main branch, so I think the failure is introduced by this PR. Do tell me if I'm mistaken, though.

@K-Mistele (Contributor)

> Looks like the only failing tests were AMD-related, which was expected last I checked. cc @mgoin @DarkLight1337
>
> The test is not failing on the main branch, so I think the failure is introduced by this PR. Do tell me if I'm mistaken, though.

Oops, missed that. I just remembered that the last time I did one of these there were some tests expected to fail, but I'm sure you're right :)

Seems related to the FP8-quantized Granite 20B model (mbayser/granite-20b-functioncalling-FP8-KV) that's being used:

https://buildkite.com/vllm/ci-aws/builds/10840#01930511-9576-45cb-aebb-e237d5f07c9b/974-5501

AMD supports FP8 in ROCm > 6.2. It could also be an OOM issue, since this is a 20B model that we had trouble fitting onto other CI GPUs, or it could be related to CPU offloading.

Whatever the case, it seems more like a model+configuration+hardware compatibility issue than a code issue. Is there any way to disable this particular model for the AMD tests, and would that be an acceptable solution?

@DarkLight1337 (Member)

I think this is alright. @njhill any thoughts?

@maxdebayser (Contributor, Author)

I've pushed a change to skip the granite-20b-functioncalling test on AMD. Let's see if it passes now 🤞
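
For illustration only, a hypothetical sketch of what such a skip could look like (not the actual change pushed in this PR):

```python
# Hypothetical sketch, not the actual change in this PR: skip the FP8
# granite-20b-functioncalling case when running on a ROCm (AMD) build.
import pytest
import torch

IS_ROCM = torch.version.hip is not None  # set on ROCm builds of PyTorch


@pytest.mark.skipif(
    IS_ROCM,
    reason="granite-20b-functioncalling FP8 model is problematic on AMD CI",
)
def test_granite_20b_function_calling_tool_parser():
    ...
```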

@maxdebayser (Contributor, Author)

> Out of curiosity, it looks like the model's default context is 4096; is there any way to scale this if I want to test longer-context tool calls? No worries if there's not a good way to do this with vLLM.

I'm not sure, but as we add more models and tests, we might have to look up the maximum sequence length for each model and skip the tests that require a longer context.
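
As a sketch of what that could look like (not part of this PR), reading the context window from each model's Hugging Face config:

```python
# Sketch only, not part of this PR: skip a test when its prompt needs a
# longer context than the model's configured maximum.
import pytest
from transformers import AutoConfig


def skip_if_context_too_short(model_id: str, required_len: int) -> None:
    config = AutoConfig.from_pretrained(model_id)
    max_len = getattr(config, "max_position_embeddings", None)
    if max_len is not None and max_len < required_len:
        pytest.skip(
            f"{model_id} supports only {max_len} tokens, "
            f"but this test needs {required_len}"
        )
```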

Signed-off-by: Max de Bayser <[email protected]>
@njhill (Member) commented Nov 7, 2024

Thanks again @maxdebayser @K-Mistele @DarkLight1337!

@njhill merged commit ae62fd1 into vllm-project:main on Nov 7, 2024
47 checks passed
Isotr0py pushed a commit to Isotr0py/vllm that referenced this pull request Nov 8, 2024
omer-dayan pushed a commit to omer-dayan/vllm that referenced this pull request Nov 10, 2024
JC1DA pushed a commit to JC1DA/vllm that referenced this pull request Nov 11, 2024
sumitd2 pushed a commit to sumitd2/vllm that referenced this pull request Nov 14, 2024
Labels: ci/build, documentation, frontend, ready
4 participants