
Add support for Grammar/Tools + TGI-based specs in InferenceClient #2237

Merged
26 commits merged into main from the follow-tgi-specs-for-chat-completion branch on Apr 25, 2024

Conversation

@Wauplin (Contributor) commented Apr 19, 2024

Sorry for the huge PR 😬 It goes hand-in-hand with huggingface/huggingface.js#629 (and to a lesser extent huggingface/text-generation-inference#1798). Most of the changes to review are hopefully documentation / auto-generated stuff.

What's in this PR?

  • Inference types for text_generation and chat_completion have been updated based on TGI specs (see associated PR Generate specs from TGI openapi.json huggingface.js#629)
    • 💔 there are breaking changes for people importing the return types. Otherwise, there shouldn't be breaking changes in the usage of return values.
  • Add support for grammar in the text_generation task (see the sketches after this list)
  • Add support for tools in the chat_completion task (see the sketches after this list)
  • Add support for more parameters in those 2 tasks. All parameters supported by TGI are now handled.
  • When a model is not TGI-served, we now parse the error message to check which parameters are not supported. This is more robust than manually maintaining a list of incompatible parameters. See MODEL_KWARGS_NOT_USED_REGEX (a rough sketch follows this list).
  • In text_generation, only send non-None parameters in payload.
  • Added a script check_inference_input_params.py that checks that task parameters are used and documented consistently with the generated types. For now, it raises an error when an inconsistency is found but doesn't provide an auto-fix (see TODOs in the script).
    • for now, only for chat_completion/text_generation, and it does not check everything
  • Adapted generate_async_inference_client.py to generate KO package reference correctly.
  • Fixed a bunch of tests, mainly related to the new type names
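
For illustration, here is a rough sketch of how the new grammar parameter can be used from InferenceClient. The model id and the schema are placeholders, and the grammar is shown as a plain dict following the TGI grammar spec; the generated TextGenerationInputGrammarType dataclass should be usable as well.

```python
from huggingface_hub import InferenceClient

# Any TGI-served text-generation model should work; this model id is just an example.
client = InferenceClient("HuggingFaceH4/zephyr-7b-beta")

# Constrain the generated text to follow a JSON schema via the new `grammar` parameter.
# The dict below follows the TGI grammar spec ({"type": "json", "value": <json schema>}).
response = client.text_generation(
    "I saw a puppy, a cat and a raccoon during my bike ride in the park. "
    "Describe what I saw as JSON.",
    grammar={
        "type": "json",
        "value": {
            "properties": {
                "location": {"type": "string"},
                "animals": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["location", "animals"],
        },
    },
    max_new_tokens=100,
)
print(response)  # a JSON string matching the schema
```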
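Similarly, a sketch of tools in chat_completion. The model id and the tool definition are placeholders; tool definitions follow the OpenAI-style function-calling schema used by TGI.

```python
from huggingface_hub import InferenceClient

# Model id is a placeholder; any chat model served by a recent TGI version should work.
client = InferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant with access to tools."},
    {"role": "user", "content": "What's the weather like in Paris?"},
]

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

response = client.chat_completion(messages, tools=tools, tool_choice="auto", max_tokens=500)
print(response.choices[0].message.tool_calls)
```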
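The error-message parsing and the non-None filtering mentioned above can be summarized with a small sketch. This is illustrative only; the actual regex and helpers live in src/huggingface_hub/inference/_common.py and may differ in detail.

```python
import re

# Non-TGI (transformers-backed) endpoints reject unknown parameters with an error like:
#   "The following `model_kwargs` are not used by the model: ['watermark', 'stop'] (...)"
# Parsing that message tells us which parameters to drop, instead of maintaining a
# hard-coded list of incompatible parameters per task.
MODEL_KWARGS_NOT_USED_REGEX = re.compile(
    r"The following `model_kwargs` are not used by the model: \[(.*?)\]"
)

def unused_model_kwargs(error_message: str) -> list[str]:
    """Return the parameter names reported as unused by a non-TGI endpoint."""
    match = MODEL_KWARGS_NOT_USED_REGEX.search(error_message)
    if match is None:
        return []
    return [name.strip(" '\"") for name in match.group(1).split(",")]

# "Only send non-None parameters" boils down to filtering the payload before the request:
parameters = {"max_new_tokens": 100, "grammar": None, "seed": None}
payload = {"inputs": "Hello", "parameters": {k: v for k, v in parameters.items() if v is not None}}
assert payload["parameters"] == {"max_new_tokens": 100}
```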

What to review

Mostly:

  • src/huggingface_hub/inference/_client.py => lots of docs updates (not interesting) + a few tweaks in code (to review)
  • src/huggingface_hub/inference/_common.py => only small tweaks
  • tests/test_inference_client.py => to check "it works"
  • tests/test_inference_text_generation.py => to check "it works"
  • utils/check_inference_input_params.py => new script to check input parameters (a rough sketch follows)
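
As a rough illustration of what such a consistency check can look like (this is not the actual script; the generated type name, the import path, and the excluded parameters are assumptions):

```python
# Illustrative sketch only; utils/check_inference_input_params.py may be organized differently.
import inspect
from dataclasses import fields

from huggingface_hub import InferenceClient
from huggingface_hub.inference._generated.types import TextGenerationInputGenerateParameters

def check_text_generation_params() -> list[str]:
    # The generated input dataclass (from the TGI spec) is treated as the source of truth.
    spec_params = {f.name for f in fields(TextGenerationInputGenerateParameters)}
    # Parameters exposed by the client method, minus the ones that are not spec parameters.
    client_params = set(inspect.signature(InferenceClient.text_generation).parameters)
    client_params -= {"self", "prompt", "model", "details", "stream"}
    return [
        f"'{name}' is in the generated spec but missing from InferenceClient.text_generation"
        for name in sorted(spec_params - client_params)
    ]

if __name__ == "__main__":
    errors = check_text_generation_params()
    if errors:
        raise SystemExit("\n".join(errors))
```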

The rest is either auto-generated stuff or low-level scripts.

Generated docs:

What's not in this PR?

=> will be handled in a follow-up PR

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Wauplin Wauplin marked this pull request as ready for review April 23, 2024 10:23
@Wauplin changed the title from "[wip] TGI-based specs + adapt" to "TGI-based specs + adapt code + add support for Gramma/Tools" on Apr 23, 2024
@Wauplin changed the title from "TGI-based specs + adapt code + add support for Gramma/Tools" to "Add support for Grammar/Tools + TGI-based specs in InferenceClient" on Apr 23, 2024
@LysandreJik (Member) left a comment


Given the quantity of auto-generated code I couldn't do a perfect review, but I haven't seen anything shocking in the PR. I'm of the opinion of "ship it and eventually fix if issues arise", the code looks clean enough :)

@Wauplin (Contributor, Author) commented Apr 25, 2024

Thanks for the review @LysandreJik! Agree with you about "I'm of the opinion of ship it and eventually fix if issues arise," 😄

@Wauplin Wauplin merged commit ff6c1f9 into main Apr 25, 2024
16 checks passed
@Wauplin Wauplin deleted the follow-tgi-specs-for-chat-completion branch April 25, 2024 09:34
Wauplin added a commit to huggingface/huggingface.js that referenced this pull request Apr 30, 2024
From an idea mentioned by @OlivierDehaene and @drbh in
#579 and [slack
thread](https://huggingface.slack.com/archives/C05CFK1HM0T/p1711360022125399)
(internal).

This PR adds a script `inference-tgi-import.ts` to generate the
`text-generation` and `chat_completion` specs from the auto-generated
TGI
[specifications](https://huggingface.github.io/text-generation-inference/).
The goal is to keep TGI improvements in sync with @huggingface/tasks and
therefore have a consistent "single source of truth". The converted
specs are then compatible with our tooling to generate JS/Python code.

This PR changes quite a lot of naming in the generated JS/Python types.
Luckily, I don't think they were used in the JS ecosystem yet, so it's
better to do that now than later.

I also opened huggingface/huggingface_hub#2237
to include these changes in `huggingface_hub`.


**TODO:**
- [ ] fix lint errors. How to deal with parsed JSON to avoid using
`any`?
- [ ] CI workflow to open a PR each time TGI is updated? => can be done
in a future PR

---------

Co-authored-by: SBrandeis <[email protected]>