
Add support for Grammar/Tools + TGI-based specs in InferenceClient #2237

Merged
26 commits merged into main from the follow-tgi-specs-for-chat-completion branch on Apr 25, 2024

Conversation

@Wauplin (Contributor) commented Apr 19, 2024

Sorry for the huge PR 😬 It goes hand-in-hand with huggingface/huggingface.js#629 (and to a lesser extent huggingface/text-generation-inference#1798). Most of the changes to review are hopefully documentation / auto-generated stuff.

What's in this PR?

  • Inference types for text_generation and chat_completion have been updated based on TGI specs (see associated PR Generate specs from TGI openapi.json huggingface.js#629)
    • 💔 there are breaking changes for people importing the return types. Otherwise, there shouldn't be breaking changes in the usage of return values.
  • Add support for grammar in the text_generation task (see the sketches after this list)
  • Add support for tools in the chat_completion task (see the sketches after this list)
  • Add support for more parameters in those 2 tasks. All parameters supported by TGI are now handled.
  • When a model is not TGI-served, we now parse the error message to check which parameters are not supported. This is more robust than manually maintaining a list of incompatible parameters. See MODEL_KWARGS_NOT_USED_REGEX (a rough sketch follows this list).
  • In text_generation, only send non-None parameters in payload.
  • Added a script check_inference_input_params.py that checks that task parameters are used and documented consistently with the generated types. For now, it raises an error when an inconsistency is found but doesn't provide an auto-fix (see TODOs in the script).
    • for now, only for chat_completion/text_generation, and it does not check everything
  • Adapted generate_async_inference_client.py to generate KO package reference correctly.
  • Fixed a bunch of tests, mainly related to the new type names
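
For illustration, here is a rough sketch of how the new grammar parameter can be used from InferenceClient. The model id and the schema are placeholders, and the grammar is shown as a plain dict following the TGI grammar spec; the generated TextGenerationInputGrammarType dataclass should be usable as well.

```python
from huggingface_hub import InferenceClient

# Any TGI-served text-generation model should work; this model id is just an example.
client = InferenceClient("HuggingFaceH4/zephyr-7b-beta")

# Constrain the generated text to follow a JSON schema via the new `grammar` parameter.
# The dict below follows the TGI grammar spec ({"type": "json", "value": <json schema>}).
response = client.text_generation(
    "I saw a puppy, a cat and a raccoon during my bike ride in the park. "
    "Describe what I saw as JSON.",
    grammar={
        "type": "json",
        "value": {
            "properties": {
                "location": {"type": "string"},
                "animals": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["location", "animals"],
        },
    },
    max_new_tokens=100,
)
print(response)  # a JSON string matching the schema
```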
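Similarly, a sketch of tools in chat_completion. The model id and the tool definition are placeholders; tool definitions follow the OpenAI-style function-calling schema used by TGI.

```python
from huggingface_hub import InferenceClient

# Model id is a placeholder; any chat model served by a recent TGI version should work.
client = InferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant with access to tools."},
    {"role": "user", "content": "What's the weather like in Paris?"},
]

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

response = client.chat_completion(messages, tools=tools, tool_choice="auto", max_tokens=500)
print(response.choices[0].message.tool_calls)
```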
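The error-message parsing and the non-None filtering mentioned above can be summarized with a small sketch. This is illustrative only; the actual regex and helpers live in src/huggingface_hub/inference/_common.py and may differ in detail.

```python
import re

# Non-TGI (transformers-backed) endpoints reject unknown parameters with an error like:
#   "The following `model_kwargs` are not used by the model: ['watermark', 'stop'] (...)"
# Parsing that message tells us which parameters to drop, instead of maintaining a
# hard-coded list of incompatible parameters per task.
MODEL_KWARGS_NOT_USED_REGEX = re.compile(
    r"The following `model_kwargs` are not used by the model: \[(.*?)\]"
)

def unused_model_kwargs(error_message: str) -> list[str]:
    """Return the parameter names reported as unused by a non-TGI endpoint."""
    match = MODEL_KWARGS_NOT_USED_REGEX.search(error_message)
    if match is None:
        return []
    return [name.strip(" '\"") for name in match.group(1).split(",")]

# "Only send non-None parameters" boils down to filtering the payload before the request:
parameters = {"max_new_tokens": 100, "grammar": None, "seed": None}
payload = {"inputs": "Hello", "parameters": {k: v for k, v in parameters.items() if v is not None}}
assert payload["parameters"] == {"max_new_tokens": 100}
```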

What to review

Mostly:

  • src/huggingface_hub/inference/_client.py => lots of docs updates (not interesting) + a few tweaks in code (to review)
  • src/huggingface_hub/inference/_common.py => only small tweaks
  • tests/test_inference_client.py => to check "it works"
  • tests/test_inference_text_generation.py => to check "it works"
  • utils/check_inference_input_params.py => new script to check input parameters (a rough sketch follows)
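
As a rough illustration of what such a consistency check can look like (this is not the actual script; the generated type name, the import path, and the excluded parameters are assumptions):

```python
# Illustrative sketch only; utils/check_inference_input_params.py may be organized differently.
import inspect
from dataclasses import fields

from huggingface_hub import InferenceClient
from huggingface_hub.inference._generated.types import TextGenerationInputGenerateParameters

def check_text_generation_params() -> list[str]:
    # The generated input dataclass (from the TGI spec) is treated as the source of truth.
    spec_params = {f.name for f in fields(TextGenerationInputGenerateParameters)}
    # Parameters exposed by the client method, minus the ones that are not spec parameters.
    client_params = set(inspect.signature(InferenceClient.text_generation).parameters)
    client_params -= {"self", "prompt", "model", "details", "stream"}
    return [
        f"'{name}' is in the generated spec but missing from InferenceClient.text_generation"
        for name in sorted(spec_params - client_params)
    ]

if __name__ == "__main__":
    errors = check_text_generation_params()
    if errors:
        raise SystemExit("\n".join(errors))
```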

The rest is either auto-generated stuff or low-level scripts.

Generated docs:

What's not in this PR?

=> will be handled in a follow-up PR

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Wauplin Wauplin marked this pull request as ready for review April 23, 2024 10:23
@Wauplin changed the title from "[wip] TGI-based specs + adapt" to "TGI-based specs + adapt code + add support for Gramma/Tools" on Apr 23, 2024
@Wauplin changed the title from "TGI-based specs + adapt code + add support for Gramma/Tools" to "Add support for Grammar/Tools + TGI-based specs in InferenceClient" on Apr 23, 2024
@LysandreJik (Member) left a comment


Given the quantity of auto-generated code I couldn't do a perfect review, but I haven't seen anything shocking in the PR. I'm of the opinion of "ship it and eventually fix if issues arise", the code looks clean enough :)

@Wauplin (Contributor, Author) commented Apr 25, 2024

Thanks for the review @LysandreJik! Agree with you about "I'm of the opinion of ship it and eventually fix if issues arise," 😄

@Wauplin Wauplin merged commit ff6c1f9 into main Apr 25, 2024
16 checks passed
@Wauplin Wauplin deleted the follow-tgi-specs-for-chat-completion branch April 25, 2024 09:34
Wauplin added a commit to huggingface/huggingface.js that referenced this pull request Apr 30, 2024
From an idea mentioned by @OlivierDehaene and @drbh in
#579 and [slack
thread](https://huggingface.slack.com/archives/C05CFK1HM0T/p1711360022125399)
(internal).

This PR adds a script `inference-tgi-import.ts` to generate the
`text-generation` and `chat_completion` specs from the auto-generated
TGI
[specifications](https://huggingface.github.io/text-generation-inference/).
The goal is to keep TGI improvements in sync with @huggingface/tasks and
therefore have a consistent "single source of truth". The converted
specs are then compatible with our tooling to generate JS/Python code.

This PR changes quite a lot of naming in the generated JS/Python types.
Luckily, I don't think they were used in the JS ecosystem yet, so it's
better to do that now than later.

I also opened huggingface/huggingface_hub#2237
to include these changes in `huggingface_hub`.


**TODO:**
- [ ] fix lint errors. How to deal with parsed JSON to avoid using
`any`?
- [ ] CI workflow to open a PR each time TGI is updated? => can be done
in a future PR

---------

Co-authored-by: SBrandeis <[email protected]>