Huggingface optimum prepackaged server #4081
Conversation
/test integration
1 similar comment
/test integration
/test integration
It seems the failed test is /test integration
/test integration
This looks great! I think it should be ready to go @axsaucedo 👍
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: adriangonz. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Which issue(s) this PR fixes:
Fixes #4082
Introduces MLServer Runtime: SeldonIO/MLServer#573
HuggingFace Server
Thanks to our collaboration with the HuggingFace team, you can now easily deploy your models from the HuggingFace Hub with Seldon Core.
We also support the high-performance optimizations provided by the HuggingFace Optimum framework.
Pipeline parameters
The available parameters are:
- `task`: the transformers pipeline task to run (for example, `text-generation`)
- `pretrained_model`: the name of the pretrained model to load from the HuggingFace Hub
- `pretrained_tokenizer`: an optional tokenizer to load in place of the model's default
- `optimum_model`: whether to load the model through the Optimum library
Simple Example
You can deploy a HuggingFace model by providing parameters to your pipeline.
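A minimal SeldonDeployment sketch is shown below. The `HUGGINGFACE_SERVER` implementation name is an assumption based on this PR's naming convention for prepackaged servers, and `distilgpt2` with the `text-generation` task is an illustrative placeholder; substitute your own model and task:

```yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: gpt2-model
spec:
  protocol: v2            # MLServer-based prepackaged servers speak the V2 inference protocol
  predictors:
  - name: default
    replicas: 1
    graph:
      name: transformer
      implementation: HUGGINGFACE_SERVER   # assumed name of the HuggingFace prepackaged server
      parameters:
      - name: task
        type: STRING
        value: text-generation
      - name: pretrained_model
        type: STRING
        value: distilgpt2
```

Once deployed, the model can be queried through the standard inference endpoints that Seldon Core exposes for the chosen protocol.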
Quantized & Optimized Models with Optimum
You can deploy a HuggingFace model loaded through the Optimum library by setting the `optimum_model` parameter.
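A sketch extending the earlier example; the `BOOL` type and string-encoded value follow the same parameter convention used above, though the exact spelling is an assumption:

```yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: gpt2-model-optimum
spec:
  protocol: v2
  predictors:
  - name: default
    replicas: 1
    graph:
      name: transformer
      implementation: HUGGINGFACE_SERVER   # assumed name of the HuggingFace prepackaged server
      parameters:
      - name: task
        type: STRING
        value: text-generation
      - name: pretrained_model
        type: STRING
        value: distilgpt2
      - name: optimum_model       # assumed flag: load the model via the Optimum library
        type: BOOL
        value: "true"
```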