Huggingface optimum prepackaged server #4081
Conversation
/test integration
1 similar comment
/test integration
/test integration
It seems the failed test is /test integration
/test integration
This looks great! I think it should be ready to go @axsaucedo 👍
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: adriangonz. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Which issue(s) this PR fixes:
Fixes #4082
Introduces MLServer Runtime: SeldonIO/MLServer#573
HuggingFace Server
Thanks to our collaboration with the HuggingFace team, you can now easily deploy your models from the HuggingFace Hub with Seldon Core.
We also support the high-performance optimizations provided by the HuggingFace Optimum framework.
Pipeline parameters
The available parameters are:
- `task`: the transformers pipeline task to run (for example, `text-generation`)
- `pretrained_model`: the name of the pretrained model to load from the HuggingFace Hub
- `pretrained_tokenizer`: an optional tokenizer to load in place of the model's default
- `optimum_model`: whether to load the model through the Optimum library
Simple Example
You can deploy a HuggingFace model by providing parameters to your pipeline.
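A minimal SeldonDeployment sketch is shown below. The `HUGGINGFACE_SERVER` implementation name is an assumption based on this PR's naming convention for prepackaged servers, and `distilgpt2` with the `text-generation` task is an illustrative placeholder; substitute your own model and task:

```yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: gpt2-model
spec:
  protocol: v2            # MLServer-based prepackaged servers speak the V2 inference protocol
  predictors:
  - name: default
    replicas: 1
    graph:
      name: transformer
      implementation: HUGGINGFACE_SERVER   # assumed name of the HuggingFace prepackaged server
      parameters:
      - name: task
        type: STRING
        value: text-generation
      - name: pretrained_model
        type: STRING
        value: distilgpt2
```

Once deployed, the model can be queried through the standard inference endpoints that Seldon Core exposes for the chosen protocol.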
Quantized & Optimized Models with Optimum
You can deploy a HuggingFace model loaded through the Optimum library by setting the `optimum_model` parameter.
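A sketch extending the earlier example; the `BOOL` type and string-encoded value follow the same parameter convention used above, though the exact spelling is an assumption:

```yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: gpt2-model-optimum
spec:
  protocol: v2
  predictors:
  - name: default
    replicas: 1
    graph:
      name: transformer
      implementation: HUGGINGFACE_SERVER   # assumed name of the HuggingFace prepackaged server
      parameters:
      - name: task
        type: STRING
        value: text-generation
      - name: pretrained_model
        type: STRING
        value: distilgpt2
      - name: optimum_model       # assumed flag: load the model via the Optimum library
        type: BOOL
        value: "true"
```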