
Add backends: ONNX & OpenVINO + ONNX optimization, quantization #2712

Merged · 22 commits merged into UKPLab:master on Oct 10, 2024

Conversation

helena-intel (Contributor)

Add OpenVINO support for SentenceTransformer models.

  • Add backend="openvino" to use OpenVINO. OpenVINO models can be loaded directly, or converted on the fly from PyTorch models on the Hugging Face Hub.
  • Pass an OpenVINO config with model_kwargs={"ov_config": config}, where config can be either a dictionary or a path to a .json file.
  • Use an Intel iGPU or dGPU for inference with model_kwargs={"device": "GPU"}. (The device argument of SentenceTransformer expects a PyTorch device; supporting it directly for Intel GPUs would require more code modifications with per-backend checks. If that is preferred, I'm happy to add it.) A short usage sketch follows this list.
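
A minimal sketch of the usage above, assuming an example model ID; the ov_config value shown (CACHE_DIR) is just one possible OpenVINO property, not a required setting:

```python
from sentence_transformers import SentenceTransformer

# Load with the OpenVINO backend; a PyTorch model from the Hub is converted
# to OpenVINO IR on the fly if no OpenVINO model is available.
model = SentenceTransformer(
    "sentence-transformers/all-MiniLM-L6-v2",
    backend="openvino",
)

# Optionally pass an OpenVINO config (dict or path to a .json file) and/or
# run on an Intel iGPU/dGPU instead of the default CPU device.
gpu_model = SentenceTransformer(
    "sentence-transformers/all-MiniLM-L6-v2",
    backend="openvino",
    model_kwargs={"ov_config": {"CACHE_DIR": "ov_cache"}, "device": "GPU"},
)

embeddings = model.encode(["OpenVINO inference with sentence-transformers."])
print(embeddings.shape)
```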

Documentation is to be done. Should I add an .rst file to docs/sentence_transformer/usage? Here is basic documentation on how to use the OpenVINO backend, and an example of how to quantize a sentence-transformers model with NNCF and use it with sentence-transformers and the OpenVINO backend: https://gist.github.com/helena-intel/fe7ea16bc015a3d581f3a7417a35a87e

Limitations:

  • T5 models are not yet supported. optimum-intel plans to refactor seq2seq models; T5 support can be added once that refactoring is done.
  • This PR only supports SentenceTransformer. CrossEncoder support could be added in a new PR.

michaelfeil (Contributor) commented Jun 9, 2024

@helena-intel

Thanks! I am not really a reviewer, just saw this PR by chance.

A few concerns:

  • OVModelForFeatureExtraction -> Doesn't this require an ONNX model, or a re-exported model?
  • How well would the abstractions you introduced hold for other providers (plain ONNX, the AWS Neuron stack, other implementations)?
  • Doesn't OpenVINO ship with optimum-intel? Or at least via pip install optimum-intel[openvino] or similar?

helena-intel (Contributor, Author)

@michaelfeil Thanks for your comments!

OVModelForFeatureExtraction -> Doesn't this require an ONNX model, or a re-exported model?

No, it supports both PyTorch models and OpenVINO IR models. If a path to a PyTorch model is provided, it will be converted to OpenVINO IR on the fly.
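
For illustration, the same on-the-fly conversion can be done with optimum-intel directly; the model ID below is just an example:

```python
from optimum.intel import OVModelForFeatureExtraction

# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly;
# a repository that already contains OpenVINO IR can be loaded without it.
ov_model = OVModelForFeatureExtraction.from_pretrained(
    "sentence-transformers/all-MiniLM-L6-v2", export=True
)

# Save the converted model locally so later loads skip the conversion step.
ov_model.save_pretrained("all-MiniLM-L6-v2-openvino")
```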

How well would the abstractions you introduced hold for other providers (plain ONNX, the AWS Neuron stack, other implementations)?

I added a backend parameter instead of hardcoding OpenVINO, to make it easy to add other backends too; it should be straightforward for all Optimum backends. There are some OpenVINO specifics (e.g. configuration settings, support for exporting on the fly), so the _load_openvino_model() method handles those, but the principle of loading models with Optimum is the same for all backends. A rough sketch of that dispatch idea follows below.
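
To illustrate the idea (this is not the actual sentence-transformers code, just a sketch of how a backend parameter can dispatch to Optimum model classes; the function names here are hypothetical):

```python
from optimum.intel import OVModelForFeatureExtraction
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoModel


def _load_openvino_model(model_name_or_path, **model_kwargs):
    # OpenVINO specifics live here: ov_config handling and on-the-fly export.
    ov_config = model_kwargs.pop("ov_config", {})
    return OVModelForFeatureExtraction.from_pretrained(
        model_name_or_path, export=True, ov_config=ov_config, **model_kwargs
    )


def _load_onnx_model(model_name_or_path, **model_kwargs):
    return ORTModelForFeatureExtraction.from_pretrained(
        model_name_or_path, export=True, **model_kwargs
    )


def load_model(model_name_or_path, backend="torch", **model_kwargs):
    # Only the loading step differs per backend; everything downstream is shared.
    if backend == "torch":
        return AutoModel.from_pretrained(model_name_or_path, **model_kwargs)
    if backend == "openvino":
        return _load_openvino_model(model_name_or_path, **model_kwargs)
    if backend == "onnx":
        return _load_onnx_model(model_name_or_path, **model_kwargs)
    raise ValueError(f"Unsupported backend: {backend!r}")
```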

I'm also open to suggestions for a different implementation!

Doesn't OpenVINO ship with optimum-intel? Or at least via pip install optimum-intel[openvino] or similar?

Yes, pip install optimum[openvino] and pip install optimum-intel[openvino] both install optimum-intel and all recommended dependencies for running OpenVINO models, including NNCF for model quantization and openvino-tokenizers. For running the test I added, just OpenVINO is enough.

tomaarsen (Collaborator) commented Sep 30, 2024

Hello @helena-intel!

Apologies for the radio silence so far. In truth, I've been quietly experimenting with your work and expanding it a bit further. Some of my changes:

  • Add an ONNX backend with the same signature as OpenVINO (a short usage sketch follows this list).
  • Improve remote model support (previously we decided whether to export based only on whether "openvino_model.xml" existed locally; now we also check whether it exists remotely).
  • Add create_pr to push_to_hub, making it easier to open pull requests against existing models to add OpenVINO/ONNX exports.
  • Encourage users to call model.save_pretrained and model.push_to_hub to avoid having to re-export the models. I think this should help increase the number of OpenVINO models on the Hub.
  • Add a helper function to optimize ONNX models via Optimum. I'm open to more optimization helper functions for OpenVINO as well.
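
A short usage sketch of the ONNX backend plus the save/push workflow, assuming an example model ID; pushing with create_pr=True opens a pull request instead of writing to the repository directly:

```python
from sentence_transformers import SentenceTransformer

# Load with the ONNX backend; like OpenVINO, the model is exported on the fly
# if the repository does not already contain an ONNX export.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", backend="onnx")

# Save locally so the model does not need to be re-exported next time ...
model.save_pretrained("all-MiniLM-L6-v2-onnx")

# ... or propose the export upstream as a pull request to the existing repo.
model.push_to_hub("sentence-transformers/all-MiniLM-L6-v2", create_pr=True)
```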

Feel free to let me know what you think.
P.s. I'm still ironing out the last test failures, and I still have to incorporate this all in some documentation.

  • Tom Aarsen

tomaarsen changed the title from "Add OpenVINO support" to "Add backends: ONNX & OpenVINO + ONNX optimization, quantization" on Oct 10, 2024
tomaarsen merged commit adbf0ba into UKPLab:master on Oct 10, 2024 (11 checks passed)
helena-intel (Contributor, Author)

Thanks so much for all your work on this, @tomaarsen! I was on vacation for the past few weeks and missed a notification. I am really excited to see this!

tomaarsen (Collaborator)

Gladly! I'm looking forward to seeing the community adopt the new backends more, I think they'll be very valuable.

  • Tom Aarsen
