[NPU] Example & Quickstart updates (#12650)
* Remove models with `optimize_model=False` in NPU verified models tables, and remove related example
* Remove "experimental" in run optimized model section title
* Unify model table order & example cmd
* Move embedding example to separate folder & update quickstart example link
* Add Quickstart reference in main NPU readme
* Small fix
* Small fix
* Move save/load examples under NPU/HF-Transformers-AutoModels
* Add low-bit and polish arguments for LLM Python examples
* Small fix
* Add low-bit and polish arguments for Multi-Model examples
* Polish arguments for Embedding models
* Polish arguments for LLM CPP examples
* Add low-bit and polish arguments for Save-Load examples
* Add accuracy tuning tips for examples
* Update NPU quickstart accuracy tuning with low-bit optimizations
* Add save/load section to quickstart
* Update CPP example sample output to EN
* Add installation notes regarding CMake for CPP examples
* Small fix
* Small fix
* Small fix
* Small fix
* Small fix
* Small fix
* Unify max prompt length to 512
* Change recommended low-bit for Qwen2.5-3B-Instruct to asym_int4
* Update based on comments
* Small fix
1 parent ddc0ef3 · commit 381d448 · 23 changed files with 314 additions and 495 deletions.
python/llm/example/NPU/HF-Transformers-AutoModels/Embedding/README.md: 59 additions, 0 deletions
# Run Embedding Model on Intel NPU
In this directory, you will find examples of how you can apply IPEX-LLM low-bit optimizations to embedding models on [Intel NPUs](../../../README.md). See the table below for verified models.

## Verified Models

| Model | Model Link |
|------------|----------------------------------------------------------------|
| Bce-Embedding-Base-V1 | [maidalun1020/bce-embedding-base_v1](https://huggingface.co/maidalun1020/bce-embedding-base_v1) |

Please refer to [Quickstart](../../../../../../docs/mddocs/Quickstart/npu_quickstart.md#python-api) for details about verified platforms.

## 0. Prerequisites
For `ipex-llm` NPU support, please refer to [Quickstart](../../../../../../docs/mddocs/Quickstart/npu_quickstart.md#install-prerequisites) for details about the required preparations.

## 1. Install
### 1.1 Installation on Windows
We suggest using conda to manage the environment:
```bash
conda create -n llm python=3.11
conda activate llm

# install ipex-llm with 'npu' option
pip install --pre --upgrade ipex-llm[npu]

# [optional] for Bce-Embedding-Base-V1
pip install BCEmbedding==0.1.5 transformers==4.40.0
```
Please refer to [Quickstart](../../../../../../docs/mddocs/Quickstart/npu_quickstart.md#install-ipex-llm-with-npu-support) for more details about `ipex-llm` installation on Intel NPU.

### 1.2 Runtime Configurations
Please refer to [Quickstart](../../../../../../docs/mddocs/Quickstart/npu_quickstart.md#runtime-configurations) for instructions on setting environment variables based on your device.

## 2. Run Optimized Models
The examples below show how to run the **_optimized HuggingFace model implementations_** on Intel NPU, including
- [Bce-Embedding-Base-V1](./bce-embedding.py)

### 2.1 Run Bce-Embedding-Base-V1
```bash
# to run Bce-Embedding-Base-V1
python bce-embedding.py --save-directory <converted_model_path>
```

Arguments info:
- `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the Hugging Face repo id of the model (i.e. `maidalun1020/bce-embedding-base_v1`) to be downloaded, or the path to the Hugging Face checkpoint folder.
- `--prompt PROMPT`: argument defining the sentences to encode.
- `--max-context-len MAX_CONTEXT_LEN`: argument defining the maximum sequence length for both input and output tokens. It defaults to `1024`.
- `--max-prompt-len MAX_PROMPT_LEN`: argument defining the maximum number of tokens that the input prompt can contain. It defaults to `512`.
- `--save-directory SAVE_DIRECTORY`: argument defining the path to save the converted model. If it is a non-existing path, the original pretrained model specified by `REPO_ID_OR_MODEL_PATH` will be loaded; otherwise, the low-bit model in `SAVE_DIRECTORY` will be loaded.

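For reference, the snippet below is a minimal sketch of how a script like `bce-embedding.py` could wire these arguments together. It only illustrates the documented argument semantics (in particular, preferring the converted model in `--save-directory` when that path already exists); the `run_embedding` helper, its body, and the `--prompt` default shown here are illustrative placeholders and assumptions, not the shipped IPEX-LLM implementation.

```python
# Illustrative sketch only: argument handling that mirrors the options documented above.
# `run_embedding` is a hypothetical placeholder for the actual IPEX-LLM NPU loading
# and encoding code in bce-embedding.py.
import argparse
import os


def run_embedding(model_path, sentences, from_converted, max_context_len, max_prompt_len):
    # Placeholder: the real example would load (or convert and save) the model here
    # and return embeddings for `sentences`.
    source = "converted low-bit model" if from_converted else "pretrained checkpoint"
    print(f"Would load {source} from {model_path} "
          f"(max_context_len={max_context_len}, max_prompt_len={max_prompt_len}) "
          f"and encode {len(sentences)} sentence(s).")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run Bce-Embedding-Base-V1 on Intel NPU")
    parser.add_argument("--repo-id-or-model-path", type=str,
                        default="maidalun1020/bce-embedding-base_v1")
    parser.add_argument("--prompt", type=str, default="What is IPEX-LLM?")
    parser.add_argument("--max-context-len", type=int, default=1024)
    parser.add_argument("--max-prompt-len", type=int, default=512)
    parser.add_argument("--save-directory", type=str, required=True)
    args = parser.parse_args()

    # Per the argument description: use the converted model if --save-directory
    # already exists, otherwise fall back to the pretrained checkpoint.
    use_converted = os.path.isdir(args.save_directory)
    model_path = args.save_directory if use_converted else args.repo_id_or_model_path
    run_embedding(model_path, [args.prompt], use_converted,
                  args.max_context_len, args.max_prompt_len)
```
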
#### Sample Output
##### [maidalun1020/bce-embedding-base_v1](https://huggingface.co/maidalun1020/bce-embedding-base_v1)

```log
Inference time: xxxx s
[[-0.00674987 -0.01700369 -0.0028928  ... -0.05296675 -0.00352772
   0.00827096]
 [-0.04398304  0.00023038  0.00643183 ... -0.02717186  0.00483789
   0.02298774]]
```
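
The output is one embedding row per encoded sentence. As a hedged illustration of how such embeddings are typically consumed downstream (not part of the example script itself), the snippet below computes the cosine similarity between two embedding rows with NumPy, using truncated stand-in values in place of the real output.

```python
# Hedged illustration: cosine similarity between two sentence embeddings.
# `embeddings` here holds stand-in values; replace with the array returned
# by the embedding example.
import numpy as np

embeddings = np.array([
    [-0.00674987, -0.01700369, -0.0028928, 0.00827096],
    [-0.04398304, 0.00023038, 0.00643183, 0.02298774],
])

# Normalize each row, then take the dot product to get cosine similarity.
normalized = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
similarity = float(normalized[0] @ normalized[1])
print(f"Cosine similarity between the two sentences: {similarity:.4f}")
```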