diff --git a/python/llm/example/NPU/HF-Transformers-AutoModels/Multimodal/README.md b/python/llm/example/NPU/HF-Transformers-AutoModels/Multimodal/README.md
index 88e71bb97a5..faacc0ae8d8 100644
--- a/python/llm/example/NPU/HF-Transformers-AutoModels/Multimodal/README.md
+++ b/python/llm/example/NPU/HF-Transformers-AutoModels/Multimodal/README.md
@@ -6,6 +6,8 @@ In this directory, you will find examples on how you could apply IPEX-LLM INT4 o
 | Model | Model Link |
 |------------|----------------------------------------------------------------|
 | Phi-3-Vision | [microsoft/Phi-3-vision-128k-instruct](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct) |
+| MiniCPM-Llama3-V-2_5 | [openbmb/MiniCPM-Llama3-V-2_5](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5) |
+| MiniCPM-V-2_6 | [openbmb/MiniCPM-V-2_6](https://huggingface.co/openbmb/MiniCPM-V-2_6) |
 
 ## 0. Requirements
 To run these examples with IPEX-LLM on Intel NPUs, make sure to install the newest driver version of Intel NPU.
@@ -22,14 +24,12 @@ We suggest using conda to manage environment:
 conda create -n llm python=3.10 libuv
 conda activate llm
 
-# install ipex-llm with 'all' option
-pip install --pre --upgrade ipex-llm[all]
+# install ipex-llm with 'npu' option
+pip install --pre --upgrade ipex-llm[npu]
 pip install torchvision
 
-# below command will install intel_npu_acceleration_library
-pip install intel-npu-acceleration-library==1.3
-
-pip install transformers==4.40
+# [optional] for MiniCPM-V-2_6
+pip install timm torch==2.1.2 torchvision==0.16.2
 ```
 
 ### 2. Runtime Configurations
@@ -64,7 +64,7 @@ Arguments info:
 - `--load_in_low_bit`: argument defining the `load_in_low_bit` format used. It is default to be `sym_int8`, `sym_int4` can also be used.
 
 #### Sample Output
-#### [microsoft/Phi-3-vision-128k-instruct](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct)
+##### [microsoft/Phi-3-vision-128k-instruct](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct)
 
 ```log
 Inference time: xxxx s
@@ -82,3 +82,38 @@ The sample input image is (which is fetched from [COCO dataset](https://cocodata
 
 
 
+## 4. Run Optimized Models (Experimental)
+The examples below show how to run the **_optimized HuggingFace model implementations_** on Intel NPU, including
+- [MiniCPM-Llama3-V-2_5](./minicpm-llama3-v2.5.py)
+- [MiniCPM-V-2_6](./minicpm_v_2_6.py)
+
+### Run
+```bash
+# to run MiniCPM-Llama3-V-2_5
+python minicpm-llama3-v2.5.py
+
+# to run MiniCPM-V-2_6
+python minicpm_v_2_6.py
+```
+
+Arguments info:
+- `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the huggingface repo id for the model (e.g. `openbmb/MiniCPM-Llama3-V-2_5`) to be downloaded, or the path to the huggingface checkpoint folder.
+- `--image-url-or-path IMAGE_URL_OR_PATH`: argument defining the image to be inferred. It is default to be 'http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg'.
+- `--prompt PROMPT`: argument defining the prompt to be inferred (with integrated prompt format for chat). It is default to be `What is in the image?`.
+- `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`.
+- `--max-output-len MAX_OUTPUT_LEN`: Defines the maximum sequence length for both input and output tokens. It is default to be `1024`.
+- `--max-prompt-len MAX_PROMPT_LEN`: Defines the maximum number of tokens that the input prompt can contain. It is default to be `512`.
+- `--disable-transpose-value-cache`: Disable the optimization of transposing value cache.
+
+#### Sample Output
+##### [openbmb/MiniCPM-V-2_6](https://huggingface.co/openbmb/MiniCPM-V-2_6)
+
+```log
+Inference time: xx.xx s
+-------------------- Input --------------------
+http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg
+-------------------- Prompt --------------------
+What is in this image?
+-------------------- Output --------------------
+The image features a young child holding and showing off a white teddy bear wearing a pink dress. The background includes some red flowers and a stone wall, suggesting an outdoor setting.
+```
\ No newline at end of file
diff --git a/python/llm/example/NPU/HF-Transformers-AutoModels/Multimodal/minicpm_v_2_6.py b/python/llm/example/NPU/HF-Transformers-AutoModels/Multimodal/minicpm_v_2_6.py
index c5f6bdd463a..259b8c122b1 100644
--- a/python/llm/example/NPU/HF-Transformers-AutoModels/Multimodal/minicpm_v_2_6.py
+++ b/python/llm/example/NPU/HF-Transformers-AutoModels/Multimodal/minicpm_v_2_6.py
@@ -37,7 +37,7 @@
                         help='Prompt to infer')
     parser.add_argument("--n-predict", type=int, default=32, help="Max tokens to predict")
     parser.add_argument("--max-output-len", type=int, default=1024)
-    parser.add_argument("--max-prompt-len", type=int, default=960)
+    parser.add_argument("--max-prompt-len", type=int, default=512)
     parser.add_argument("--disable-transpose-value-cache", action="store_true", default=False)
     parser.add_argument("--intra-pp", type=int, default=None)
     parser.add_argument("--inter-pp", type=int, default=None)
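For reference, below is a possible invocation of the new MiniCPM-V-2_6 example, assembled only from the flags documented in the README hunk above. The values shown are the documented defaults, so treat this as an illustrative sketch rather than output of this patch.

```bash
# Illustrative invocation built from the documented arguments; every value is the
# README's stated default and may be replaced with a local checkpoint path or a custom prompt.
python minicpm_v_2_6.py \
  --repo-id-or-model-path openbmb/MiniCPM-V-2_6 \
  --image-url-or-path http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg \
  --prompt "What is in the image?" \
  --n-predict 32 \
  --max-output-len 1024 \
  --max-prompt-len 512
```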