diff --git a/python/llm/example/GPU/HuggingFace/Multimodal/MiniCPM-V-2/README.md b/python/llm/example/GPU/HuggingFace/Multimodal/MiniCPM-V-2/README.md
index cc293ab9990..b79adf3ba74 100644
--- a/python/llm/example/GPU/HuggingFace/Multimodal/MiniCPM-V-2/README.md
+++ b/python/llm/example/GPU/HuggingFace/Multimodal/MiniCPM-V-2/README.md
@@ -1,5 +1,5 @@
 # MiniCPM-V-2
-In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on MiniCPM-V-2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [openbmb/MiniCPM-V-2](https://huggingface.co/openbmb/MiniCPM-V-2) as a reference MiniCPM-V-2 model.
+In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on MiniCPM-V-2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [openbmb/MiniCPM-V-2](https://huggingface.co/openbmb/MiniCPM-V-2) and [openbmb/MiniCPM-V-2_6](https://huggingface.co/openbmb/MiniCPM-V-2_6) as reference MiniCPM-V-2 models.
 
 ## 0. Requirements
 To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.
@@ -27,7 +27,7 @@ conda activate llm
 
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
-pip install timm peft
+pip install timm peft transformers==4.40.0 trl
 ```
 
 ### 2. Configures OneAPI environment variables for Linux
@@ -130,6 +130,18 @@
 What is in the image?
 -------------------- Output --------------------
 In the image, there is a young child holding a teddy bear. The teddy bear appears to be dressed in a pink tutu. The child is also wearing a red and white striped dress. The background of the image includes a stone wall and some red flowers.
 ```
+#### [openbmb/MiniCPM-V-2_6](https://huggingface.co/openbmb/MiniCPM-V-2_6)
+
+```log
+Inference time: 3.102498769760132 s
+-------------------- Input --------------------
+http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg
+-------------------- Prompt --------------------
+What is in the image?
+-------------------- Output --------------------
+The image features a young child holding a white teddy bear with a pink tutu. The child is wearing a striped dress and is standing in front of a stone wall with some red flowers in the background.
+```
+
 
 The sample input image is (which is fetched from [COCO dataset](https://cocodataset.org/#explore?id=264959)):
diff --git a/python/llm/example/GPU/HuggingFace/Multimodal/MiniCPM-V-2/generate.py b/python/llm/example/GPU/HuggingFace/Multimodal/MiniCPM-V-2/generate.py
index 07b3021e0c2..52cc5450e2e 100644
--- a/python/llm/example/GPU/HuggingFace/Multimodal/MiniCPM-V-2/generate.py
+++ b/python/llm/example/GPU/HuggingFace/Multimodal/MiniCPM-V-2/generate.py
@@ -157,7 +157,7 @@ def _pos_embed(self, x: torch.Tensor) -> torch.Tensor:
     # here the prompt tuning refers to https://huggingface.co/openbmb/MiniCPM-V-2/blob/main/README.md
     msgs = [{'role': 'user', 'content': args.prompt}]
     st = time.time()
-    res, context, _ = model.chat(
+    res = model.chat(
         image=image,
         msgs=msgs,
         context=None,
@@ -165,6 +165,8 @@ def _pos_embed(self, x: torch.Tensor) -> torch.Tensor:
         sampling=False,
         temperature=0.7
     )
+    if model.config._name_or_path.endswith("2"):
+        res, context, _ = res
     end = time.time()
     print(f'Inference time: {end-st} s')
     print('-'*20, 'Input', '-'*20)
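
The generate.py change above works around a return-type difference between the two checkpoints: MiniCPM-V-2's `chat()` returns a `(response, context, generation_config)` tuple, while MiniCPM-V-2_6's `chat()` returns the response string directly, so the patch keeps the raw result and unwraps it only when the model path ends in `"2"`. A minimal standalone sketch of that unwrapping logic (the helper name and sample values are hypothetical; only the `endswith("2")` dispatch mirrors the patch):

```python
def normalize_chat_result(res, name_or_path):
    """Return the response string regardless of which chat() convention produced it.

    MiniCPM-V-2 style: res is a (response, context, generation_config) tuple.
    MiniCPM-V-2_6 style: res is already the response string.
    """
    if name_or_path.endswith("2"):
        # MiniCPM-V-2 path ("openbmb/MiniCPM-V-2"): unpack the tuple
        res, _context, _ = res
    return res

# Tuple-style result, as MiniCPM-V-2 would return it
v2_result = ("a child with a teddy bear", {"history": []}, None)
print(normalize_chat_result(v2_result, "openbmb/MiniCPM-V-2"))

# Plain-string result, as MiniCPM-V-2_6 would return it
v26_result = "a child with a teddy bear"
print(normalize_chat_result(v26_result, "openbmb/MiniCPM-V-2_6"))
```

Dispatching on `model.config._name_or_path` keeps a single `chat()` call site in the example instead of branching the whole invocation per model variant.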