added minicpm-v-2_6 (#11794)

intel-analytics · Aug 14, 2024 · d8d887e
1 parent 3d6cfa2

Showing 2 changed files with 17 additions and 3 deletions.
diff --git a/python/llm/example/GPU/HuggingFace/Multimodal/MiniCPM-V-2/README.md b/python/llm/example/GPU/HuggingFace/Multimodal/MiniCPM-V-2/README.md
@@ -1,5 +1,5 @@
 # MiniCPM-V-2
-In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on MiniCPM-V-2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [openbmb/MiniCPM-V-2](https://huggingface.co/openbmb/MiniCPM-V-2) as a reference MiniCPM-V-2 model.
+In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on MiniCPM-V-2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [openbmb/MiniCPM-V-2](https://huggingface.co/openbmb/MiniCPM-V-2) and [openbmb/MiniCPM-V-2_6](https://huggingface.co/openbmb/MiniCPM-V-2_6) as reference MiniCPM-V-2 models.
 
 ## 0. Requirements
 To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.
@@ -27,7 +27,7 @@ conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 
-pip install timm peft
+pip install timm peft transformers==4.40.0 trl
 ```
 
 ### 2. Configures OneAPI environment variables for Linux
@@ -130,6 +130,18 @@ What is in the image?
 In the image, there is a young child holding a teddy bear. The teddy bear appears to be dressed in a pink tutu. The child is also wearing a red and white striped dress. The background of the image includes a stone wall and some red flowers.
 ```
 
+#### [openbmb/MiniCPM-V-2_6](https://huggingface.co/openbmb/MiniCPM-V-2_6)
+
+```log
+Inference time: 3.102498769760132 s
+-------------------- Input --------------------
+http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg
+-------------------- Prompt --------------------
+What is in the image?
+-------------------- Output --------------------
+The image features a young child holding a white teddy bear with a pink tutu. The child is wearing a striped dress and is standing in front of a stone wall with some red flowers in the background.
+```
+
 The sample input image is (which is fetched from [COCO dataset](https://cocodataset.org/#explore?id=264959)):
 
 <a href="http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg"><img width=400px src="http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg" ></a>
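This commit changes the install step to pin `transformers==4.40.0` and to add `trl`, alongside the existing `timm` and `peft`. A quick environment check before running `generate.py` can catch a mismatched `transformers` left over from an earlier setup. The snippet below is a minimal sketch and is not part of the commit. `REQUIRED` and `check_requirements` are hypothetical names. Only the `transformers` pin comes from the README; the other packages are checked for presence only.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Optional

# Packages from this commit's install step; None means "any version is accepted".
# Only transformers is pinned in the README (transformers==4.40.0).
REQUIRED = {
    "timm": None,
    "peft": None,
    "transformers": "4.40.0",
    "trl": None,
}


def check_requirements(
    required: dict[str, Optional[str]],
    get_version: Callable[[str], str] = version,
) -> list[str]:
    """Return a list of problems; an empty list means every requirement looks satisfied."""
    problems = []
    for name, pinned in required.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        # Exact string comparison, mirroring the `==` pin used in the pip command.
        if pinned is not None and installed != pinned:
            problems.append(f"{name}=={installed} installed, expected {pinned}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements(REQUIRED):
        print("WARNING:", problem)
```

The `get_version` parameter defaults to `importlib.metadata.version`, so it can be swapped out for a stub when testing without the real packages installed.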
diff --git a/python/llm/example/GPU/HuggingFace/Multimodal/MiniCPM-V-2/generate.py b/python/llm/example/GPU/HuggingFace/Multimodal/MiniCPM-V-2/generate.py
@@ -157,14 +157,16 @@ def _pos_embed(self, x: torch.Tensor) -> torch.Tensor:
     # here the prompt tuning refers to https://huggingface.co/openbmb/MiniCPM-V-2/blob/main/README.md
     msgs = [{'role': 'user', 'content': args.prompt}]
     st = time.time()
-    res, context, _ = model.chat(
+    res = model.chat(
      image=image,
      msgs=msgs,
      context=None,
      tokenizer=tokenizer,
      sampling=False,
      temperature=0.7
     )
+    if model.config._name_or_path.endswith("2"):
+        res, context, _ = res
     end = time.time()
     print(f'Inference time: {end-st} s')
     print('-'*20, 'Input', '-'*20)
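In `generate.py`, the commit decides whether to unpack the result of `model.chat` by testing `model.config._name_or_path.endswith("2")`. That check depends on how the model path is spelled: for example, `"openbmb/MiniCPM-V-2/"` with a trailing slash does not end with `"2"`, so the tuple would not be unpacked. A possible alternative is to branch on the type of the returned value instead. The helper below is a hypothetical sketch, assuming (as the diff implies) that MiniCPM-V-2 returns a `(text, context, extra)` tuple and MiniCPM-V-2_6 returns the text directly.

```python
from typing import Any


def unpack_chat_result(res: Any) -> str:
    """Return the generated text from model.chat(), whichever shape it has.

    Assumption taken from the diff: MiniCPM-V-2 returns (text, context, extra),
    while MiniCPM-V-2_6 returns the text as a plain string.
    """
    # Tuple result: the generated text is the first element; context/extra are dropped.
    text = res[0] if isinstance(res, tuple) else res
    if not isinstance(text, str):
        raise TypeError(f"unexpected model.chat() result type: {type(text).__name__}")
    return text
```

Used as `res = unpack_chat_result(model.chat(...))`, this discards the returned `context`, which is only appropriate if nothing later in the script reads it.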