[Model] Rename MiniCPMVQwen2 to MiniCPMV2.6 #7273

Merged
DarkLight1337 merged 5 commits into vllm-project:main from optimize-minicpmv-code
Aug 8, 2024

Conversation

jeejeelee
Collaborator

@jeejeelee jeejeelee commented Aug 7, 2024

I have completed the following modification:

ping @ywang96 @DarkLight1337 @HwwwwwwwH


github-actions bot commented Aug 7, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build in the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

@DarkLight1337
Member

Could you also update the docs to include this in the list of supported models?

@jeejeelee jeejeelee requested a review from DarkLight1337 August 8, 2024 01:39
@HwwwwwwwH
Contributor

HwwwwwwwH commented Aug 8, 2024

Sorry for the late reply. Thank you for your work! There are still a few things that need attention. The MiniCPM-V example in examples/offline_inference_vision_language.py should be extended to cover all three versions, and the three versions of MiniCPM-V need different stop_token_ids. Otherwise, if a user runs examples/offline_inference_vision_language.py for V2.5, for example, they might get many <|eot|> tokens at the end of the outputs.
So the logic of offline_inference_vision_language.py may need a slight modification.

@jeejeelee
Collaborator Author

Ok, is that right?

  # 2.0
  stop_token_ids = [tokenizer.eos_id]
  # 2.5
  # stop_token_ids = [tokenizer.eos_id, tokenizer.eot_id]
  # 2.6
  # stop_tokens = ['<|im_end|>', '<|endoftext|>']
  # stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens]

@HwwwwwwwH
Contributor

Yes
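
The per-version selection confirmed above could be wrapped in a small helper. This is only a sketch under stated assumptions: the function name `minicpmv_stop_token_ids` and the stand-in tokenizer (with made-up ids) are hypothetical and exist only so the snippet runs without loading a model. The attribute names `eos_id`, `eot_id` and `convert_tokens_to_ids` are taken from the snippet in this thread.

```python
from types import SimpleNamespace


def minicpmv_stop_token_ids(tokenizer, version):
    """Return stop token ids for a MiniCPM-V version, per the snippet above."""
    if version == "2.0":
        return [tokenizer.eos_id]
    if version == "2.5":
        return [tokenizer.eos_id, tokenizer.eot_id]
    if version == "2.6":
        stop_tokens = ["<|im_end|>", "<|endoftext|>"]
        return [tokenizer.convert_tokens_to_ids(t) for t in stop_tokens]
    raise ValueError(f"unsupported MiniCPM-V version: {version!r}")


# Hypothetical stand-in tokenizer with made-up ids, so the sketch is runnable
# on its own. A real tokenizer for the chosen checkpoint would be used instead.
_vocab = {"<|im_end|>": 101, "<|endoftext|>": 102}
fake_tokenizer = SimpleNamespace(
    eos_id=2,
    eot_id=3,
    convert_tokens_to_ids=lambda token: _vocab[token],
)

for version in ("2.0", "2.5", "2.6"):
    print(version, minicpmv_stop_token_ids(fake_tokenizer, version))
```

In the example script, the returned list would presumably be passed as the `stop_token_ids` argument of `SamplingParams`, so that V2.5 outputs stop at the end-of-turn token instead of repeating it.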

@jeejeelee
Collaborator Author

@DarkLight1337 I am not sure whether I can add these to the VL example.

@DarkLight1337
Member

Feel free to update the example!

@jeejeelee jeejeelee requested a review from DarkLight1337 August 8, 2024 06:19
@DarkLight1337
Member

@HwwwwwwwH can you try running the example on your end? I'm outside right now.

@HwwwwwwwH
Contributor

HwwwwwwwH commented Aug 8, 2024

I'm running it now.

@HwwwwwwwH
Contributor

> @HwwwwwwwH I'd like to know why you chose to use the LlamaModel rather than LlamaForCausalLM as the language model for minicpm2.5. This approach actually changes the model hierarchy, which is not conducive to the support of LoRA.

Emmm, at first I used xxxCausalLM, but then I found that all of the CausalLM classes had dropped the inputs_embeds parameter, while the xxModel classes kept it. Using xxModel is also what the other VLMs do, so I followed that approach.

@HwwwwwwwH
Contributor

I got this error when running the example: the gguf module is missing. I don't see this dependency in the requirements files. Maybe it needs to be added?

Traceback (most recent call last):
  File "/data1/a1/vllm_e/examples/offline_inference_vision_language.py", line 10, in <module>
    from vllm import LLM, SamplingParams
  File "/data1/a1/vllm_e/vllm/__init__.py", line 3, in <module>
    from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
  File "/data1/a1/vllm_e/vllm/engine/arg_utils.py", line 7, in <module>
    from vllm.config import (CacheConfig, DecodingConfig, DeviceConfig,
  File "/data1/a1/vllm_e/vllm/config.py", line 11, in <module>
    from vllm.model_executor.layers.quantization import QUANTIZATION_METHODS
  File "/data1/a1/vllm_e/vllm/model_executor/layers/quantization/__init__.py", line 16, in <module>
    from vllm.model_executor.layers.quantization.gguf import GGUFConfig
  File "/data1/a1/vllm_e/vllm/model_executor/layers/quantization/gguf.py", line 3, in <module>
    import gguf
ModuleNotFoundError: No module named 'gguf'

@DarkLight1337
Member

It should be in the latest requirements.txt. You may need to sync this branch with main.
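
One way to confirm that the environment is simply missing the module named in the traceback is to probe for it before importing vLLM. This is a minimal sketch using only the standard library. The helper name `missing_modules` is made up for illustration, and it is only meant for top-level module names such as `gguf`.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# 'gguf' is the module the traceback above complains about.
missing = missing_modules(["gguf"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If `gguf` is reported as missing after syncing with main, reinstalling the dependencies from the updated requirements file should bring it in.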

@DarkLight1337
Member

DarkLight1337 commented Aug 8, 2024

> > @HwwwwwwwH I'd like to know why you chose to use the LlamaModel rather than LlamaForCausalLM as the language model for minicpm2.5. This approach actually changes the model hierarchy, which is not conducive to the support of LoRA.
>
> Emmm, at first I used xxxCausalLM, but then I found that all of the CausalLM classes had dropped the inputs_embeds parameter, while the xxModel classes kept it. Using xxModel is also what the other VLMs do, so I followed that approach.

Yeah, the existing VLMs use the *Model class rather than *ForCausalLM. There have been efforts to change this, though, by using @Isotr0py's recipe for loading inner models (see #7153).
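
To illustrate why the inputs_embeds parameter matters here, the toy sketch below shows a vision-language wrapper replacing the embedding at an image placeholder before calling the language model. This only works when the language model accepts precomputed embeddings. None of this is vLLM code: every class, function, and value is hypothetical, and plain lists stand in for tensors.

```python
class ToyInnerModel:
    """Stands in for a *Model class: forward() accepts precomputed embeddings."""

    def __init__(self, embedding_table):
        self.embedding_table = embedding_table

    def embed(self, input_ids):
        return [self.embedding_table[i] for i in input_ids]

    def forward(self, input_ids, inputs_embeds=None):
        # Use caller-supplied embeddings when given, otherwise embed the ids.
        if inputs_embeds is None:
            inputs_embeds = self.embed(input_ids)
        return inputs_embeds  # a real model would run its decoder layers here


class ToyCausalLM:
    """Stands in for a *ForCausalLM wrapper whose forward() takes only ids."""

    def __init__(self, inner):
        self.model = inner

    def forward(self, input_ids):
        return self.model.forward(input_ids)


def merge_image_embeds(text_embeds, input_ids, image_token_id, image_embeds):
    """Swap the embedding at each image placeholder for an image feature."""
    image_iter = iter(image_embeds)
    return [
        next(image_iter) if token == image_token_id else embedding
        for token, embedding in zip(input_ids, text_embeds)
    ]


IMAGE_TOKEN_ID = 9  # made-up placeholder id
inner = ToyInnerModel({0: [0.0], 1: [1.0], IMAGE_TOKEN_ID: [9.0]})
ids = [1, IMAGE_TOKEN_ID, 0]

merged = merge_image_embeds(inner.embed(ids), ids, IMAGE_TOKEN_ID, [[5.0]])
with_image = inner.forward(ids, inputs_embeds=merged)  # image feature injected
text_only = ToyCausalLM(inner).forward(ids)            # no way to inject it
print(with_image, text_only)
```

The wrapper that only takes ids has no hook for the merged embeddings, which mirrors the reason given above for building on the inner model class.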

@HwwwwwwwH
Contributor

> @HwwwwwwwH can you try running the example on your end? I'm outside right now.

The examples run OK.

@jeejeelee
Collaborator Author

Sorry for deleting my previous comments by mistake. I will read that code, thank you.

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) August 8, 2024 12:48
@github-actions github-actions bot added the ready (ONLY add when PR is ready to merge/full CI is needed) label Aug 8, 2024
@DarkLight1337 DarkLight1337 merged commit 757ac70 into vllm-project:main Aug 8, 2024
60 checks passed
sfc-gh-mkeralapura pushed a commit to sfc-gh-mkeralapura/vllm that referenced this pull request Aug 12, 2024
kylesayrs pushed a commit to neuralmagic/vllm that referenced this pull request Aug 17, 2024
@jeejeelee jeejeelee deleted the optimize-minicpmv-code branch August 19, 2024 08:09
fialhocoelho pushed a commit to opendatahub-io/vllm that referenced this pull request Aug 22, 2024
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request Nov 20, 2024
Development

Successfully merging this pull request may close these issues.

[New Model]:Is MiniCPM-V-2_6 supported?
3 participants