[Model] Add PaliGemma #5189
Conversation
Not ready for review yet, but FYI @zhuohan123 @simon-mo since you asked me about this. Also cc @DarkLight1337 since I might need you to review this eventually.

Seeing some correctness issues with the implementation of
tests/models/test_paligemma.py (outdated):

        input_id for input_id in input_ids
        if input_id != image_token_id and input_id != tokenizer.bos_token_id
    ]
    if hf_input_ids[-1] == 108:
Maybe indicate what this 108 means.
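To illustrate the reviewer's point, here is a hedged sketch of the filtering that the snippet above performs, with the magic number named. The token ids below are assumptions for illustration (in the real test they come from the PaliGemma processor and Gemma tokenizer); id 108 reportedly decodes to "\n" in the Gemma tokenizer.

```python
# Hypothetical sketch of the input-id cleanup in tests/models/test_paligemma.py.
# All concrete ids here are assumptions for illustration, not values taken
# from the actual model config.
IMAGE_TOKEN_ID = 257152  # assumed PaliGemma <image> placeholder id
BOS_TOKEN_ID = 2         # assumed Gemma BOS id
NEWLINE_TOKEN_ID = 108   # "\n" in the Gemma tokenizer (the unexplained 108)

def clean_hf_input_ids(input_ids):
    # Drop the image placeholder and BOS tokens before comparing with vLLM.
    hf_input_ids = [
        input_id for input_id in input_ids
        if input_id not in (IMAGE_TOKEN_ID, BOS_TOKEN_ID)
    ]
    # PaliGemma appends a newline after the prompt; drop a trailing one.
    if hf_input_ids and hf_input_ids[-1] == NEWLINE_TOKEN_ID:
        hf_input_ids = hf_input_ids[:-1]
    return hf_input_ids
```

Naming the constant (or adding a comment such as the one above) would make the intent of the `== 108` check obvious to future readers.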
Design-wise I think the code is fine. However, there appear to be major discrepancies between the expected and actual output, beyond what can be observed for the base Gemma model. For example, the

@WoosukKwon I haven't, but I also need to update this PR after the refactoring we just finished, to make sure it still works.

This PR passed all tests locally and is ready for review. Please take a look @WoosukKwon!
Overall LGTM, but I'll check the warnings emitted in the VLM tests to see how consistent the output is with the HF model.

Seems that the vLLM output string is not following the HF format: https://buildkite.com/vllm/ci-aws/builds/4150#0190861d-cafe-41db-81ba-cb4c0d8fc2a6 Can you fix that?

Ah, it's the eos_token. It's also odd that some of these tests don't give any output at all; I'll need to take a deeper look and see what went wrong.
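One hedged sketch of how the eos_token mismatch could be normalized when comparing the two outputs, assuming the HF-decoded string carries a trailing `<eos>` token that vLLM's output omits (the token string and helper name here are assumptions, not code from this PR):

```python
# Hypothetical normalization before comparing vLLM output with HF output.
# "<eos>" is the assumed Gemma end-of-sequence token string.
def strip_eos(text: str, eos_token: str = "<eos>") -> str:
    # Remove a single trailing eos token, then any leftover whitespace.
    if text.endswith(eos_token):
        text = text[: -len(eos_token)]
    return text.rstrip()
```

With a helper like this, both outputs can be compared on their visible text alone, so a formatting-only difference in the eos token does not fail the test.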
@ywang96 Thanks for the PR! The PR looks good overall, but I think we can reduce the redundancy by reusing the code for Llava.
examples/paligemma_example.py (outdated):
Do we need this example? I'm wondering because it seems pretty similar to the llava example.
The tests and examples were previously made per model because the image-related engine args differed between models. This is indeed no longer needed after the refactoring we did, and I will open another PR to consolidate them if you're okay with that.
Thanks for the explanation! Sounds good. Please open the PR!
I think this test is also a bit redundant with test_llava.py. Can we refactor test_llava.py to cover both models?
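A minimal sketch of what covering both models in one test could look like, assuming a table that pairs each model id with its prompt template (the model ids and prompt formats below are assumptions for illustration; PaliGemma notably takes a bare prompt rather than a chat-style template):

```python
# Hypothetical consolidation of per-model VLM test inputs.
# Model ids and templates are assumed, not taken from the PR.
MODEL_CONFIGS = {
    "llava-hf/llava-1.5-7b-hf": "USER: <image>\n{question} ASSISTANT:",
    "google/paligemma-3b-mix-224": "{question}",
}

def iter_test_cases(question):
    # Yield (model_id, formatted_prompt) pairs for a parametrized test.
    for model_id, template in MODEL_CONFIGS.items():
        yield model_id, template.format(question=question)
```

In the actual test suite this table would feed something like `pytest.mark.parametrize`, so adding a new VLM becomes a one-line entry instead of a new test file.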
See my comment above.
Co-authored-by: Woosuk Kwon <[email protected]>
@ywang96 LGTM. Please feel free to open the PR for refactoring.
Also, please merge the PR after @DarkLight1337 also approves, as he may be more knowledgeable about these models than I am.

Although the vLLM output for the Spanish translation has some numerical differences and diverges from the HF output after the first 2 tokens, I think the overall meaning is similar enough, so it is fine. Let's merge this.
Co-authored-by: Woosuk Kwon <[email protected]>
Co-authored-by: Woosuk Kwon <[email protected]> Signed-off-by: Alvant <[email protected]>
Add support for PaliGemma, Google's cutting-edge open vision-language model.
FIX #4833