
update vllm to 0.3 #110

Closed · wants to merge 1 commit

Conversation
Conversation

themrzmaster

vLLM 0.3 introduces custom fused MoE layers (vllm-project/vllm#2542), which improve performance.

@jeffreymeetkai
Collaborator

Hi, thank you for notifying us about this improvement!

I've just tested our server inference with vLLM migrated to v0.3.0, and there are some major changes to various parts of vLLM, such as the introduction of TokenizerGroup, which break our server inference both with and without grammar sampling. More work on the existing server inference code will be needed before we can safely migrate to vllm==0.3.0.

I will open a new PR once the modifications to the existing code are ready. Thank you once again for this suggestion!
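As a rough illustration of the kind of breakage described above, here is a minimal sketch of code that previously assumed `engine.tokenizer` was a plain Hugging Face tokenizer and now has to unwrap the TokenizerGroup introduced in vLLM 0.3.0. The model name and the unwrapping pattern are assumptions based on vLLM's public API around v0.2/v0.3, not the actual server code in this repo.

```python
# Minimal sketch (assumption: vLLM ~0.3.0 API; not the functionary server code).
from vllm import LLM

# Illustrative model name; any model supported by vLLM works here.
llm = LLM(model="meetkai/functionary-small-v2.2")
engine = llm.llm_engine

# In vLLM <= 0.2.x, `engine.tokenizer` was the Hugging Face tokenizer itself.
# In 0.3.0 it is wrapped in a TokenizerGroup, so unwrap it where a plain
# tokenizer is expected (e.g. when sampling or building grammars over token ids).
tok = engine.tokenizer
hf_tokenizer = getattr(tok, "tokenizer", tok)

prompt_ids = hf_tokenizer.encode("Hello, world!")
print(prompt_ids)
```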
