
update vllm to 0.3 #110

Closed · wants to merge 1 commit

Conversation
Conversation

themrzmaster

vLLM 0.3 introduces custom fused MoE layers (vllm-project/vllm#2542), which improve performance.

@jeffreymeetkai
Collaborator

Hi, thank you for notifying us about this improvement!

I've just tested our server inference with vLLM migrated to v0.3.0, and there are some major changes to various parts of vLLM, such as the introduction of TokenizerGroup, which break our server inference both with and without grammar sampling. More work on the existing server inference code will be needed before we can safely migrate to vllm==0.3.0.

I will open a new PR once the modifications to the existing code are ready. Thank you once again for this suggestion!
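As a rough illustration of the kind of breakage described above, here is a minimal sketch of code that previously assumed `engine.tokenizer` was a plain Hugging Face tokenizer and now has to unwrap the TokenizerGroup introduced in vLLM 0.3.0. The model name and the unwrapping pattern are assumptions based on vLLM's public API around v0.2/v0.3, not the actual server code in this repo.

```python
# Minimal sketch (assumption: vLLM ~0.3.0 API; not the functionary server code).
from vllm import LLM

# Illustrative model name; any model supported by vLLM works here.
llm = LLM(model="meetkai/functionary-small-v2.2")
engine = llm.llm_engine

# In vLLM <= 0.2.x, `engine.tokenizer` was the Hugging Face tokenizer itself.
# In 0.3.0 it is wrapped in a TokenizerGroup, so unwrap it where a plain
# tokenizer is expected (e.g. when sampling or building grammars over token ids).
tok = engine.tokenizer
hf_tokenizer = getattr(tok, "tokenizer", tok)

prompt_ids = hf_tokenizer.encode("Hello, world!")
print(prompt_ids)
```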
