[v0.3.1] Release Tracker #2859
Comments
@simon-mo Please feel free to add more!
I would really like #2804, but it seems to be blocked by FlashInfer or other libraries.
@simon-mo I think the main concern here is AMD, because the ROCm xformers patch uses …
I see. We might be able to distribute different versions with varying version pins...
Need support for miqu-1-70b-sf-gptq. Thanks a lot!
I would like to see a fix for #2795. Two other users and I have been unable to use the latest version of vLLM with Ray; it works perfectly well after downgrading to the previous version.
#2761 brings back support for quantized MoE models like Mixtral/Deepseek and also delivers a significant speedup (2-3x). Could it be included in the next release so that quantized MoE models are not broken?
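For reference, a rough sketch of what loading a quantized Mixtral checkpoint could look like once #2761 restores quantized MoE support; the checkpoint name and the `quantization` argument value are illustrative assumptions, not taken from this thread:

```python
# Sketch: loading a GPTQ-quantized MoE model once #2761 restores support.
# The checkpoint name below is a hypothetical example; any GPTQ/AWQ export
# of Mixtral or Deepseek MoE would be the intended use case.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ",  # hypothetical checkpoint
    quantization="gptq",  # assumes GPTQ is the chosen quantization backend
)

out = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```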
@umarbutler In this release, we will disable the custom all-reduce, which should address #2795.
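For context, a minimal sketch of how the custom all-reduce kernel could be turned off when constructing the engine, assuming a `disable_custom_all_reduce` engine argument is available (the exact flag name is an assumption, not confirmed in this thread):

```python
# Sketch: disable the custom all-reduce path so tensor-parallel runs fall
# back to NCCL, which is the behavior expected to avoid #2795.
# `disable_custom_all_reduce` is assumed to be exposed as an engine argument.
from vllm import LLM

llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # any tensor-parallel model
    tensor_parallel_size=2,
    disable_custom_all_reduce=True,
)
```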
We will need to include #2875 in the release as well.
@pcmoritz Added. Thanks! |
@WoosukKwon Sorry for the delay. Will address the comments tonight. |
ETA: Feb 14-16th
Major changes
TBD
PRs to be merged before the release
- Support per-request seed #2514
- Fix the GC bug triggered when the LLM class is deleted ([BugFix] Fix GC bug for LLM class #2882)
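To illustrate the per-request seed feature tracked in #2514, a minimal sketch assuming the seed ends up exposed as a `seed` field on `SamplingParams`:

```python
# Sketch of per-request seeding (#2514): each request carries its own seed,
# so sampling is reproducible per request rather than only per engine.
# The `seed` field on SamplingParams is assumed from the PR title.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")

params_a = SamplingParams(temperature=0.8, max_tokens=32, seed=42)
params_b = SamplingParams(temperature=0.8, max_tokens=32, seed=42)

# Two requests with the same prompt and the same seed should produce
# identical samples, even though temperature > 0.
out_a = llm.generate(["The capital of France is"], params_a)
out_b = llm.generate(["The capital of France is"], params_b)
print(out_a[0].outputs[0].text == out_b[0].outputs[0].text)
```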