Issues: vllm-project/vllm
[Feature]: Add Support for Specifying Local CUTLASS Source Directory via Environment Variable
feature request · #10423, opened Nov 18, 2024 by wchen61
[Doc]: Compare LMDeploy vs vLLM AWQ Triton kernels
documentation · #10420, opened Nov 18, 2024 by casper-hansen
[Bug]: NCCL error with 2-way pipeline parallelism
bug · #10419, opened Nov 18, 2024 by Pl4tiNuM
[Bug]: KV cache quantization with GGUF turns out quite poorly
bug · #10411, opened Nov 18, 2024 by phazei
[Bug]: Program crashes after increasing --tensor-parallel-size, with error pynvml.NVMLError_InvalidArgument: Invalid Argument
bug · #10409, opened Nov 18, 2024 by JohnConnor123
[Bug]: Deploying Qwen2-VL with vLLM and Transformers produces inconsistent outputs for the same image
bug · #10408, opened Nov 18, 2024 by Apricot1225
[New Model]: fishaudio/fish-speech-1.4
new model · #10404, opened Nov 17, 2024 by cavities
[Bug]: Hermes tool parser outputs erroneous streamed arguments in some cases
bug · #10395, opened Nov 16, 2024 by xiyuan-lee
[Bug]: v0.6.4.post1 crashed: Error in model execution: CUDA error: an illegal memory access was encountered
bug · #10389, opened Nov 16, 2024 by wciq1208
[Misc]: Request for the roadmap of async output processing support for speculative decoding
misc · #10387, opened Nov 16, 2024 by Lin-Qingyang-Alec
[Bug]: Granite 3.0 disconnect between parser and example template
bug · #10379, opened Nov 15, 2024 by wilbry
[Feature]: NVIDIA Triton GenAI-Perf Benchmark
feature request, good first issue, help wanted · #10377, opened Nov 15, 2024 by simon-mo
[Bug]: Guided decoding broken in streaming mode
bug · #10376, opened Nov 15, 2024 by JC1DA
[Bug]: Torch profiling does not stop, and traces cannot be collected for all workers
bug · #10365, opened Nov 15, 2024 by ruisearch42
[Bug]: Qwen2-VL takes only 18 GB when run with Hugging Face code, but the same model takes 38 GB of GPU memory with vLLM
bug · #10357, opened Nov 15, 2024 by Samjith888
[Usage]: CUDA OOM when serving multiple tasks on the same server
usage · #10345, opened Nov 15, 2024 by reneix
[Misc]: Snowflake Arctic out-of-memory error with TP-8
bug · #10344, opened Nov 14, 2024 by rajagond
[Feature]: Allow head_size smaller than 128 on TPU with Pallas backend
feature request · #10343, opened Nov 14, 2024 by manninglucas
[Bug]: KV cache error with kv_cache_dtype=fp8 and large sequence lengths: losing the model's context length
bug · #10337, opened Nov 14, 2024 by amakaido28
[Bug]: Different output for the same prompt when inferred as a single sequence vs. concurrent requests on the vLLM OpenAI server, temp=0
bug · #10336, opened Nov 14, 2024 by bhupendrathore
[Bug]: Out-of-memory (OOM) issues during MMLU evaluation with lm_eval
bug · #10325, opened Nov 14, 2024 by wchen61
[Bug]: Custom chat template sends [{'type': 'text', 'text': '...'}] to the model
bug · #10324, opened Nov 14, 2024 by victorserbu2709
[Feature]: To adapt to the TTS task, I need to pass embeddings in directly. How should I modify it?
feature request · #10323, opened Nov 14, 2024 by 1nlplearner
[Installation]: Request to include vllm==0.6.2 for CUDA 11.8
installation · #10319, opened Nov 14, 2024 by amew0
[Performance]: Results from the vLLM blog article "How Speculative Decoding Boosts vLLM Performance by up to 2.8x" are unreproducible
performance · #10318, opened Nov 14, 2024 by yeonjoon-jung01