Issues: vllm-project/vllm
[Feature]: Add Support for Specifying Local CUTLASS Source Directory via Environment Variable
feature request · #10423, opened Nov 18, 2024 by wchen61
[Doc]: Compare LMDeploy vs vLLM AWQ Triton kernels
documentation · #10420, opened Nov 18, 2024 by casper-hansen
[Bug]: NCCL error with 2-way pipeline parallelism
bug · #10419, opened Nov 18, 2024 by Pl4tiNuM
[Bug]: KV cache quantization with GGUF turns out quite poorly
bug · #10411, opened Nov 18, 2024 by phazei
[Bug]: Program crashes after increasing --tensor-parallel-size, with error pynvml.NVMLError_InvalidArgument: Invalid Argument
bug · #10409, opened Nov 18, 2024 by JohnConnor123
[Bug]: Deploying Qwen2-VL with vLLM and Transformers produces inconsistent outputs for the same image
bug · #10408, opened Nov 18, 2024 by Apricot1225
[New Model]: fishaudio/fish-speech-1.4
new model · #10404, opened Nov 17, 2024 by cavities
[Bug]: Hermes tool parser outputs erroneous streamed arguments in some cases
bug · #10395, opened Nov 16, 2024 by xiyuan-lee
[Bug]: v0.6.4.post1 crashed: Error in model execution: CUDA error: an illegal memory access was encountered
bug · #10389, opened Nov 16, 2024 by wciq1208
[Misc]: Request for the roadmap of async output processing support for speculative decoding
misc · #10387, opened Nov 16, 2024 by Lin-Qingyang-Alec
[Bug]: Granite 3.0 disconnect between parser and example template
bug · #10379, opened Nov 15, 2024 by wilbry
[Feature]: NVIDIA Triton GenAI-Perf Benchmark
feature request, good first issue, help wanted · #10377, opened Nov 15, 2024 by simon-mo
[Bug]: Guided decoding broken in streaming mode
bug · #10376, opened Nov 15, 2024 by JC1DA
[Bug]: Torch profiling does not stop, and traces cannot be collected for all workers
bug · #10365, opened Nov 15, 2024 by ruisearch42
[Bug]: Qwen2-VL takes only 18 GB when run with Hugging Face code, but the same model takes 38 GB of GPU memory with vLLM
bug · #10357, opened Nov 15, 2024 by Samjith888
[Usage]: CUDA OOM when serving multiple tasks on the same server
usage · #10345, opened Nov 15, 2024 by reneix
[Misc]: Snowflake Arctic out-of-memory error with TP-8
bug · #10344, opened Nov 14, 2024 by rajagond
[Feature]: Allow head_size smaller than 128 on TPU with Pallas backend
feature request · #10343, opened Nov 14, 2024 by manninglucas
[Bug]: KV cache error with kv_cache_dtype=fp8 and large sequence lengths: losing the model's context length
bug · #10337, opened Nov 14, 2024 by amakaido28
[Bug]: Different output for the same prompt when inferred as a single sequence vs. concurrent requests on the vLLM OpenAI server, temp=0
bug · #10336, opened Nov 14, 2024 by bhupendrathore
[Bug]: Out-of-memory (OOM) issues during MMLU evaluation with lm_eval
bug · #10325, opened Nov 14, 2024 by wchen61
[Bug]: Custom chat template sends [{'type': 'text', 'text': '...'}] to the model
bug · #10324, opened Nov 14, 2024 by victorserbu2709
[Feature]: To adapt to the TTS task, I need to pass embeddings in directly. How should I modify it?
feature request · #10323, opened Nov 14, 2024 by 1nlplearner
[Installation]: Request to include vllm==0.6.2 for CUDA 11.8
installation · #10319, opened Nov 14, 2024 by amew0
[Performance]: Results from the vLLM blog article "How Speculative Decoding Boosts vLLM Performance by up to 2.8x" are unreproducible
performance · #10318, opened Nov 14, 2024 by yeonjoon-jung01