
Disable cuda version check in vllm-openai image #4530

Merged
4 commits merged into vllm-project:main on May 5, 2024

Conversation

@zhaoyang-star (Contributor) commented on May 1, 2024

Fix #4521

Currently there is no need to check the CUDA version when using the fp8 KV cache. As of now, vLLM's binaries are compiled with CUDA 12.1 and against public PyTorch release versions by default, and the vllm-openai image also ships with CUDA 12.1.
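For context, the check in question gates the fp8 KV cache on the detected CUDA version. Below is a minimal sketch of the kind of guard being disabled; the helper name, the minimum-version threshold, and the use of torch.version.cuda are illustrative assumptions, not the actual vLLM implementation. Because the vllm-openai image already bundles CUDA 12.1, such a guard is redundant there, which is what motivates removing it.

```python
# Illustrative sketch only -- not the actual vLLM code.
# The helper name and the "11.8" threshold are assumptions for demonstration.
from packaging.version import Version

import torch


def check_fp8_kv_cache_cuda(min_cuda: str = "11.8") -> None:
    """Raise if the CUDA runtime bundled with PyTorch is older than min_cuda."""
    cuda_version = torch.version.cuda  # e.g. "12.1"; None on CPU-only builds
    if cuda_version is None or Version(cuda_version) < Version(min_cuda):
        raise RuntimeError(
            f"fp8 KV cache expects CUDA >= {min_cuda}, but found {cuda_version}."
        )
```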

@simon-mo (Collaborator) commented on May 1, 2024

Sorry, I just merged the other PR. Can you resolve the conflict?

@simon-mo (Collaborator) commented on May 2, 2024

🤦‍♂️ Sorry, another conflict.

@zhaoyang-star (Contributor, Author)

@simon-mo The conflict is resolved. Please take a look.

simon-mo merged commit 0650e59 into vllm-project:main on May 5, 2024
57 of 59 checks passed
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request May 7, 2024
dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request May 7, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024
Successfully merging this pull request may close these issues.

--kv_cache_dtype fp8 should not check for nvcc