
[Doc] Update installation doc for build from source and explain the dependency on torch/cuda version #3746

Merged 4 commits on Mar 30, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 21 additions & 12 deletions docs/source/getting_started/installation.rst
@@ -19,7 +19,7 @@ You can install vLLM using pip:

.. code-block:: console

- $ # (Optional) Create a new conda environment.
+ $ # (Recommended) Create a new conda environment.
$ conda create -n myenv python=3.9 -y
$ conda activate myenv
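
A quick sanity check (an editorial sketch, not part of this diff) to confirm the new environment is the one actually in use before installing anything:

.. code-block:: console

$ python --version   # should report Python 3.9.x from myenv
$ which pip          # should resolve to a path inside the myenv environment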

@@ -28,24 +28,19 @@ You can install vLLM using pip:

.. note::

- As of now, vLLM's binaries are compiled on CUDA 12.1 by default.
- However, you can install vLLM with CUDA 11.8 by running:
+ As of now, vLLM's binaries are compiled with CUDA 12.1 and public PyTorch release versions by default.
+ We also provide vLLM binaries compiled with CUDA 11.8 and public PyTorch release versions:

.. code-block:: console

$ # Install vLLM with CUDA 11.8.
- $ export VLLM_VERSION=0.2.4
+ $ export VLLM_VERSION=0.4.0

Review comment (Member):
Is there any way that we can keep this version number updated?

Reply from the PR author (Member):

Hmm, that might need something like rst file templates. I don't think we need to go that far. Even if the number is outdated, users should know they need to change it according to their needs.

$ export PYTHON_VERSION=39
- $ pip install https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux1_x86_64.whl
+ $ pip install https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux1_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118

- $ # Re-install PyTorch with CUDA 11.8.
- $ pip uninstall torch -y
- $ pip install torch --upgrade --index-url https://download.pytorch.org/whl/cu118

- $ # Re-install xFormers with CUDA 11.8.
- $ pip uninstall xformers -y
- $ pip install --upgrade xformers --index-url https://download.pytorch.org/whl/cu118

+ In order to be performant, vLLM has to compile many CUDA kernels. Unfortunately, the compilation introduces binary incompatibility with other CUDA and PyTorch versions, even for the same PyTorch version with a different build configuration.

+ Therefore, it is recommended to install vLLM in a **fresh** conda environment. If you have a different CUDA version, or want to use an existing PyTorch installation, you need to build vLLM from source. See below for instructions.
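
As a rough illustration of the compatibility point above (an editorial sketch, not part of this diff), you can check which PyTorch and CUDA versions your current environment already has before deciding whether a prebuilt wheel will match it:

.. code-block:: console

$ python -c "import torch; print(torch.__version__)"   # e.g. 2.1.2
$ python -c "import torch; print(torch.version.cuda)"  # e.g. 12.1 or 11.8; None for CPU-only builds

If either value differs from what the vLLM wheels were compiled against, building from source as described below is the safer route.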

.. _build_from_source:

@@ -77,6 +72,20 @@ You can also build and install vLLM from source:
$ # Use `--ipc=host` to make sure the shared memory is large enough.
$ docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:23.10-py3
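
A hypothetical variant (not shown in the diff) is to bind-mount a local vLLM checkout into the container so the source build can happen inside it; `-v` is the standard Docker bind-mount flag:

.. code-block:: console

$ # Hypothetical: bind-mount the current directory (a vLLM checkout) into the container.
$ docker run --gpus all -it --rm --ipc=host -v "$(pwd)":/workspace/vllm nvcr.io/nvidia/pytorch:23.10-py3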

If you don't want to use Docker, it is recommended to have a full installation of the CUDA Toolkit. You can download and install it from `the official website <https://developer.nvidia.com/cuda-12-1-0-download-archive>`_. After installation, set the environment variable `CUDA_HOME` to the installation path of the CUDA Toolkit, and make sure that the `nvcc` compiler is in your `PATH`, e.g.:

.. code-block:: console

$ export CUDA_HOME=/usr/local/cuda
$ export PATH="${CUDA_HOME}/bin:$PATH"

Here is a sanity check to verify that the CUDA Toolkit is correctly installed:

.. code-block:: console

$ nvcc --version # verify that nvcc is in your PATH
$ ${CUDA_HOME}/bin/nvcc --version # verify that nvcc is in your CUDA_HOME
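
An additional check worth doing (an editorial suggestion, not part of this diff) is that the toolkit found by `nvcc` matches the CUDA version your installed PyTorch was built with, since that mismatch is exactly what forces a source build:

.. code-block:: console

$ nvcc --version | grep release                        # CUDA Toolkit release, e.g. 12.1
$ python -c "import torch; print(torch.version.cuda)"  # should report the same major.minor version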

.. note::
If you are developing the C++ backend of vLLM, consider building vLLM with
