From c3c6d384f5874b3d339a067bd56aa94dfe0f9b64 Mon Sep 17 00:00:00 2001
From: youkaichao
Date: Mon, 25 Mar 2024 17:42:52 -0700
Subject: [PATCH 1/4] stash

---
 docs/source/getting_started/installation.rst | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/docs/source/getting_started/installation.rst b/docs/source/getting_started/installation.rst
index 3355a894852f7..0910e2b9f9fa4 100644
--- a/docs/source/getting_started/installation.rst
+++ b/docs/source/getting_started/installation.rst
@@ -19,7 +19,7 @@ You can install vLLM using pip:

 .. code-block:: console

-    $ # (Optional) Create a new conda environment.
+    $ # (Recommended) Create a new conda environment.
     $ conda create -n myenv python=3.9 -y
     $ conda activate myenv

@@ -28,24 +28,20 @@ You can install vLLM using pip:

 .. note::

-    As of now, vLLM's binaries are compiled on CUDA 12.1 by default.
-    However, you can install vLLM with CUDA 11.8 by running:
+    As of now, vLLM's binaries are compiled with CUDA 12.1 and public PyTorch release versions by default.
+    We also provide vLLM binaries compiled with CUDA 11.8 and public PyTorch release versions:

     .. code-block:: console

         $ # Install vLLM with CUDA 11.8.
-        $ export VLLM_VERSION=0.2.4
+        $ export VLLM_VERSION=0.3.3
         $ export PYTHON_VERSION=39
-        $ pip install https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux1_x86_64.whl
+        $ pip install https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux1_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118

-        $ # Re-install PyTorch with CUDA 11.8.
-        $ pip uninstall torch -y
-        $ pip install torch --upgrade --index-url https://download.pytorch.org/whl/cu118
-
-        $ # Re-install xFormers with CUDA 11.8.
-        $ pip uninstall xformers -y
-        $ pip install --upgrade xformers --index-url https://download.pytorch.org/whl/cu118
+    The compilation unfortunately introduces binary incompatibility with other CUDA versions and PyTorch versions, even for the same PyTorch version with different building configurations.
+    Therefore, it is recommended to use install vLLM with a fresh new conda environment,.
+    If either you have a different CUDA version or you want to use an existing PyTorch installation, you need to build vLLM from source. See below for instructions.

 .. _build_from_source:
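For reference, with the versions pinned in this patch (`VLLM_VERSION=0.3.3`, `PYTHON_VERSION=39`), the templated command above expands to the sketch below; it is illustrative only, not an additional install step. The final check assumes `torch` is importable after the install:

.. code-block:: console

    $ # Expanded form of the templated install command (illustrative only).
    $ pip install https://github.com/vllm-project/vllm/releases/download/v0.3.3/vllm-0.3.3+cu118-cp39-cp39-manylinux1_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
    $ # Confirm that the CUDA 11.8 build of PyTorch was picked up.
    $ python -c "import torch; print(torch.version.cuda)"  # expected: 11.8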
From 486be90c2138af7fbb6d220db7dc8f95f24d4682 Mon Sep 17 00:00:00 2001
From: youkaichao
Date: Sat, 30 Mar 2024 00:00:17 -0700
Subject: [PATCH 2/4] updaye installation doc

---
 docs/source/getting_started/installation.rst | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/docs/source/getting_started/installation.rst b/docs/source/getting_started/installation.rst
index 0910e2b9f9fa4..50d1ad3b59934 100644
--- a/docs/source/getting_started/installation.rst
+++ b/docs/source/getting_started/installation.rst
@@ -34,14 +34,13 @@ You can install vLLM using pip:

     .. code-block:: console

         $ # Install vLLM with CUDA 11.8.
-        $ export VLLM_VERSION=0.3.3
+        $ export VLLM_VERSION=0.4.0
         $ export PYTHON_VERSION=39
         $ pip install https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux1_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118

-    The compilation unfortunately introduces binary incompatibility with other CUDA versions and PyTorch versions, even for the same PyTorch version with different building configurations.
+    In order to be performant, vLLM has to compile many cuda kernels. The compilation unfortunately introduces binary incompatibility with other CUDA versions and PyTorch versions, even for the same PyTorch version with different building configurations.

-    Therefore, it is recommended to use install vLLM with a fresh new conda environment,.
-    If either you have a different CUDA version or you want to use an existing PyTorch installation, you need to build vLLM from source. See below for instructions.
+    Therefore, it is recommended to use install vLLM with a fresh new conda environment. If either you have a different CUDA version or you want to use an existing PyTorch installation, you need to build vLLM from source. See below for instructions.

 .. _build_from_source:

@@ -73,6 +72,13 @@ You can also build and install vLLM from source:

     $ # Use `--ipc=host` to make sure the shared memory is large enough.
     $ docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:23.10-py3

+If you don't want to use docker, it is recommended to have a full installation of CUDA Toolkit. You can download and install it from `the official website `_. After installation, set the environment variable `CUDA_HOME` to the installation path of CUDA Toolkit, and make sure that the `nvcc` compiler is in your `PATH`, e.g. `export CUDA_HOME=/usr/local/cuda ; export PATH="${CUDA_HOME}/bin:$PATH" ;` Here is a sanity check to verify that the CUDA Toolkit is correctly installed:
+
+.. code-block:: console
+
+    $ nvcc --version # verify that nvcc is in your PATH
+    $ ${CUDA_HOME}/bin/nvcc --version # verify that nvcc is in your CUDA_HOME
+
 .. note::
     If you are developing the C++ backend of vLLM, consider building vLLM with
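As a supplement to the sanity check added in this patch: the two `nvcc` invocations should agree with each other, and their release should match the CUDA version PyTorch was built against. A quick cross-check (a sketch, assuming PyTorch is already installed in the same environment):

.. code-block:: console

    $ nvcc --version | grep release  # e.g. "Cuda compilation tools, release 12.1, ..."
    $ python -c "import torch; print(torch.version.cuda)"  # should print the matching release, e.g. 12.1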
From e17502180df5f2ca8622f210a2f1bce9886fcbbd Mon Sep 17 00:00:00 2001
From: youkaichao
Date: Sat, 30 Mar 2024 09:51:35 -0700
Subject: [PATCH 3/4] update installation doc

---
 docs/source/getting_started/installation.rst | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/docs/source/getting_started/installation.rst b/docs/source/getting_started/installation.rst
index 50d1ad3b59934..77002611aaf3f 100644
--- a/docs/source/getting_started/installation.rst
+++ b/docs/source/getting_started/installation.rst
@@ -40,7 +40,7 @@ You can install vLLM using pip:

     In order to be performant, vLLM has to compile many cuda kernels. The compilation unfortunately introduces binary incompatibility with other CUDA versions and PyTorch versions, even for the same PyTorch version with different building configurations.

-    Therefore, it is recommended to use install vLLM with a fresh new conda environment. If either you have a different CUDA version or you want to use an existing PyTorch installation, you need to build vLLM from source. See below for instructions.
+    Therefore, it is recommended to install vLLM with a **fresh new** conda environment. If either you have a different CUDA version or you want to use an existing PyTorch installation, you need to build vLLM from source. See below for instructions.

 .. _build_from_source:

@@ -72,7 +72,14 @@ You can also build and install vLLM from source:

     $ # Use `--ipc=host` to make sure the shared memory is large enough.
     $ docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:23.10-py3

-If you don't want to use docker, it is recommended to have a full installation of CUDA Toolkit. You can download and install it from `the official website `_. After installation, set the environment variable `CUDA_HOME` to the installation path of CUDA Toolkit, and make sure that the `nvcc` compiler is in your `PATH`, e.g. `export CUDA_HOME=/usr/local/cuda ; export PATH="${CUDA_HOME}/bin:$PATH" ;` Here is a sanity check to verify that the CUDA Toolkit is correctly installed:
+If you don't want to use docker, it is recommended to have a full installation of CUDA Toolkit. You can download and install it from `the official website `_. After installation, set the environment variable `CUDA_HOME` to the installation path of CUDA Toolkit, and make sure that the `nvcc` compiler is in your `PATH`, e.g.:
+
+.. code-block:: console
+
+    $ export CUDA_HOME=/usr/local/cuda
+    $ export PATH="${CUDA_HOME}/bin:$PATH"
+
+Here is a sanity check to verify that the CUDA Toolkit is correctly installed:

 .. code-block:: console

From 307c77b13e18b63fefee049c9737cf4b09585825 Mon Sep 17 00:00:00 2001
From: youkaichao
Date: Sat, 30 Mar 2024 15:56:15 -0700
Subject: [PATCH 4/4] Update docs/source/getting_started/installation.rst

Co-authored-by: Zhuohan Li
---
 docs/source/getting_started/installation.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/getting_started/installation.rst b/docs/source/getting_started/installation.rst
index 77002611aaf3f..5dfb32080f97a 100644
--- a/docs/source/getting_started/installation.rst
+++ b/docs/source/getting_started/installation.rst
@@ -72,7 +72,7 @@ You can also build and install vLLM from source:

     $ # Use `--ipc=host` to make sure the shared memory is large enough.
     $ docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:23.10-py3

-If you don't want to use docker, it is recommended to have a full installation of CUDA Toolkit. You can download and install it from `the official website `_. After installation, set the environment variable `CUDA_HOME` to the installation path of CUDA Toolkit, and make sure that the `nvcc` compiler is in your `PATH`, e.g.:
+If you don't want to use docker, it is recommended to have a full installation of CUDA Toolkit. You can download and install it from `the official website `_. After installation, set the environment variable `CUDA_HOME` to the installation path of CUDA Toolkit, and make sure that the `nvcc` compiler is in your `PATH`, e.g.:

 .. code-block:: console
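Taken together, the flow this series recommends for the default CUDA 12.1 wheels is a fresh conda environment followed by a plain pip install. A minimal end-to-end sketch (the environment name `myenv` follows the docs and is arbitrary; the last line assumes `vllm` exposes `__version__`):

.. code-block:: console

    $ conda create -n myenv python=3.9 -y
    $ conda activate myenv
    $ pip install vllm
    $ python -c "import vllm; print(vllm.__version__)"  # sanity check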