diff --git a/README.md b/README.md
index deda1b1fe..22bc2e4cd 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@ LLM-on-Ray is a comprehensive solution designed to empower users in building, cu
 
 LLM-on-Ray harnesses the power of Ray, an industry-leading framework for distributed computing, to scale your AI workloads efficiently. This integration ensures robust fault tolerance and cluster resource management, making your LLM projects more resilient and scalable.
 
-LLM-on-Ray is built to operate across various hardware setups, including Intel CPU, Intel GPU and Intel Gaudi2. It incorporates several industry and Intel optimizations to maximize performance, including [vLLM](https://github.com/vllm-project/vllm), [llama.cpp](https://github.com/ggerganov/llama.cpp), [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch)/[Deepspeed](https://github.com/intel/intel-extension-for-deepspeed), [BigDL-LLM](https://github.com/intel-analytics/BigDL), [RecDP-LLM](https://github.com/intel/e2eAIOK/tree/main/RecDP/pyrecdp/LLM), [NeuralChat](https://huggingface.co/Intel/neural-chat-7b-v3-1) and more. 
+LLM-on-Ray is built to operate across various hardware setups, including Intel CPU, Intel GPU and Intel Gaudi2. It incorporates several industry and Intel optimizations to maximize performance, including [vLLM](https://github.com/vllm-project/vllm), [llama.cpp](https://github.com/ggerganov/llama.cpp), [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch)/[Deepspeed](https://github.com/intel/intel-extension-for-deepspeed), [BigDL-LLM](https://github.com/intel-analytics/BigDL), [RecDP-LLM](https://github.com/intel/e2eAIOK/tree/main/RecDP/pyrecdp/LLM), [NeuralChat](https://huggingface.co/Intel/neural-chat-7b-v3-1) and more.
 
 ## Solution Technical Overview
 LLM-on-Ray's modular workflow structure is designed to comprehensively cater to the various stages of LLM development, from pretraining and finetuning to serving. These workflows are intuitive, highly configurable, and tailored to meet the specific needs of each phase in the LLM lifecycle:
@@ -44,12 +44,15 @@ git clone https://github.com/intel/llm-on-ray.git
 cd llm-on-ray
 conda create -n llm-on-ray python=3.9
 conda activate llm-on-ray
-pip install .[cpu] -f https://developer.intel.com/ipex-whl-stable-cpu -f https://download.pytorch.org/whl/torch_stable.html
-# Dynamic link oneCCL and Intel MPI libraries
-source $(python -c "import oneccl_bindings_for_pytorch as torch_ccl;print(torch_ccl.cwd)")/env/setvars.sh
+pip install .[cpu] --extra-index-url https://download.pytorch.org/whl/cpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
 ```
 
 #### 2. Start Ray
+__[Optional]__ If DeepSpeed is enabled or doing distributed finetuning, oneCCL and Intel MPI libraries should be dynamically linked on every node before Ray starts:
+```bash
+source $(python -c "import oneccl_bindings_for_pytorch as torch_ccl; print(torch_ccl.cwd)")/env/setvars.sh
+```
+
 Start Ray locally using the following command. To launch a Ray cluster, please follow the [setup](docs/setup.md) document.
 ```bash
 ray start --head
@@ -117,7 +120,7 @@ The following are detailed guidelines for pretraining, finetuning and serving LL
 ### Web UI
 * [Finetune and Deploy LLMs through Web UI](docs/web_ui.md)
 
-## Disclaimer
-To the extent that any public datasets are referenced by Intel or accessed using tools or code on this site those datasets are provided by the third party indicated as the data source. Intel does not create the data, or datasets, and does not warrant their accuracy or quality. By accessing the public dataset(s), or using a model trained on those datasets, you agree to the terms associated with those datasets and that your use complies with the applicable license. 
-
+## Disclaimer
+To the extent that any public datasets are referenced by Intel or accessed using tools or code on this site those datasets are provided by the third party indicated as the data source. Intel does not create the data, or datasets, and does not warrant their accuracy or quality. By accessing the public dataset(s), or using a model trained on those datasets, you agree to the terms associated with those datasets and that your use complies with the applicable license.
+
 Intel expressly disclaims the accuracy, adequacy, or completeness of any public datasets, and is not liable for any errors, omissions, or defects in the data, or for any reliance on the data. Intel is not liable for any liability or damages relating to your use of public datasets.
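A quick way to sanity-check the new `--extra-index-url` based install above is to confirm that the CPU builds actually resolved from those indexes. This is a minimal sketch, not part of the patch, and it assumes the `llm-on-ray` conda environment created above is active:

```bash
# Both versions should carry a +cpu local tag (e.g. 2.1.0+cpu) if the wheels came from
# download.pytorch.org/whl/cpu and pytorch-extension.intel.com rather than PyPI.
python -c "import torch; print(torch.__version__)"
python -c "import intel_extension_for_pytorch as ipex; print(ipex.__version__)"
# Same import the setvars.sh snippet relies on; it fails loudly if oneCCL bindings are missing.
python -c "import oneccl_bindings_for_pytorch as torch_ccl; print(torch_ccl.cwd)"
```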
diff --git a/dev/docker/Dockerfile.bigdl-cpu b/dev/docker/Dockerfile.bigdl-cpu
index 8eb38d4bf..411449e41 100644
--- a/dev/docker/Dockerfile.bigdl-cpu
+++ b/dev/docker/Dockerfile.bigdl-cpu
@@ -29,8 +29,8 @@ COPY ./MANIFEST.in .
 
 RUN mkdir ./finetune && mkdir ./inference
 
-RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[bigdl-cpu] -f https://developer.intel.com/ipex-whl-stable-cpu \
-    -f https://download.pytorch.org/whl/torch_stable.html
+RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[bigdl-cpu] --extra-index-url https://download.pytorch.org/whl/cpu \
+    --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
 
 # Used to invalidate docker build cache with --build-arg CACHEBUST=$(date +%s)
 ARG CACHEBUST=1
diff --git a/dev/docker/Dockerfile.cpu_and_deepspeed b/dev/docker/Dockerfile.cpu_and_deepspeed
index 7f4847ae0..5371fae78 100644
--- a/dev/docker/Dockerfile.cpu_and_deepspeed
+++ b/dev/docker/Dockerfile.cpu_and_deepspeed
@@ -29,8 +29,8 @@ COPY ./MANIFEST.in .
 
 RUN mkdir ./finetune && mkdir ./inference
 
-RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[cpu,deepspeed] -f https://developer.intel.com/ipex-whl-stable-cpu \
-    -f https://download.pytorch.org/whl/torch_stable.html
+RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[cpu,deepspeed] --extra-index-url https://download.pytorch.org/whl/cpu \
+    --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
 
 RUN ds_report
 
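For reference, a hypothetical invocation for one of the updated Dockerfiles (the image tag is a placeholder, not something defined by this patch): BuildKit must be enabled for the `--mount=type=cache` pip layer used above, and `CACHEBUST` can be bumped to invalidate the later layers, as the comment in `Dockerfile.bigdl-cpu` notes:

```bash
# Run from the repository root; DOCKER_BUILDKIT=1 enables the cached pip layer.
DOCKER_BUILDKIT=1 docker build \
    -f dev/docker/Dockerfile.cpu_and_deepspeed \
    -t llm-on-ray:cpu-deepspeed \
    --build-arg CACHEBUST=$(date +%s) \
    .
```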
diff --git a/dev/docker/Dockerfile.cpu_and_deepspeed.pip_non_editable b/dev/docker/Dockerfile.cpu_and_deepspeed.pip_non_editable
index 217bf4a23..82eef4aa8 100644
--- a/dev/docker/Dockerfile.cpu_and_deepspeed.pip_non_editable
+++ b/dev/docker/Dockerfile.cpu_and_deepspeed.pip_non_editable
@@ -27,8 +27,8 @@ RUN --mount=type=cache,target=/opt/conda/pkgs conda init bash && \
 
 # copy all checkedout file for later non-editable pip
 COPY . .
 
-RUN --mount=type=cache,target=/root/.cache/pip pip install .[cpu,deepspeed] -f https://developer.intel.com/ipex-whl-stable-cpu \
-    -f https://download.pytorch.org/whl/torch_stable.html
+RUN --mount=type=cache,target=/root/.cache/pip pip install .[cpu,deepspeed] --extra-index-url https://download.pytorch.org/whl/cpu \
+    --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
 
 RUN ds_report
diff --git a/dev/docker/Dockerfile.vllm b/dev/docker/Dockerfile.vllm
index e4eb63d06..23d4bbe48 100644
--- a/dev/docker/Dockerfile.vllm
+++ b/dev/docker/Dockerfile.vllm
@@ -30,8 +30,8 @@ COPY ./dev/scripts/install-vllm-cpu.sh .
 
 RUN mkdir ./finetune && mkdir ./inference
 
-RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[cpu] -f https://developer.intel.com/ipex-whl-stable-cpu \
-    -f https://download.pytorch.org/whl/torch_stable.html
+RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[cpu] --extra-index-url https://download.pytorch.org/whl/cpu \
+    --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
 
 # Install vllm-cpu
 # Activate base first for loading g++ envs ($CONDA_PREFIX/etc/conda/activate.d/*)
diff --git a/dev/k8s/Dockerfile b/dev/k8s/Dockerfile
index 7b99fc808..8dcccdb5e 100644
--- a/dev/k8s/Dockerfile
+++ b/dev/k8s/Dockerfile
@@ -23,7 +23,7 @@ RUN "$HOME/anaconda3/bin/pip install accelerate==0.19.0" \
     "$HOME/anaconda3/bin/pip install gymnasium" \
     "$HOME/anaconda3/bin/pip install dm-tree" \
     "$HOME/anaconda3/bin/pip install scikit-image" \
-    "$HOME/anaconda3/bin/pip install oneccl_bind_pt==1.13 -f https://developer.intel.com/ipex-whl-stable-cpu"
+    "$HOME/anaconda3/bin/pip install oneccl_bind_pt==1.13 --extra-index-url https://developer.intel.com/ipex-whl-stable-cpu"
 
 # set http_proxy & https_proxy
 ENV http_proxy=${http_proxy}
diff --git a/dev/scripts/install-vllm-cpu.sh b/dev/scripts/install-vllm-cpu.sh
index 64b3690a4..7e96ba5ba 100755
--- a/dev/scripts/install-vllm-cpu.sh
+++ b/dev/scripts/install-vllm-cpu.sh
@@ -17,4 +17,4 @@ version_greater_equal "${gcc_version}" 12.3.0 || { echo "GNU C++ Compiler 12.3.0
 
 # Install from source
 MAX_JOBS=8 pip install -v git+https://github.com/bigPYJ1151/vllm@PR_Branch \
-    -f https://download.pytorch.org/whl/torch_stable.html
+    --extra-index-url https://download.pytorch.org/whl/cpu
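Outside of the editable installs above, the same index switch can be exercised as a standalone command. The pins below mirror the `cpu` extra in the `pyproject.toml` hunk further down in this patch; this is illustrative only, not an additional change to the repository:

```bash
# Pull the CPU builds directly from the two indexes introduced by this patch.
pip install "torch==2.1.0+cpu" "intel_extension_for_pytorch==2.1.0+cpu" "oneccl_bind_pt==2.1.0+cpu" \
    --extra-index-url https://download.pytorch.org/whl/cpu \
    --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
```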
diff --git a/docs/serve_bigdl.md b/docs/serve_bigdl.md
index 7bfb4001b..ae697bec9 100644
--- a/docs/serve_bigdl.md
+++ b/docs/serve_bigdl.md
@@ -6,7 +6,7 @@ The integration with BigDL-LLM currently only supports running on Intel CPU.
 ## Setup
 Please follow [setup.md](setup.md) to setup the environment first. Additional, you will need to install bigdl dependencies as below.
 ```bash
-pip install .[bigdl-cpu] -f https://developer.intel.com/ipex-whl-stable-cpu -f https://download.pytorch.org/whl/torch_stable.html
+pip install .[bigdl-cpu] --extra-index-url https://download.pytorch.org/whl/cpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
 ```
 
 ## Configure Serving Parameters
diff --git a/docs/setup.md b/docs/setup.md
index 5748d9785..82ed07160 100644
--- a/docs/setup.md
+++ b/docs/setup.md
@@ -40,15 +40,15 @@ conda activate llm-on-ray
 ```
 For CPU:
 ```bash
-pip install .[cpu] -f https://developer.intel.com/ipex-whl-stable-cpu -f https://download.pytorch.org/whl/torch_stable.html
+pip install .[cpu] --extra-index-url https://download.pytorch.org/whl/cpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
 ```
 For GPU:
 ```bash
-pip install .[gpu] --extra-index-url https://developer.intel.com/ipex-whl-stable-xpu
+pip install .[gpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
 If DeepSpeed is enabled or doing distributed finetuing, oneCCL and Intel MPI libraries should be dynamically linked in every node before Ray starts:
 ```bash
-source $(python -c "import oneccl_bindings_for_pytorch as torch_ccl;print(torch_ccl.cwd)")/env/setvars.sh
+source $(python -c "import oneccl_bindings_for_pytorch as torch_ccl; print(torch_ccl.cwd)")/env/setvars.sh
 ```
 
 For Gaudi:
@@ -68,7 +68,7 @@ docker build \
 
 After the image is built successfully, start a container:
 ```bash
-docker run -it --runtime=habana -v ./llm-on-ray:/root/llm-ray --name="llm-ray-habana-demo" llm-ray-habana:latest 
+docker run -it --runtime=habana -v ./llm-on-ray:/root/llm-ray --name="llm-ray-habana-demo" llm-ray-habana:latest
 ```
 
 #### 3. Launch Ray cluster
diff --git a/pyproject.toml b/pyproject.toml
index d43848a1f..98ae293b0 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -43,7 +43,7 @@ cpu = [
     "transformers>=4.35.0",
     "intel_extension_for_pytorch==2.1.0+cpu",
     "torch==2.1.0+cpu",
-    "oneccl_bind_pt==2.1.0"
+    "oneccl_bind_pt==2.1.0+cpu"
 ]
 
 gpu = [