update readme docker section, fix quickstart title, remove chs figure (#11044)

* update readme and fix quickstart title, remove chs figure

* update readme according to comment

* reorganize the docker guide structure
shane-huang authored May 23, 2024
1 parent 797dbc4 commit 7ed270a
Showing 5 changed files with 34 additions and 24 deletions.
24 changes: 18 additions & 6 deletions README.md
@@ -77,21 +77,33 @@ See the demo of running [*Text-Generation-WebUI*](https://ipex-llm.readthedocs.i
[^1]: Performance varies by use, configuration and other factors. `ipex-llm` may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex.

## `ipex-llm` Quickstart
### Install `ipex-llm`
- [Windows GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html): installing `ipex-llm` on Windows with Intel GPU
- [Linux GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html): installing `ipex-llm` on Linux with Intel GPU
- [Docker](docker/llm): using `ipex-llm` dockers on Intel CPU and GPU
- *For more details, please refer to the [installation guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install.html)*

### Run `ipex-llm`
### Docker
- [GPU Inference in C++](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/DockerGuides/docker_cpp_xpu_quickstart.html): running `llama.cpp`, `ollama`, `OpenWebUI`, etc., with `ipex-llm` on Intel GPU
- [GPU Inference in Python](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/DockerGuides/docker_pytorch_inference_gpu.html): running HuggingFace `transformers`, `LangChain`, `LlamaIndex`, `ModelScope`, etc., with `ipex-llm` on Intel GPU
- [GPU Dev in Visual Studio Code](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/DockerGuides/docker_run_pytorch_inference_in_vscode.html): LLM development in Python using `ipex-llm` on Intel GPU in VSCode
- [vLLM on GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/DockerGuides/vllm_docker_quickstart.html): serving with `ipex-llm` accelerated `vLLM` on Intel GPU
- [FastChat on GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/DockerGuides/fastchat_docker_quickstart.html): serving with `ipex-llm` accelerated `FastChat` on Intel GPU

### Use
- [llama.cpp](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html): running **llama.cpp** (*using the C++ interface of `ipex-llm` as an accelerated backend for `llama.cpp`*) on Intel GPU; see the command sketch after this list
- [ollama](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/ollama_quickstart.html): running **ollama** (*using the C++ interface of `ipex-llm` as an accelerated backend for `ollama`*) on Intel GPU
- [vLLM](python/llm/example/GPU/vLLM-Serving): running `ipex-llm` in `vLLM` on both Intel [GPU](python/llm/example/GPU/vLLM-Serving) and [CPU](python/llm/example/CPU/vLLM-Serving)
- [FastChat](python/llm/src/ipex_llm/serving/fastchat): running `ipex-llm` in `FastChat` serving on both Intel GPU and CPU
- [LangChain-Chatchat RAG](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/chatchat_quickstart.html): running `ipex-llm` in `LangChain-Chatchat` (*Knowledge Base QA using **RAG** pipeline*)
- [Text-Generation-WebUI](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html): running `ipex-llm` in `oobabooga` **WebUI**
- [Dify](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/dify_quickstart.html): running `ipex-llm` in `Dify` (*production-ready LLM app development platform*)
- [Continue](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/continue_quickstart.html): using `Continue` (a coding copilot in VSCode) backed by `ipex-llm`
- [Benchmarking](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/benchmark_quickstart.html): running (latency and throughput) benchmarks for `ipex-llm` on Intel CPU and GPU
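
As a rough illustration of the `llama.cpp` flow above, the sketch below follows the general pattern from the quickstart: install the C++ backend of `ipex-llm`, initialize a working directory, and run a GGUF model on the GPU. The model path, prompt, and flag values are illustrative placeholders, and the exact steps should be checked against the linked quickstart.

```bash
# Hedged sketch of running llama.cpp with the ipex-llm C++ backend on Linux.
# The model path, prompt, and flag values are illustrative placeholders.
pip install --pre --upgrade ipex-llm[cpp]   # C++ backend of ipex-llm
mkdir -p llama-cpp && cd llama-cpp
init-llama-cpp                              # set up the ipex-llm build of llama.cpp in this directory
./main -m /path/to/model.gguf -n 32 --prompt "Once upon a time" -ngl 33
```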


### Install
- [Windows GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html): installing `ipex-llm` on Windows with Intel GPU
- [Linux GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html): installing `ipex-llm` on Linux with Intel GPU
- *For more details, please refer to the [installation guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install.html); a minimal command sketch follows below*
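
For orientation, a minimal Linux install with Intel GPU support might look like the sketch below; the conda environment name, Python version, and extra index URL are assumptions based on the general pattern in the install guides, so verify them against the linked pages.

```bash
# Minimal sketch: install ipex-llm with Intel GPU (XPU) support on Linux.
# The environment name, Python version, and index URL are assumptions; follow the linked guides.
conda create -n llm python=3.11 -y
conda activate llm
pip install --pre --upgrade ipex-llm[xpu] \
  --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```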



### Code Examples
- Low bit inference
- [INT4 inference](python/llm/example/GPU/HF-Transformers-AutoModels/Model): **INT4** LLM inference on Intel [GPU](python/llm/example/GPU/HF-Transformers-AutoModels/Model) and [CPU](python/llm/example/CPU/HF-Transformers-AutoModels/Model)
12 changes: 6 additions & 6 deletions docs/readthedocs/source/_templates/sidebar_quicklinks.html
@@ -78,22 +78,22 @@
</label>
<ul class="bigdl-quicklinks-section-nav">
<li>
<a href="doc/LLM/DockerGuides/docker_windows_gpu.html">Overview of IPEX-LLM Containers for Intel GPU</a>
<a href="doc/LLM/DockerGuides/docker_windows_gpu.html">Overview of IPEX-LLM Containers</a>
</li>
<li>
<a href="doc/LLM/DockerGuides/docker_pytorch_inference_gpu.html">Run PyTorch Inference on an Intel GPU via Docker</a>
<a href="doc/LLM/DockerGuides/docker_pytorch_inference_gpu.html">Python Inference with `ipex-llm` on Intel GPU </a>
</li>
<li>
<a href="doc/LLM/DockerGuides/docker_run_pytorch_inference_in_vscode.html">Run/Develop PyTorch in VSCode with Docker on Intel GPU</a>
<a href="doc/LLM/DockerGuides/docker_run_pytorch_inference_in_vscode.html">VSCode LLM Development with `ipex-llm` on Intel GPU</a>
</li>
<li>
<a href="doc/LLM/DockerGuides/docker_cpp_xpu_quickstart.html">Run llama.cpp/Ollama/Open-WebUI on an Intel GPU via Docker</a>
<a href="doc/LLM/DockerGuides/docker_cpp_xpu_quickstart.html">llama.cpp/Ollama/Open-WebUI with `ipex-llm` on Intel GPU</a>
</li>
<li>
<a href="doc/LLM/DockerGuides/fastchat_docker_quickstart.html">Run IPEX-LLM integrated FastChat on an Intel GPU via Docker</a>
<a href="doc/LLM/DockerGuides/fastchat_docker_quickstart.html">FastChat with `ipex-llm` on Intel GPU</a>
</li>
<li>
<a href="doc/LLM/DockerGuides/vllm_docker_quickstart.html">Run IPEX-LLM integrated vLLM on an Intel GPU via Docker</a>
<a href="doc/LLM/DockerGuides/vllm_docker_quickstart.html">vLLM with `ipex-llm` on Intel GPU</a>
</li>
</ul>
</li>
@@ -1,4 +1,4 @@
# Run PyTorch Inference on an Intel GPU via Docker
# Python Inference using IPEX-LLM on Intel GPU

We can run PyTorch Inference Benchmark, Chat Service and PyTorch Examples on Intel GPUs within Docker (on Linux or WSL).

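To make that flow concrete, starting such a container on Linux typically looks roughly like the sketch below; the image name and tag, mount paths, and resource limits are placeholders (not taken from this guide), so substitute the values given in the guide itself.

```bash
# Rough sketch of launching an ipex-llm inference container with Intel GPU access on Linux.
# The image name/tag, paths, and resource limits are placeholders; use the values from this guide.
export DOCKER_IMAGE=intelanalytics/ipex-llm-xpu:latest   # assumed image name and tag
export CONTAINER_NAME=ipex-llm-inference

sudo docker run -itd \
  --net=host \
  --device=/dev/dri \
  --memory=32G \
  --shm-size=16g \
  -v /path/to/models:/llm/models \
  --name=$CONTAINER_NAME \
  $DOCKER_IMAGE

# Attach a shell to run the benchmark, chat service, or example scripts covered in this guide.
sudo docker exec -it $CONTAINER_NAME bash
```
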
@@ -108,8 +108,4 @@ Command: python chat.py --model-path /llm/llm-models/chatglm2-6b/
Uptime: 29.349235 s
Aborted
```
To resolve this problem, you can disabling the iGPU in Device Manager on Windows as follows:

<a href="https://llm-assets.readthedocs.io/en/latest/_images/disable_iGPU.png">
<img src="https://llm-assets.readthedocs.io/en/latest/_images/disable_iGPU.png" width=100%; />
</a>
To resolve this problem, you can disable the iGPU in Device Manager on Windows. For details, refer to [this guide](https://www.elevenforum.com/t/enable-or-disable-integrated-graphics-igpu-in-windows-11.18616/).
14 changes: 8 additions & 6 deletions docs/readthedocs/source/doc/LLM/DockerGuides/index.rst
@@ -3,10 +3,12 @@ IPEX-LLM Docker Container User Guides

In this section, you will find guides related to using IPEX-LLM with Docker, covering how to:

* `Overview of IPEX-LLM Containers <./docker_windows_gpu.html>`_

* `Overview of IPEX-LLM Containers for Intel GPU <./docker_windows_gpu.html>`_
* `Run PyTorch Inference on an Intel GPU via Docker <./docker_pytorch_inference_gpu.html>`_
* `Run/Develop PyTorch in VSCode with Docker on Intel GPU <./docker_pytorch_inference_gpu.html>`_
* `Run llama.cpp/Ollama/open-webui with Docker on Intel GPU <./docker_cpp_xpu_quickstart.html>`_
* `Run IPEX-LLM integrated FastChat with Docker on Intel GPU <./fastchat_docker_quickstart.html>`_
* `Run IPEX-LLM integrated vLLM with Docker on Intel GPU <./vllm_docker_quickstart.html>`_
* Inference in Python/C++
  * `GPU Inference in Python with IPEX-LLM <./docker_pytorch_inference_gpu.html>`_
  * `VSCode LLM Development with IPEX-LLM on Intel GPU <./docker_run_pytorch_inference_in_vscode.html>`_
  * `llama.cpp/Ollama/Open-WebUI with IPEX-LLM on Intel GPU <./docker_cpp_xpu_quickstart.html>`_
* Serving
  * `FastChat with IPEX-LLM on Intel GPU <./fastchat_docker_quickstart.html>`_
  * `vLLM with IPEX-LLM on Intel GPU <./vllm_docker_quickstart.html>`_
