Fix package import path #106

Merged on Mar 7, 2024 (37 commits)

Changes shown from 26 of 37 commits

Commits
3de6016
mv path
KepingYan Feb 8, 2024
4c95129
modify import path
KepingYan Feb 8, 2024
19d7fc2
modify package name
KepingYan Feb 8, 2024
c1394d4
update path
KepingYan Feb 20, 2024
ad9a55a
update
KepingYan Feb 20, 2024
4724b44
disable mpt-7b-bigdl
KepingYan Feb 21, 2024
3024036
update
KepingYan Feb 21, 2024
188e495
update for ui
KepingYan Feb 21, 2024
4a63f34
modify llmonray to llm_on_ray
KepingYan Feb 23, 2024
ac3cb59
simplify execution command
KepingYan Feb 23, 2024
c70c0ff
merge main branch
KepingYan Feb 23, 2024
0d36d0e
test
KepingYan Feb 23, 2024
961c176
Merge remote-tracking branch 'upstream/main' into fix_package_path
KepingYan Feb 23, 2024
f83ee7d
test
KepingYan Feb 23, 2024
1916e20
Merge remote-tracking branch 'upstream/main' into fix_package_path
KepingYan Feb 23, 2024
0219eeb
modify
KepingYan Feb 23, 2024
32e990e
Merge remote-tracking branch 'upstream/main' into fix_package_path
KepingYan Feb 26, 2024
77055bd
fix
KepingYan Feb 26, 2024
91a8429
update & disable vllm temporarily
KepingYan Feb 26, 2024
ce2f019
Merge remote-tracking branch 'upstream/main' into fix_package_path
KepingYan Feb 26, 2024
f8c59d3
test
KepingYan Feb 27, 2024
a6f1db6
test
KepingYan Feb 27, 2024
61731b9
test
KepingYan Feb 27, 2024
c50ffea
recover
KepingYan Feb 27, 2024
883c9eb
update
KepingYan Feb 27, 2024
6a07499
fix vllm
KepingYan Feb 28, 2024
4a16df0
update
KepingYan Feb 29, 2024
fd2e56e
merge main branch
KepingYan Feb 29, 2024
43af195
move mllm path
KepingYan Feb 29, 2024
93c4918
modify
KepingYan Mar 5, 2024
5933105
Merge remote-tracking branch 'upstream/main' into fix_package_path
KepingYan Mar 5, 2024
e51b244
fix err
KepingYan Mar 5, 2024
e055d98
remove import_all_modules
KepingYan Mar 6, 2024
3b0a89b
Merge remote-tracking branch 'upstream/main' into fix_package_path
KepingYan Mar 6, 2024
af9a299
Update .github/workflows/workflow_finetune.yml
xwu99 Mar 7, 2024
a2e6c57
add comment
KepingYan Mar 7, 2024
c741c01
add comment
KepingYan Mar 7, 2024
12 changes: 6 additions & 6 deletions .github/workflows/workflow_finetune.yml
@@ -85,7 +85,7 @@ jobs:
docker exec "finetune" bash -c "source \$(python -c 'import oneccl_bindings_for_pytorch as torch_ccl;print(torch_ccl.cwd)')/env/setvars.sh; RAY_SERVE_ENABLE_EXPERIMENTAL_STREAMING=1 ray start --head --node-ip-address 127.0.0.1 --ray-debugger-external; RAY_SERVE_ENABLE_EXPERIMENTAL_STREAMING=1 ray start --address='127.0.0.1:6379' --ray-debugger-external"
CMD=$(cat << EOF
import yaml
conf_path = "finetune/finetune.yaml"
conf_path = "llm_on_ray/finetune/finetune.yaml"
with open(conf_path, encoding="utf-8") as reader:
result = yaml.load(reader, Loader=yaml.FullLoader)
result['General']['base_model'] = "${{ matrix.model }}"
@@ -113,14 +113,14 @@ jobs:
EOF
)
docker exec "finetune" python -c "$CMD"
docker exec "finetune" bash -c "python finetune/finetune.py --config_file finetune/finetune.yaml"
docker exec "finetune" bash -c "llm_on_ray-finetune --config_file llm_on_ray/finetune/finetune.yaml"

- name: Run PEFT-LoRA Test
run: |
docker exec "finetune" bash -c "rm -rf /tmp/llm-ray/*"
CMD=$(cat << EOF
import yaml
conf_path = "finetune/finetune.yaml"
conf_path = "llm_on_ray/finetune/finetune.yaml"
with open(conf_path, encoding="utf-8") as reader:
result = yaml.load(reader, Loader=yaml.FullLoader)
result['General']['lora_config'] = {
@@ -138,7 +138,7 @@ jobs:
EOF
)
docker exec "finetune" python -c "$CMD"
docker exec "finetune" bash -c "python finetune/finetune.py --config_file finetune/finetune.yaml"
docker exec "finetune" bash -c "llm_on_ray-finetune --config_file llm_on_ray/finetune/finetune.yaml"

- name: Run Deltatuner Test on DENAS-LoRA Model
run: |
@@ -150,7 +150,7 @@ jobs:
import os
import yaml
os.system("cp -r $(python -m pip show deltatuner | grep Location | cut -d: -f2)/deltatuner/conf/best_structure examples/")
conf_path = "finetune/finetune.yaml"
conf_path = "llm_on_ray/finetune/finetune.yaml"
with open(conf_path, encoding="utf-8") as reader:
result = yaml.load(reader, Loader=yaml.FullLoader)
result['General']['lora_config'] = {
@@ -168,7 +168,7 @@ jobs:
yaml.dump(result, output, sort_keys=False)
EOF)
docker exec "finetune" python -c "$CMD"
docker exec "finetune" bash -c "python finetune/finetune.py --config_file finetune/finetune.yaml"
docker exec "finetune" bash -c "llm_on_ray-finetune --config_file llm_on_ray/finetune/finetune.yaml"
fi

- name: Stop Ray
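The finetune steps above all follow the same pattern: patch `llm_on_ray/finetune/finetune.yaml` in place with a short Python heredoc, then invoke the new `llm_on_ray-finetune` console script instead of `python finetune/finetune.py`. For reference, the patching step alone looks roughly like the sketch below (a minimal local version of the CI pattern; the base-model value is a placeholder rather than one of the CI matrix models):

```python
import yaml

# Minimal sketch of the CI pattern above: load the packaged finetune config,
# override a field, and write it back before invoking llm_on_ray-finetune.
conf_path = "llm_on_ray/finetune/finetune.yaml"

with open(conf_path, encoding="utf-8") as reader:
    result = yaml.load(reader, Loader=yaml.FullLoader)

result["General"]["base_model"] = "gpt2"  # placeholder model id

with open(conf_path, "w") as output:
    yaml.dump(result, output, sort_keys=False)
```

After patching, the workflow runs `llm_on_ray-finetune --config_file llm_on_ray/finetune/finetune.yaml`, which is the invocation this PR standardizes on.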
22 changes: 11 additions & 11 deletions .github/workflows/workflow_inference.yml
@@ -118,14 +118,14 @@ jobs:
CMD=$(cat << EOF
import yaml
if ("${{ matrix.model }}" == "starcoder"):
conf_path = "inference/models/starcoder.yaml"
conf_path = "llm_on_ray/inference/models/starcoder.yaml"
with open(conf_path, encoding="utf-8") as reader:
result = yaml.load(reader, Loader=yaml.FullLoader)
result['model_description']["config"]["use_auth_token"] = "${{ env.HF_ACCESS_TOKEN }}"
with open(conf_path, 'w') as output:
yaml.dump(result, output, sort_keys=False)
if ("${{ matrix.model }}" == "llama-2-7b-chat-hf"):
conf_path = "inference/models/llama-2-7b-chat-hf.yaml"
conf_path = "llm_on_ray/inference/models/llama-2-7b-chat-hf.yaml"
with open(conf_path, encoding="utf-8") as reader:
result = yaml.load(reader, Loader=yaml.FullLoader)
result['model_description']["config"]["use_auth_token"] = "${{ env.HF_ACCESS_TOKEN }}"
@@ -135,11 +135,11 @@ jobs:
)
docker exec "${TARGET}" python -c "$CMD"
if [[ ${{ matrix.model }} == "mpt-7b-bigdl" ]]; then
docker exec "${TARGET}" bash -c "python inference/serve.py --config_file inference/models/bigdl/mpt-7b-bigdl.yaml --simple"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --config_file llm_on_ray/inference/models/bigdl/mpt-7b-bigdl.yaml --simple"
elif [[ ${{ matrix.model }} == "llama-2-7b-chat-hf-vllm" ]]; then
docker exec "${TARGET}" bash -c "python inference/serve.py --config_file .github/workflows/config/llama-2-7b-chat-hf-vllm-fp32.yaml --simple"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --config_file .github/workflows/config/llama-2-7b-chat-hf-vllm-fp32.yaml --simple"
else
docker exec "${TARGET}" bash -c "python inference/serve.py --simple --models ${{ matrix.model }}"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --simple --models ${{ matrix.model }}"
fi
echo Non-streaming query:
docker exec "${TARGET}" bash -c "python examples/inference/api_server_simple/query_single.py --model_endpoint http://127.0.0.1:8000/${{ matrix.model }}"
@@ -150,7 +150,7 @@ jobs:
if: ${{ matrix.dtuner_model }}
run: |
TARGET=${{steps.target.outputs.target}}
docker exec "${TARGET}" bash -c "python inference/serve.py --config_file .github/workflows/config/mpt_deltatuner.yaml --simple"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --config_file .github/workflows/config/mpt_deltatuner.yaml --simple"
docker exec "${TARGET}" bash -c "python examples/inference/api_server_simple/query_single.py --model_endpoint http://127.0.0.1:8000/${{ matrix.model }}"
docker exec "${TARGET}" bash -c "python examples/inference/api_server_simple/query_single.py --model_endpoint http://127.0.0.1:8000/${{ matrix.model }} --streaming_response"

@@ -160,8 +160,8 @@ jobs:
if [[ ${{ matrix.model }} =~ ^(gpt2|falcon-7b|starcoder|mpt-7b.*)$ ]]; then
echo ${{ matrix.model }} is not supported!
elif [[ ! ${{ matrix.model }} == "llama-2-7b-chat-hf-vllm" ]]; then
docker exec "${TARGET}" bash -c "python .github/workflows/config/update_inference_config.py --config_file inference/models/\"${{ matrix.model }}\".yaml --output_file \"${{ matrix.model }}\".yaml.deepspeed --deepspeed"
docker exec "${TARGET}" bash -c "python inference/serve.py --config_file \"${{ matrix.model }}\".yaml.deepspeed --simple"
docker exec "${TARGET}" bash -c "python .github/workflows/config/update_inference_config.py --config_file llm_on_ray/inference/models/\"${{ matrix.model }}\".yaml --output_file \"${{ matrix.model }}\".yaml.deepspeed --deepspeed"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --config_file \"${{ matrix.model }}\".yaml.deepspeed --simple"
docker exec "${TARGET}" bash -c "python examples/inference/api_server_simple/query_single.py --model_endpoint http://127.0.0.1:8000/${{ matrix.model }}"
docker exec "${TARGET}" bash -c "python examples/inference/api_server_simple/query_single.py --model_endpoint http://127.0.0.1:8000/${{ matrix.model }} --streaming_response"
fi
@@ -173,7 +173,7 @@ jobs:
if [[ ${{ matrix.model }} =~ ^(gpt2|falcon-7b|starcoder|mpt-7b.*)$ ]]; then
echo ${{ matrix.model }} is not supported!
else
docker exec "${TARGET}" bash -c "python inference/serve.py --config_file .github/workflows/config/mpt_deltatuner_deepspeed.yaml --simple"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --config_file .github/workflows/config/mpt_deltatuner_deepspeed.yaml --simple"
docker exec "${TARGET}" bash -c "python examples/inference/api_server_simple/query_single.py --model_endpoint http://127.0.0.1:8000/${{ matrix.model }}"
docker exec "${TARGET}" bash -c "python examples/inference/api_server_simple/query_single.py --model_endpoint http://127.0.0.1:8000/${{ matrix.model }} --streaming_response"
fi
@@ -182,9 +182,9 @@
run: |
TARGET=${{steps.target.outputs.target}}
if [[ ${{ matrix.model }} == "mpt-7b-bigdl" ]]; then
docker exec "${TARGET}" bash -c "python inference/serve.py --config_file inference/models/bigdl/mpt-7b-bigdl.yaml"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --config_file llm_on_ray/inference/models/bigdl/mpt-7b-bigdl.yaml"
elif [[ ! ${{ matrix.model }} == "llama-2-7b-chat-hf-vllm" ]]; then
docker exec "${TARGET}" bash -c "python inference/serve.py --models ${{ matrix.model }}"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --models ${{ matrix.model }}"
docker exec "${TARGET}" bash -c "python examples/inference/api_server_openai/query_http_requests.py --model_name ${{ matrix.model }}"
fi

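In the OpenAI-compatible deployments above, the workflow exercises the endpoint through `examples/inference/api_server_openai/query_http_requests.py`. Outside CI, a quick manual check can also be done with the official `openai` Python client; the sketch below assumes the server exposes the standard `/v1` chat-completions route on `localhost:8000` and that no real API key is required (both are assumptions about a typical local deployment, not details taken from this PR):

```python
from openai import OpenAI

# Sketch only: point base_url at the llm_on_ray-serve deployment. The route,
# model id, and dummy API key are assumptions for a local test, not values
# defined in this repository.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="gpt2",  # one of the served model ids
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```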
8 changes: 4 additions & 4 deletions .github/workflows/workflow_orders_on_merge.yml
@@ -7,11 +7,11 @@ on:
paths:
- '.github/**'
- 'docker/**'
- 'common/**'
- 'dev/docker/**'
- 'finetune/**'
- 'inference/**'
- 'rlhf/**'
- 'llm_on_ray/common/**'
- 'llm_on_ray/finetune/**'
- 'llm_on_ray/inference/**'
- 'llm_on_ray/rlhf/**'
- 'tools/**'
- 'pyproject.toml'
- 'tests/**'
8 changes: 4 additions & 4 deletions .github/workflows/workflow_orders_on_pr.yml
@@ -7,11 +7,11 @@ on:
paths:
- '.github/**'
- 'docker/**'
- 'common/**'
- 'dev/docker/**'
- 'finetune/**'
- 'inference/**'
- 'rlhf/**'
- 'llm_on_ray/common/**'
- 'llm_on_ray/finetune/**'
- 'llm_on_ray/inference/**'
- 'llm_on_ray/rlhf/**'
- 'tools/**'
- 'pyproject.toml'
- 'tests/**'
6 changes: 3 additions & 3 deletions README.md
@@ -62,14 +62,14 @@ ray start --head
Use the following command to finetune a model using an example dataset and default configurations. The finetuned model will be stored in `/tmp/llm-ray/output` by default. To customize the base model, dataset and configurations, please see the [finetuning document](#finetune):

```bash
python finetune/finetune.py --config_file finetune/finetune.yaml
llm_on_ray-finetune --config_file llm_on_ray/finetune/finetune.yaml
```

### Serving
Deploy a model on Ray and expose an endpoint for serving. This command uses GPT2 as an example, but more model configuration examples can be found in the [inference/models](inference/models) directory:

```bash
python inference/serve.py --config_file inference/models/gpt2.yaml
llm_on_ray-serve --config_file llm_on_ray/inference/models/gpt2.yaml
```

The default served method is to provide an OpenAI-compatible API server ([OpenAI API Reference](https://platform.openai.com/docs/api-reference/chat)), you can access and test it in many ways:
@@ -95,7 +95,7 @@ python examples/inference/api_server_openai/query_openai_sdk.py
```
Or you can serve specific model to a simple endpoint according to the `port` and `route_prefix` parameters in configuration file,
```bash
python inference/serve.py --config_file inference/models/gpt2.yaml --simple
llm_on_ray-serve --config_file llm_on_ray/inference/models/gpt2.yaml --simple
```
After deploying the model endpoint, you can access and test it by using the script below:
```bash
9 changes: 0 additions & 9 deletions common/agentenv/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions common/dataprocesser/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions common/dataset/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions common/initializer/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions common/model/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions common/optimizer/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions common/tokenizer/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions common/trainer/__init__.py

This file was deleted.
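Each of the eight deleted `__init__.py` files above was nine lines long, and the commit history includes `remove import_all_modules`, which suggests these sub-packages previously imported all of their modules eagerly at package-import time. The deleted contents are not shown in this diff; purely as a hypothetical illustration of that pattern (not a reconstruction of the actual files), such an `__init__.py` typically looks like:

```python
# Hypothetical example of an "import everything on package import" __init__.py.
# The real deleted files are not shown in this diff; this only illustrates the
# pattern implied by the "remove import_all_modules" commit.
import importlib
import pkgutil


def import_all_modules(package_path, package_name):
    """Import every module found directly under this package."""
    for module_info in pkgutil.iter_modules(package_path):
        importlib.import_module(f"{package_name}.{module_info.name}")


import_all_modules(__path__, __name__)
```

Removing eager imports like this fits the PR's move to explicit `llm_on_ray.*` import paths.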

2 changes: 1 addition & 1 deletion dev/docker/Dockerfile.bigdl-cpu
@@ -27,7 +27,7 @@ RUN --mount=type=cache,target=/opt/conda/pkgs conda init bash && \
COPY ./pyproject.toml .
COPY ./MANIFEST.in .

RUN mkdir ./finetune && mkdir ./inference
RUN mkdir ./llm_on_ray

RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[bigdl-cpu] --extra-index-url https://download.pytorch.org/whl/cpu \
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
2 changes: 1 addition & 1 deletion dev/docker/Dockerfile.cpu_and_deepspeed
@@ -27,7 +27,7 @@ RUN --mount=type=cache,target=/opt/conda/pkgs conda init bash && \
COPY ./pyproject.toml .
COPY ./MANIFEST.in .

RUN mkdir ./finetune && mkdir ./inference
RUN mkdir ./llm_on_ray

RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[cpu,deepspeed] --extra-index-url https://download.pytorch.org/whl/cpu \
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
4 changes: 2 additions & 2 deletions dev/docker/Dockerfile.vllm
@@ -22,13 +22,13 @@ RUN --mount=type=cache,target=/opt/conda/pkgs conda init bash && \
unset -f conda && \
export PATH=$CONDA_DIR/bin/:${PATH} && \
conda config --add channels intel && \
conda install -y -c conda-forge python==3.9 gxx=12.3 gxx_linux-64=12.3
conda install -y -c conda-forge python==3.9 gxx=12.3 gxx_linux-64=12.3 libxcrypt

COPY ./pyproject.toml .
COPY ./MANIFEST.in .
COPY ./dev/scripts/install-vllm-cpu.sh .

RUN mkdir ./finetune && mkdir ./inference
RUN mkdir ./llm_on_ray

RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[cpu] --extra-index-url https://download.pytorch.org/whl/cpu \
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
2 changes: 1 addition & 1 deletion docs/finetune.md
@@ -65,5 +65,5 @@ The following models have been verified on Intel CPUs or GPUs.
## Finetune the model
To finetune your model, execute the following command. The finetuned model will be saved in /tmp/llm-ray/output by default.
``` bash
python finetune/finetune.py --config_file <your finetuning conf file>
llm_on_ray-finetune --config_file <your finetuning conf file>
```
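Since finetuning is now driven by the `llm_on_ray-finetune` console script, it can also be wrapped from Python, for example in a small automation script. The sketch below simply shells out to the entry point and then lists the default output directory mentioned above; the config path is a placeholder for your own finetuning conf file:

```python
import subprocess
from pathlib import Path

# Sketch: run the llm_on_ray-finetune console script installed by this package
# and inspect the default output location documented above.
subprocess.run(
    ["llm_on_ray-finetune", "--config_file", "llm_on_ray/finetune/finetune.yaml"],
    check=True,
)

output_dir = Path("/tmp/llm-ray/output")  # default per the finetuning docs
print("finetuned artifacts:", sorted(p.name for p in output_dir.iterdir()))
```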
10 changes: 5 additions & 5 deletions docs/pretrain.md
@@ -123,27 +123,27 @@ Set up `megatron_deepspeed_path` in the configuration.
```bash
cd /home/user/workspace/llm-on-ray
#Bloom-7B
python pretrain/megatron_deepspeed_pretrain.py --config_file pretrain/config/bloom_7b_megatron_deepspeed_zs0_8Gaudi_pretrain.conf
llm_on_ray-megatron_deepspeed_pretrain --config_file llm_on_ray/pretrain/config/bloom_7b_megatron_deepspeed_zs0_8Gaudi_pretrain.conf
#llama-7B
python pretrain/megatron_deepspeed_pretrain.py --config_file pretrain/config/llama_7b_megatron_deepspeed_zs0_8Gaudi_pretrain.conf
llm_on_ray-megatron_deepspeed_pretrain --config_file llm_on_ray/pretrain/config/llama_7b_megatron_deepspeed_zs0_8Gaudi_pretrain.conf
```

##### Huggingface Trainer
```bash
cd /home/user/workspace/llm-on-ray
#llama-7B
python pretrain/pretrain.py --config_file pretrain/config/llama_7b_8Guadi_pretrain.conf
llm_on_ray-pretrain --config_file llm_on_ray/pretrain/config/llama_7b_8Guadi_pretrain.conf
```
##### Nvidia GPU:
###### Megatron-DeepSpeed
```bash
cd /home/user/workspace/llm-on-ray
#llama2-7B
python pretrain/megatron_deepspeed_pretrain.py --config_file pretrain/config/llama2_3b_megatron_deepspeed_zs0_8gpus_pretrain.conf
llm_on_ray-megatron_deepspeed_pretrain --config_file llm_on_ray/pretrain/config/llama2_3b_megatron_deepspeed_zs0_8gpus_pretrain.conf
```
##### Huggingface Trainer
```bash
cd /home/user/workspace/llm-on-ray
#llama-7B
python pretrain/pretrain.py --config_file pretrain/config/llama_7b_8gpu_pretrain.conf
llm_on_ray-pretrain --config_file llm_on_ray/pretrain/config/llama_7b_8gpu_pretrain.conf
```
12 changes: 6 additions & 6 deletions docs/serve.md
@@ -30,22 +30,22 @@ LLM-on-Ray also supports serving with [Deepspeed](serve_deepspeed.md) for AutoTP
We support three methods to specify the models to be served, and they have the following priorities.
1. Use inference configuration file if config_file is set.
```
python inference/serve.py --config_file inference/models/gpt2.yaml
llm_on_ray-serve --config_file llm_on_ray/inference/models/gpt2.yaml
```
2. Use relevant configuration parameters if model_id_or_path is set.
```
python inference/serve.py --model_id_or_path gpt2 [--tokenizer_id_or_path gpt2 --port 8000 --route_prefix ...]
llm_on_ray-serve --model_id_or_path gpt2 [--tokenizer_id_or_path gpt2 --port 8000 --route_prefix ...]
```
3. If --config_file and --model_id_or_path are both None, it will serve all pre-defined models in inference/models/*.yaml, or part of them if models is set.
```
python inference/serve.py --models gpt2 gpt-j-6b
llm_on_ray-serve --models gpt2 gpt-j-6b
```
### OpenAI-compatible API
To deploy your model, execute the following command with the model's configuration file. This will create an OpenAI-compatible API ([OpenAI API Reference](https://platform.openai.com/docs/api-reference/chat)) for serving.
```bash
python inference/serve.py --config_file <path to the conf file>
llm_on_ray-serve --config_file <path to the conf file>
```
To deploy and serve multiple models concurrently, place all models' configuration files under `inference/models` and directly run `python inference/serve.py` without passing any conf file.
To deploy and serve multiple models concurrently, place all models' configuration files under `llm_on_ray/inference/models` and directly run `llm_on_ray-serve` without passing any conf file.

After deploying the model, you can access and test it in many ways:
```bash
@@ -71,7 +71,7 @@ python examples/inference/api_server_openai/query_openai_sdk.py
### Serving Model to a Simple Endpoint
This will create a simple endpoint for serving according to the `port` and `route_prefix` parameters in conf file, for example: http://127.0.0.1:8000/gpt2.
```bash
python inference/serve.py --config_file <path to the conf file> --simple
llm_on_ray-serve --config_file <path to the conf file> --simple
```
After deploying the model endpoint, you can access and test it by using the script below:
```bash
2 changes: 1 addition & 1 deletion docs/vllm.md
@@ -23,7 +23,7 @@ Please follow [Deploying and Serving LLMs on Intel CPU/GPU/Gaudi](serve.md) docu
To serve model with vLLM, run the following:

```bash
$ python serve.py --config_file inference/models/vllm/llama-2-7b-chat-hf-vllm.yaml --simple --keep_serve_terminal
$ llm_on_ray-serve --config_file llm_on_ray/inference/models/vllm/llama-2-7b-chat-hf-vllm.yaml --simple --keep_serve_terminal
```

In the above example, `vllm` property is set to `true` in the config file for enabling vLLM.
2 changes: 1 addition & 1 deletion docs/web_ui.md
@@ -14,7 +14,7 @@ $ dev/scripts/install-ui.sh
## Start Web UI

```bash
python -u ui/start_ui.py --node_user_name $user --conda_env_name $conda_env --master_ip_port "$node_ip:6379"
python -m llm_on_ray.ui.start_ui --node_user_name $user --conda_env_name $conda_env --master_ip_port "$node_ip:6379"
```
You will get URL from the command line output (E.g. http://0.0.0.0:8080 for local network and https://180cd5f7c31a1cfd3c.gradio.live for public network) and use the web browser to open it.
