multi-lora documentation fix (vllm-project#3064)
ElefHead authored Feb 28, 2024
1 parent 3b847e9 commit 1eb784b
Showing 1 changed file with 13 additions and 1 deletion.
docs/source/models/lora.rst (14 changes: 13 additions & 1 deletion)
@@ -58,7 +58,7 @@ LoRA adapted models can also be served with the Open-AI compatible vLLM server.

 .. code-block:: bash

-    python -m vllm.entrypoints.api_server \
+    python -m vllm.entrypoints.openai.api_server \
         --model meta-llama/Llama-2-7b-hf \
         --enable-lora \
         --lora-modules sql-lora=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
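
For context (not part of the shown changes): once the server above is running, the adapter registered via ``--lora-modules`` should be listed by the server alongside its base model, as the collapsed doc text below (ending in ``with its base model:``) describes. A minimal sketch of that check with the official ``openai`` Python client; the ``EMPTY`` key is a placeholder, assuming no ``--api-key`` was configured:

.. code-block:: python

   # Sketch only, not part of this commit: list the models served by the
   # OpenAI-compatible server started above. The LoRA adapter "sql-lora"
   # should appear alongside the base model "meta-llama/Llama-2-7b-hf".
   from openai import OpenAI

   # Placeholder key; assumed acceptable since no API key was configured.
   client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

   for model in client.models.list():
       print(model.id)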
@@ -89,3 +89,15 @@ with its base model:
 Requests can specify the LoRA adapter as if it were any other model via the ``model`` request parameter. The requests will be
 processed according to the server-wide LoRA configuration (i.e. in parallel with base model requests, and potentially other
 LoRA adapter requests if they were provided and ``max_loras`` is set high enough).
+
+The following is an example request:
+
+.. code-block:: bash
+   curl http://localhost:8000/v1/completions \
+       -H "Content-Type: application/json" \
+       -d '{
+           "model": "sql-lora",
+           "prompt": "San Francisco is a",
+           "max_tokens": 7,
+           "temperature": 0
+       }' | jq
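
The curl request above can also be issued from Python with the official ``openai`` client. A minimal sketch (not part of this commit), reusing the placeholder key from the check above:

.. code-block:: python

   # Sketch only, not part of this commit: the completion request above,
   # sent through the openai client. "sql-lora" is the adapter name that
   # was registered via --lora-modules; the server accepts it just like
   # any other model name.
   from openai import OpenAI

   client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

   completion = client.completions.create(
       model="sql-lora",
       prompt="San Francisco is a",
       max_tokens=7,
       temperature=0,
   )
   print(completion.choices[0].text)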
