
Multi-LoRA - Support for providing /load and /unload API #3308

Closed
gauravkr2108 opened this issue Mar 11, 2024 · 8 comments · May be fixed by #3496

Comments

@gauravkr2108

gauravkr2108 commented Mar 11, 2024

Problem statement:

In a production system, there should be an API to add/remove fine-tuned weights dynamically. The inference caller should not have to specify the LoRA location with each call.

Current Multi-LoRA support loads adapters during inference calls, and does not check whether the fine-tuned weights are already loaded and ready for inference.

Proposal:

Introduce /load and /unload APIs to allow fine-tuned weights to be registered with vLLM.

POST /load -> add fine-tuned weights to the set of served models.
POST /unload -> remove fine-tuned weights from the models list.

This keeps the set of fine-tuned weights resident in the vLLM server, so there is no need to specify fine-tuned weight names and locations as part of each inference request.

Sample code:

from fastapi import FastAPI, Request, Response
from vllm.lora.request import LoRARequest

app = FastAPI()

# Currently registered LoRA adapter (None means no adapter is loaded).
lora_request = None
index = 1


@app.post("/load")
async def load(request: Request) -> Response:
    """Register a LoRA adapter so later inference calls can reuse it."""
    global lora_request, index

    request_dict = await request.json()
    lora_local_path = request_dict.pop("lora_path", "/models/lora/")

    # Build the LoRARequest that the engine will use for inference.
    lora_request = LoRARequest(
        lora_name=lora_local_path,
        lora_int_id=index,
        lora_local_path=lora_local_path)

    index += 1
    return Response(status_code=201)


@app.post("/unload")
async def unload(request: Request) -> Response:
    """Drop the currently registered LoRA adapter."""
    global lora_request, index

    lora_request = None
    if index > 1:
        index -= 1

    return Response(status_code=201)
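
For illustration, here is a minimal client-side sketch of the intended flow. The server URL, adapter path, and completion payload are assumptions for the example, not part of the proposal itself.

import requests

BASE = "http://localhost:8000"  # assumed address of the vLLM server above

# Register a fine-tuned adapter once, up front.
requests.post(f"{BASE}/load", json={"lora_path": "/models/lora/sql-adapter"})

# Later inference calls no longer need to carry the LoRA name or path.
resp = requests.post(f"{BASE}/v1/completions",
                     json={"model": "base-model", "prompt": "SELECT"})
print(resp.json())

# Remove the adapter when it is no longer needed.
requests.post(f"{BASE}/unload", json={})
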
@simon-mo
Collaborator

I'm open to this because I anticipate it will be helpful for production use cases. PRs are welcome with changes to the OpenAI API server, with routes starting with /-/ to indicate a private API, such as PUT /-/lora_cache and DELETE /-/lora_cache.
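
A rough sketch of what those private routes could look like, assuming a FastAPI app and an in-memory registry; the payload fields (lora_name, lora_path) and the registry itself are assumptions for illustration, not an agreed design.

from fastapi import FastAPI, Request, Response

app = FastAPI()

# Illustrative in-memory registry: lora_name -> local path.
lora_cache: dict[str, str] = {}


@app.put("/-/lora_cache")
async def put_lora_cache(request: Request) -> Response:
    body = await request.json()
    # Register (or overwrite) an adapter under its name.
    lora_cache[body["lora_name"]] = body["lora_path"]
    return Response(status_code=201)


@app.delete("/-/lora_cache")
async def delete_lora_cache(request: Request) -> Response:
    body = await request.json()
    # Remove the adapter if present; ignore unknown names.
    lora_cache.pop(body["lora_name"], None)
    return Response(status_code=204)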

@gauravkr2108
Author

Let me draft a PR.

@simon-mo
Collaborator

I have seen #3446 pop up

@gauravkr2108
Author

@simon-mo we need both the add and delete operations; I can work with #3446 to add the delete operation.

@gauravkr2108
Author

@simon-mo there is an OpenAI API to delete a fine-tuned model (https://platform.openai.com/docs/api-reference/models/delete); should we adopt this API?
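
For reference, that endpoint is DELETE /v1/models/{model} and returns the model id with a deleted flag. A sketch of a vLLM-side route mirroring that shape, with an illustrative in-memory registry standing in for the engine's adapter state:

from fastapi import FastAPI

app = FastAPI()

# Illustrative registry standing in for the engine's loaded adapters.
loaded_adapters: dict[str, str] = {}


@app.delete("/v1/models/{model}")
async def delete_model(model: str) -> dict:
    # Drop the adapter if it is registered.
    loaded_adapters.pop(model, None)
    # Response shape follows the OpenAI models/delete reference.
    return {"id": model, "object": "model", "deleted": True}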

@thincal

thincal commented Mar 23, 2024

So if I want to run inference with a LoRA model, I need to invoke /load first and then send the inference request with that LoRA model? If many vLLM engine instances are deployed and requests are load-balanced across them, how does this two-step design make sure the inference lands on an instance where that LoRA has already been loaded?

@lizzzcai

I created a feature request, but for multiple models; I think the endpoint can be reused.

@DarkLight1337
Member

Closing as completed by #6566
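
For anyone landing here later, a sketch of how the dynamic-loading endpoints can be called, assuming they are exposed as /v1/load_lora_adapter and /v1/unload_lora_adapter and that runtime LoRA updating is enabled on the server; check the vLLM docs and #6566 for the authoritative route names, flags, and payload fields.

import requests

BASE = "http://localhost:8000"  # assumed vLLM OpenAI-compatible server

# Load an adapter at runtime (endpoint name and payload assumed from #6566).
requests.post(f"{BASE}/v1/load_lora_adapter",
              json={"lora_name": "sql-adapter",
                    "lora_path": "/models/lora/sql-adapter"})

# Unload it again.
requests.post(f"{BASE}/v1/unload_lora_adapter",
              json={"lora_name": "sql-adapter"})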
