Support for loading multiple LoRA adapters at runtime #2661


naiveen asked this question in Q&A

naiveen
Feb 26, 2024

Hi, I have a base model and several LoRA adapters trained on top of it. The base model is always loaded, and for each inference request I modify it by applying one of the adapters. I want to optimize the model with TensorRT. Is there a way to apply LoRA adapters to the optimized TensorRT model?
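"Applying an adapter" usually means merging it into the base weights. Under the common LoRA convention (used, for example, by Hugging Face PEFT), each adapted layer stores two low-rank factors, B (out × r) and A (r × in), and the effective weight is W' = W + (alpha / r) · B A, which has the same shape as W. The sketch below shows that merge and its inverse using plain Python lists; the function names are hypothetical and no framework is assumed.

```python
# Minimal pure-Python sketch of LoRA weight merging (no framework assumed).
# Matrices are lists of rows. B is (out x r), A is (r x in), W is (out x in).

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def merge_lora(w, b, a, alpha):
    """Return W + (alpha / r) * B @ A as a new matrix; W is not modified."""
    r = len(a)  # LoRA rank = number of rows of A
    scale = alpha / r
    delta = matmul(b, a)  # same shape as W
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def unmerge_lora(w_merged, b, a, alpha):
    """Undo merge_lora by subtracting the same scaled delta."""
    return merge_lora(w_merged, b, a, -alpha)
```

Because unmerging subtracts exactly what merging added, switching adapters can be done as unmerge-then-merge on the same weights. With real floating-point tensors the round trip can drift slightly, which is one reason to keep an untouched copy of the base weights.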

I would appreciate any pointers on where to start with this problem. Thank you.


narendasan
Feb 26, 2024
Collaborator

I think this would likely need refit support. Native TensorRT has an example of this here: https://github.com/NVIDIA/TensorRT/tree/release/9.0/demo/Diffusion#generate-an-image-guided-by-a-text-prompt-and-using-specified-lora-model-weight-updates. Some APIs may need to be exposed in Torch-TensorRT for this to work out of the box.
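Refit keeps the built engine's layers and weight shapes and only overwrites weight values. That is why it suits LoRA: merging an adapter changes values but never shapes. The toy class below is a hypothetical pure-Python stand-in, not TensorRT's refit API, and the names `RefittableEngine` and `switch_adapter` are illustrative. It shows two constraints worth designing around: every refit must preserve each weight's shape, and each adapter switch should start from the base weights rather than from the previously merged ones.

```python
# Hypothetical stand-in for a refittable engine (NOT the TensorRT API):
# weight names and shapes are frozen at "build" time, and a refit may only
# swap in new values with identical shapes.

def shape_of(m):
    """Shape of a matrix stored as a list of rows."""
    return (len(m), len(m[0]) if m else 0)

class RefittableEngine:
    def __init__(self, weights):
        self._shapes = {name: shape_of(w) for name, w in weights.items()}
        self._weights = dict(weights)

    def refit(self, new_weights):
        """Swap weight values in place; reject anything that changes the structure."""
        for name, w in new_weights.items():
            if name not in self._shapes:
                raise KeyError(f"unknown weight {name!r}")
            if shape_of(w) != self._shapes[name]:
                raise ValueError(f"shape mismatch for {name!r}")
        self._weights.update(new_weights)

    def weight(self, name):
        return self._weights[name]

def add(w, d):
    """Element-wise W + D for two equally shaped matrices."""
    return [[x + y for x, y in zip(wr, dr)] for wr, dr in zip(w, d)]

def switch_adapter(engine, base_weights, deltas):
    """Refit `engine` so it runs the base model plus one adapter.

    `deltas` maps a weight name to that adapter's precomputed (alpha / r) * B @ A.
    Every weight is rebuilt from the base weights, so a previously applied
    adapter never leaks into the next one.
    """
    new = {name: add(w, deltas[name]) if name in deltas else w
           for name, w in base_weights.items()}
    engine.refit(new)
```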


naiveen Mar 13, 2024
Author

Thanks for your help. I tried refitting my model, but the time it takes to refit the engine outweighs the latency gain from using TensorRT.
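Whether refit pays off comes down to how many requests are served between adapter switches. As a rough rule (ignoring memory, warm-up, and batching effects), a switch breaks even once the accumulated per-request saving covers the refit time. The helper below is a hypothetical back-of-the-envelope calculator, and the figures in the note after it are made up, not measurements from this thread.

```python
import math

def breakeven_requests(refit_ms, baseline_ms, trt_ms):
    """Smallest number of requests served between adapter switches for which
    one refit is paid back by the per-request TensorRT speedup.

    Returns None when TensorRT is not faster, since a refit never pays off then.
    """
    saving_ms = baseline_ms - trt_ms
    if saving_ms <= 0:
        return None
    # n * saving_ms >= refit_ms  ->  n >= refit_ms / saving_ms
    return math.ceil(refit_ms / saving_ms)
```

For example, with a 500 ms refit and a 10 ms saving per request (30 ms vs. 20 ms), about 50 requests per adapter are needed before a switch breaks even. If the adapter changes on most requests, refitting per request is unlikely to win.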

narendasan
Aug 22, 2024
Collaborator

For future reference, there is now an API that makes LoRAs easier to use with Torch-TRT models: https://github.com/pytorch/TensorRT/blob/main/examples/dynamo/mutable_torchtrt_module_example.py

