
Fix runtime errors reported when using long input sequence lengths with LoRA #339

Merged
merged 1 commit into habana_main from private/vgoel/lora_long_seq_fix on Sep 27, 2024

Conversation

vivekgoe

This PR has the following fixes:

  • Increase the size of the indices tensors used to maintain multi-LoRA state information from max_num_batched_tokens to 3 * max_num_batched_tokens. The larger size provides a buffer for the padding applied in the batch and sequence dimensions (see the sketch below).
  • Move the logic that removes padding from lora_logits out of execute_model() and back into the LogitsProcessorWithLoRA class; this fixes a race condition caused by updating the multi-LoRA state information directly.

FIX #237
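
A minimal sketch of the sizing change described above, for illustration only (the buffer names mirror typical multi-LoRA bookkeeping but are assumptions, not this PR's diff):

```python
import torch

# Assumed value for illustration; in vLLM this comes from the scheduler config.
max_num_batched_tokens = 4096

# Previously the per-token LoRA index buffers were sized to
# max_num_batched_tokens. With padding applied in both the batch and
# sequence dimensions, the padded token count can exceed that limit, so the
# buffers are over-allocated by a factor of 3 to leave headroom.
indices_len = 3 * max_num_batched_tokens

base_indices = torch.zeros(indices_len, dtype=torch.long)
sampler_indices = torch.zeros(indices_len, dtype=torch.long)
sampler_indices_padded = torch.zeros(indices_len, dtype=torch.long)
embeddings_indices = torch.zeros(2, indices_len, dtype=torch.long)
```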

Inline review comment on the following diff context:

[0] * batch_size * seq_len,
[0] * batch_size * seq_len,
)
**dict(index_mapping=[0] * batch_size * seq_len,


michalkuligowski: How does introducing dict here affect the memory footprint? We don't have control over when it is garbage collected, and its size grows with batch_size * seq_len.

vivekgoe (Author)

@michalkuligowski Nothing has changed from the existing code; we are still populating the same dataclass (LoRAMapping) as before, and **dict is simply used for key/value-style initialization of the class members. Regarding the memory footprint, this dataclass needs to be available the whole time LoRA is enabled; it is required for LoRA operation. Again, nothing here is specific to Gaudi/HPU; this is how the LoRA feature is implemented in vLLM.
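
To illustrate the **dict point, a minimal sketch (the field names and the simplified LoRAMapping stand-in are assumptions; the real dataclass in vLLM differs in detail):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LoRAMapping:
    # Simplified stand-in for vLLM's LoRAMapping dataclass.
    index_mapping: List[int]
    prompt_mapping: List[int]

batch_size, seq_len = 2, 4

# Positional initialization ...
positional = LoRAMapping([0] * batch_size * seq_len,
                         [0] * batch_size * seq_len)

# ... and keyword-style initialization via **dict build the same object;
# neither form introduces extra long-lived allocation beyond the mapping
# lists the dataclass holds in either case.
keyword = LoRAMapping(**dict(index_mapping=[0] * batch_size * seq_len,
                             prompt_mapping=[0] * batch_size * seq_len))

assert positional == keyword
```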

michalkuligowski merged commit c3577af into habana_main on Sep 27, 2024
19 checks passed
michalkuligowski deleted the private/vgoel/lora_long_seq_fix branch on September 27, 2024 at 06:58
huijjj pushed a commit to SqueezeBits/vllm-fork that referenced this pull request on Sep 27, 2024:
Fix runtime errors reported when using long input sequence lengths with LoRA (HabanaAI#339)

Development

Successfully merging this pull request may close these issues.

[Bug]: Batched Multi-LoRA inference failure with random length dataset