
Fix runtime errors reported when using long input sequence lengths with LoRA #339

Merged
merged 1 commit into habana_main from private/vgoel/lora_long_seq_fix on Sep 27, 2024

Conversation

vivekgoe

This PR has the following fixes:

  • Increase the size of the indices tensors used to maintain multi-LoRA state information from max_num_batched_tokens to 3 * max_num_batched_tokens. The larger size provides a buffer for the padding applied in the batch and sequence dimensions (see the sketch below).
  • Move the logic that removes padding from lora_logits out of execute_model() and back into the LogitsProcessorWithLoRA class; this fixes a race condition caused by updating the multi-LoRA state information directly.

FIX #237
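
A minimal sketch of the sizing change described above, for illustration only (the buffer names mirror typical multi-LoRA bookkeeping but are assumptions, not this PR's diff):

```python
import torch

# Assumed value for illustration; in vLLM this comes from the scheduler config.
max_num_batched_tokens = 4096

# Previously the per-token LoRA index buffers were sized to
# max_num_batched_tokens. With padding applied in both the batch and
# sequence dimensions, the padded token count can exceed that limit, so the
# buffers are over-allocated by a factor of 3 to leave headroom.
indices_len = 3 * max_num_batched_tokens

base_indices = torch.zeros(indices_len, dtype=torch.long)
sampler_indices = torch.zeros(indices_len, dtype=torch.long)
sampler_indices_padded = torch.zeros(indices_len, dtype=torch.long)
embeddings_indices = torch.zeros(2, indices_len, dtype=torch.long)
```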

Inline review comment on the following diff context:

[0] * batch_size * seq_len,
[0] * batch_size * seq_len,
)
**dict(index_mapping=[0] * batch_size * seq_len,


michalkuligowski: How does introducing dict here affect the memory footprint? We don't have control over when it is garbage collected, and its size grows with batch_size * seq_len.

vivekgoe (Author)

@michalkuligowski Nothing has changed from the existing code; we are still populating the same dataclass (LoRAMapping) as before, and **dict is simply used for key/value-style initialization of the class members. Regarding the memory footprint, this dataclass needs to be available the whole time LoRA is enabled; it is required for LoRA operation. Again, nothing here is specific to Gaudi/HPU; this is how the LoRA feature is implemented in vLLM.
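
To illustrate the **dict point, a minimal sketch (the field names and the simplified LoRAMapping stand-in are assumptions; the real dataclass in vLLM differs in detail):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LoRAMapping:
    # Simplified stand-in for vLLM's LoRAMapping dataclass.
    index_mapping: List[int]
    prompt_mapping: List[int]

batch_size, seq_len = 2, 4

# Positional initialization ...
positional = LoRAMapping([0] * batch_size * seq_len,
                         [0] * batch_size * seq_len)

# ... and keyword-style initialization via **dict build the same object;
# neither form introduces extra long-lived allocation beyond the mapping
# lists the dataclass holds in either case.
keyword = LoRAMapping(**dict(index_mapping=[0] * batch_size * seq_len,
                             prompt_mapping=[0] * batch_size * seq_len))

assert positional == keyword
```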

michalkuligowski merged commit c3577af into habana_main on Sep 27, 2024
19 checks passed
michalkuligowski deleted the private/vgoel/lora_long_seq_fix branch on September 27, 2024 at 06:58
huijjj pushed a commit to SqueezeBits/vllm-fork that referenced this pull request on Sep 27, 2024:
Fix runtime errors reported when using long input sequence lengths with LoRA (HabanaAI#339)

Development

Successfully merging this pull request may close these issues.

[Bug]: Batched Multi-LoRA inference failure with random length dataset