Add LoRA support for Mixtral #2831
Conversation
vllm/worker/worker.py
Outdated
@@ -94,6 +95,24 @@ def init_model(self) -> None:
        # Initialize the model.
        set_random_seed(self.model_config.seed)

        self.model = get_model(self.model_config, self.device_config,
This block of code is in https://github.com/vllm-project/vllm/blob/main/vllm/worker/model_runner.py#L84, it shouldn't be here.
    ) -> None:
        super().__init__()
        self.config = config
        self.linear_method = linear_method
        self.model = MixtralModel(config, linear_method)
        self.lm_head = ParallelLMHead(config.vocab_size, config.hidden_size)
        self.sampler = Sampler(config.vocab_size)
        self.unpadded_vocab_size = config.vocab_size
Can you say why this is different from the code in mistral.py? There is the minor cosmetic difference of unpadded_vocab_size vs. self.unpadded_vocab_size that we should fix, but also the larger difference of the padding_size in the ParallelLMHead, as well as the different parameters of the Sampler. Is there a reason why the code is not the same? :)
Same comment comparing with llama.py.
should be the same, fixed
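For context on the question above: the reason llama.py tracks an unpadded vocab size separately is that the embedding and LM-head weights are padded to an alignment boundary for the kernels (a larger one when LoRA is enabled), while sampling must still be restricted to the real tokens. A minimal, self-contained sketch of that padding arithmetic (the alignment values 64 and 256 and the example numbers are assumptions, not taken from the actual file):

```python
def pad_vocab_size(vocab_size: int, pad_to: int) -> int:
    """Round vocab_size up to the next multiple of pad_to."""
    return ((vocab_size + pad_to - 1) // pad_to) * pad_to


base_vocab = 32000          # e.g. a Mistral/Mixtral tokenizer size
extra_lora_tokens = 16      # extra tokens an adapter may add (illustrative)

# The sampler needs the real (unpadded) vocab size, including adapter tokens.
unpadded_vocab_size = base_vocab + extra_lora_tokens
# The LM-head weight is padded; LoRA kernels assume a coarser alignment.
padded_no_lora = pad_vocab_size(base_vocab, 64)
padded_with_lora = pad_vocab_size(unpadded_vocab_size, 256)

print(unpadded_vocab_size, padded_no_lora, padded_with_lora)
# 32016 32000 32256
```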
vllm/model_executor/models/llama.py
Outdated
@@ -270,6 +270,30 @@ def forward(

class LlamaForCausalLM(nn.Module):
    supports_lora = True
    lora_target_modules = [
The biggest question I have about this PR is these configurations; I don't think they should be here. There might be different ways to run a model (e.g. different layers get LoRAified: for the Mixtral model, do you apply LoRA to the MoE weight matrices or not?), and it also seems odd to have the LoRA configurations in the model itself. Are there better places where this could fit? Maybe @Yard1 has some ideas here as well :)
The other issue of the override code in vllm/lora/models.py being a little too complex will flow naturally from what we do here.
Here are my thoughts so far: lora_target_modules, I believe, is in the adapter_config.json of the LoRA model file, so we can probably use the information from there; it will just need to be adapted a little via the packed_module_mapping. So it seems we can remove the LoRA-specific stuff, which is already a big relief.
The other parameters are also useful for loading the model etc., so maybe we should keep them here? It would be worth investigating whether we can reuse this in the model loading function for the base model (load_weights).
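To make the idea concrete, a small sketch of reading an adapter's target_modules and translating them through a packed-module mapping (the mapping dict and file path here are illustrative, not the actual vLLM data structures):

```python
import json

# Hypothetical packed-module mapping: separate q/k/v and gate/up projections
# are fused into single modules inside the model.
packed_module_mapping = {
    "q_proj": "qkv_proj",
    "k_proj": "qkv_proj",
    "v_proj": "qkv_proj",
    "gate_proj": "gate_up_proj",
    "up_proj": "gate_up_proj",
}

# adapter_config.json is part of the (PEFT-style) LoRA checkpoint.
with open("adapter_config.json") as f:
    adapter_config = json.load(f)

# Translate each adapter target module onto the model's (possibly packed) name.
adapter_modules = {
    packed_module_mapping.get(name, name)
    for name in adapter_config["target_modules"]
}
print(sorted(adapter_modules))
```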
This defines all the layers we support for LoRA on the given model. The adapter can use a subset of those layers. We should not be using adapter_config.json directly because:
- different adapters will have different layers, therefore we need a common superset
- that superset has to be constant
- we need to implement support for each layer type, so if an adapter specifies a layer we do not support, we need to throw exceptions.
Given the above, defining those variables as attributes of the model class seems to be the best option.
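A minimal sketch of that scheme (the class name, module names, and error message are illustrative): the model class declares a constant superset of LoRAifiable modules, and each adapter is validated against it.

```python
from typing import List, Set


class MixtralForCausalLMSketch:
    # Constant superset of modules this model supports applying LoRA to
    # (illustrative; note the MoE expert weights w1/w2/w3 are absent).
    supported_lora_modules: List[str] = [
        "qkv_proj", "o_proj", "embed_tokens", "lm_head",
    ]


def validate_adapter(adapter_modules: List[str], supported: List[str]) -> None:
    """Raise if the adapter targets a module the model cannot LoRAify."""
    unsupported: Set[str] = set(adapter_modules) - set(supported)
    if unsupported:
        raise ValueError(
            f"Adapter targets unsupported modules: {sorted(unsupported)}")


# An adapter may target a subset of the supported modules...
validate_adapter(["qkv_proj", "o_proj"],
                 MixtralForCausalLMSketch.supported_lora_modules)
# ...but targeting e.g. an expert weight would raise:
# validate_adapter(["w1"], MixtralForCausalLMSketch.supported_lora_modules)
```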
Sounds great. I believe my confusion will go away if we rename lora_target_modules to supported_lora_modules, so it is clearer that this is the superset of modules that we support and not the actual ones that are loaded. Let's rename it, and in parallel I'll make a PR that shows how it would look if we use packed_module_mapping in the load_weights function, so the information lives in only one place.
Let's also reorder things so the generic attributes come before the LoRA-specific ones (and add a comment marking the LoRA-specific ones as such).
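A sketch of the layout being asked for (the attribute contents below are illustrative and borrow names used elsewhere in this thread; they are not a claim about the final code):

```python
from torch import nn


class LlamaForCausalLMSketch(nn.Module):
    # Generic attributes, useful for model loading in general.
    packed_modules_mapping = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    }

    # LoRA-specific attributes: the superset of modules that *can* be
    # LoRAified, not the modules of any particular adapter.
    supported_lora_modules = [
        "qkv_proj", "o_proj", "gate_up_proj", "embed_tokens", "lm_head",
    ]
    embedding_modules = {
        "embed_tokens": "input_embeddings",
        "lm_head": "output_embeddings",
    }
    embedding_padding_modules = ["lm_head"]
```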
I put up a PR in #2843 :)
tests/lora/test_lora_manager.py
Outdated
@@ -120,7 +129,7 @@ def test_lora_model_manager(dist_init, dummy_model):
        2,
        2,
        LoRAConfig(max_lora_rank=8, max_cpu_loras=3, max_loras=2),
-       lora_target_modules=["dense1", "dense2", "lm_head"])
+       support_lora_modules=["dense1", "dense2", "lm_head"])
Can you rename this to supported_lora_modules? The idea here is that this is the list of modules that can be LoRAified (i.e. the list of modules for which the model supports applying LoRA adapters).
vllm/lora/models.py
Outdated
        # allow overriding the target modules and mapping with initialization
        if support_lora_modules:
            self.support_lora_modules: List[str] = (
                [support_lora_modules] if isinstance(
We don't need the isinstance part any more now that only lists are supported, right?
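A small sketch of the simplification being suggested (the function name and types are illustrative):

```python
from typing import List, Optional


def resolve_supported_modules(
        supported_lora_modules: Optional[List[str]]) -> List[str]:
    # Previously the override also accepted a bare string and wrapped it:
    #   [modules] if isinstance(modules, str) else modules
    # With only lists supported, a plain copy (or direct assignment) suffices.
    return list(supported_lora_modules) if supported_lora_modules else []
```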
vllm/lora/models.py
Outdated
        lora_manager_cls: Type[LoRAModelManager] = LoRAModelManager,
        **kwargs) -> LoRAModelManager:
    """Create a LoRA adapter for a given model."""
-   if not getattr(model, "supports_lora", False):
+   if not getattr(model, "supported_lora_modules", False):
Nit: This is slightly odd; if not hasattr(model, "supported_lora_modules") would be more natural here, since supported_lora_modules is not of type bool :)
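For illustration, the shape of the suggested check (the error message and surrounding function body are hypothetical):

```python
def create_lora_manager(model, **kwargs):
    # Check for the attribute's presence rather than relying on the
    # truthiness of a list returned by getattr.
    if not hasattr(model, "supported_lora_modules"):
        raise ValueError(f"{type(model).__name__} does not support LoRA.")
    ...
```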
vllm/lora/worker_manager.py
Outdated
@@ -195,11 +198,10 @@ class LRUCacheWorkerLoRAManager(WorkerLoRAManager):
    def create_lora_manager(
        self,
        model: torch.nn.Module,
-       target_modules: Union[str, List[str]] = TARGET_MODULES_QKV,
+       supported_lora_modules: Optional[Union[str, List[str]]] = None,
supported_lora_modules should be removed, right?
vllm/model_executor/model_loader.py
Outdated
@@ -66,7 +66,7 @@ def get_model(model_config: ModelConfig,
    # Create a model instance.
    # The weights will be initialized as empty tensors.
    with torch.device(device_config.device):
-       if getattr(model_class, "supports_lora", False):
+       if getattr(model_class, "supported_lora_modules", False):
same comment as above with hasattr
vllm/worker/model_runner.py
Outdated
        assert hasattr(
            self.model, "supported_lora_modules"
        ) and self.model.supported_lora_modules, "Model does not support LoRA"
        assert hasattr(self.model, "embedding_modules") and hasattr(
Nit: Can you split this into two asserts? That will make the error message a little clearer if one of them is missing (for debugging)
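A sketch of what the split might look like (the second attribute name, embedding_padding_modules, is an assumption since the diff above is truncated; separate asserts make it obvious which attribute is missing):

```python
def check_lora_support(model) -> None:
    # One assert per requirement, so a failure names the missing attribute.
    assert getattr(model, "supported_lora_modules", None), \
        "Model does not support LoRA"
    assert hasattr(model, "embedding_modules"), \
        "Model is missing embedding_modules"
    assert hasattr(model, "embedding_padding_modules"), \
        "Model is missing embedding_padding_modules"
```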
The code looks great to me now, thanks for doing this :)
I also did some manual testing on this PR: Merge in #2775, and then run
and query the server on some workload I care about, and it is working well!
LGTM, thanks!
* add mixtral lora support
* formatting
* fix incorrectly ported logic
* polish tests
* minor fixes and refactoring
* minor fixes
* formatting
* rename and remove redundant logic
* refactoring
* refactoring
* minor fix
* minor refactoring
* fix code smell
@tterrysun @Yard1 seems like the Mixtral implementation does not support the expert linear layers: w1, w2, w3.
Problem: We don't have LoRA support for Mixtral.
Solution: Add LoRA configurations for Mixtral and refactor relevant parts to allow this.
Testing: added correctness tests and updated existing tests.
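For reference, a minimal usage sketch of running a Mixtral LoRA adapter offline with vLLM (the model name and adapter path are placeholders, and the LoRARequest call assumes the API shape introduced with the original multi-LoRA support):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load the base model with LoRA support enabled.
llm = LLM(model="mistralai/Mixtral-8x7B-Instruct-v0.1", enable_lora=True)

# Generate with a specific adapter applied to the supported modules.
outputs = llm.generate(
    ["Write a haiku about GPUs."],
    SamplingParams(temperature=0.0, max_tokens=64),
    lora_request=LoRARequest("my-mixtral-adapter", 1, "/path/to/lora/adapter"),
)
print(outputs[0].outputs[0].text)
```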