[core][distributed] simplify code to support pipeline parallel #6406

youkaichao · 2024-07-13T03:39:01Z

to minimize the line of code change for a model to support pipeline parallel.

in the model weight loading part, just add:

                if name not in params_dict:
                    # in pipeline parallelism, we may have layers that are not
                    # present on this rank
                    continue

(the benefit is we don't need to introduce extra indentation of the try-except code)

in the layer construction part, just add:

        self.start_layer, self.end_layer, self.layers = make_layers(
            config.num_hidden_layers,
            lambda: LlamaDecoderLayer(config=config,
                                      cache_config=cache_config,
                                      quant_config=quant_config))

hopefully, with these change, code for a model to support pp will be easier, and the review can also be easier.

github-actions · 2024-07-13T03:39:12Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only trigger fastcheck CI to run, which consists only a small and essential subset of tests to quickly catch errors with the flexibility to run extra individual tests on top (you can do this by unblocking test steps in the Buildkite run).

Full CI run is still required to merge this PR so once the PR is ready to go, please make sure to run it. If you need all test signals in between PR commits, you can trigger full CI as well.

To run full CI, you can do one of these:

Comment /ready on the PR
Add ready label to the PR
Enable auto-merge.

🚀

DarkLight1337 · 2024-07-13T09:53:03Z

Perhaps it would be better to move the make_layers function into the utils file inside models/?

robertgshaw2-neuralmagic · 2024-07-13T17:17:28Z

vllm/model_executor/models/llama.py

-                except KeyError:
-                    pass
+
+                if name not in params_dict:


This could silence some hard to track down bugs when loading more complex state dicts (in quantized case)

Could we try to check if the name corresponds to a layer not on the device b/c of PP?

this is quite difficult, as load_weights does not have layer information.

come up with a workaround in 61fa242. PTAL!

youkaichao · 2024-07-13T17:36:19Z

Perhaps it would be better to move the make_layers function into the utils file inside models/?

fixed in 347399e

andoorve · 2024-07-14T04:12:26Z

I noticed CUDA out of memory error on the basic correctness tests here. Is it reproducible locally? I think this change shouldn't cause that so it's possibly a flaky test?

youkaichao · 2024-07-14T06:59:32Z

/ready

youkaichao · 2024-07-14T16:54:19Z

@andoorve finally figured it out, it is because lru cache stores a reference of the model, and then fails the gc system :(

youkaichao · 2024-07-15T04:20:40Z

merge first to unblock the following models support.

yudian0504 · 2024-07-15T09:11:52Z

vllm/model_executor/models/utils.py

+def is_pp_missing_parameter(name: str, model: torch.nn.Module) -> bool:
+    """Check if a parameter is missing in a pipeline parallel model."""
+    for missing_layer_name in get_pp_missing_layer_names(model):
+        if name.startswith(missing_layer_name):


:-) "xx.11".startswith("xx.1")

sorry for the bug, and thanks for pointing it out so quickly! please take a look at #6446 .

…project#6406)

…project#6406) Signed-off-by: Alvant <[email protected]>

youkaichao added 3 commits July 12, 2024 20:18

update gpt2

488e2b8

update llama

e7a2842

remove duplicate code

8b725b5

fix import

1945e61

robertgshaw2-neuralmagic reviewed Jul 13, 2024

View reviewed changes

move to models/

347399e

youkaichao added 2 commits July 13, 2024 10:56

check names

61fa242

Merge branch 'main' into simplify_pp

2c2a1de

youkaichao added 2 commits July 13, 2024 22:58

unify tests

78d7d4b

change order

5d8ddd4

github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Jul 14, 2024

youkaichao added 2 commits July 14, 2024 09:04

fix gc

d4f851f

isort

50b91a8

youkaichao requested a review from robertgshaw2-neuralmagic July 14, 2024 16:54

youkaichao merged commit 69672f1 into vllm-project:main Jul 15, 2024
73 checks passed

youkaichao deleted the simplify_pp branch July 15, 2024 04:20

youkaichao mentioned this pull request Jul 15, 2024

[Model] Pipeline parallel support for Mixtral #6403

Closed

yudian0504 reviewed Jul 15, 2024

View reviewed changes

This was referenced Jul 15, 2024

[misc][distributed] fix pp missing layer condition #6446

Merged

[Distributed][Model] Rank-based Component Creation for Pipeline Parallelism Memory Optimization #6455

Merged

dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request Jul 17, 2024

[core][distributed] simplify code to support pipeline parallel (vllm-…

09a8d63

…project#6406)

xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024

[core][distributed] simplify code to support pipeline parallel (vllm-…

93fe5e7

…project#6406)

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024

[core][distributed] simplify code to support pipeline parallel (vllm-…

3ddf224

…project#6406) Signed-off-by: Alvant <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[core][distributed] simplify code to support pipeline parallel #6406

[core][distributed] simplify code to support pipeline parallel #6406

youkaichao commented Jul 13, 2024

github-actions bot commented Jul 13, 2024

DarkLight1337 commented Jul 13, 2024 •

edited

Loading

robertgshaw2-neuralmagic Jul 13, 2024

youkaichao Jul 13, 2024

youkaichao Jul 13, 2024

youkaichao commented Jul 13, 2024

andoorve commented Jul 14, 2024 •

edited

Loading

youkaichao commented Jul 14, 2024

youkaichao commented Jul 14, 2024

youkaichao commented Jul 15, 2024

yudian0504 Jul 15, 2024

youkaichao Jul 15, 2024

[core][distributed] simplify code to support pipeline parallel #6406

[core][distributed] simplify code to support pipeline parallel #6406

Conversation

youkaichao commented Jul 13, 2024

github-actions bot commented Jul 13, 2024

DarkLight1337 commented Jul 13, 2024 • edited Loading

robertgshaw2-neuralmagic Jul 13, 2024

Choose a reason for hiding this comment

youkaichao Jul 13, 2024

Choose a reason for hiding this comment

youkaichao Jul 13, 2024

Choose a reason for hiding this comment

youkaichao commented Jul 13, 2024

andoorve commented Jul 14, 2024 • edited Loading

youkaichao commented Jul 14, 2024

youkaichao commented Jul 14, 2024

youkaichao commented Jul 15, 2024

yudian0504 Jul 15, 2024

Choose a reason for hiding this comment

youkaichao Jul 15, 2024

Choose a reason for hiding this comment

DarkLight1337 commented Jul 13, 2024 •

edited

Loading

andoorve commented Jul 14, 2024 •

edited

Loading