[core][distributed] simplify code to support pipeline parallel #6406
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Full CI run is still required to merge this PR so once the PR is ready to go, please make sure to run it. If you need all test signals in between PR commits, you can trigger full CI as well. To run full CI, you can do one of these:
Perhaps it would be better to move the
vllm/model_executor/models/llama.py (outdated diff)
except KeyError:
    pass

if name not in params_dict:
This could silence some hard-to-track-down bugs when loading more complex state dicts (e.g. in the quantized case). Could we try to check whether the name corresponds to a layer that is not on this device because of PP?
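To illustrate the concern with a minimal, self-contained sketch (the weight names are made up; this is not vLLM code): a bare `except KeyError: pass` cannot distinguish "missing because this PP rank doesn't own the layer" from "missing because the checkpoint key is wrong".

```python
# Hypothetical weight names, for illustration only.
params_dict = {"model.layers.0.self_attn.qkv_proj.weight": "param"}

checkpoint = {
    "model.layers.0.self_attn.qkv_proj.weight": "ok",
    # e.g. a quantization scale whose mapping is broken -- it should error,
    # but the blanket except swallows it.
    "model.layers.0.self_attn.qkv_proj.weight_scale": "silently dropped",
}

for name, loaded_weight in checkpoint.items():
    try:
        param = params_dict[name]
    except KeyError:
        pass  # cannot tell a PP-missing layer from a genuine loading bug
```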
This is quite difficult, as load_weights does not have layer information.
Came up with a workaround in 61fa242. PTAL!
Fixed in 347399e.
I noticed a CUDA out-of-memory error on the basic correctness tests here. Is it reproducible locally? I don't think this change should cause that, so it's possibly a flaky test?
/ready
@andoorve finally figured it out: it is because the lru_cache stores a reference to the model, which then defeats the GC system :(
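For context, a minimal repro of the effect described above (plain Python, not vLLM code): an `lru_cache` on a function that takes the model as an argument keeps a strong reference to the model in its cache, so the model is never garbage-collected until the cache is cleared.

```python
import functools
import gc
import weakref


class Model:
    pass


@functools.lru_cache
def layer_names(model: Model) -> tuple:
    # The cache key holds a strong reference to `model`.
    return tuple(vars(model))


m = Model()
ref = weakref.ref(m)
layer_names(m)

del m
gc.collect()
print(ref() is None)  # False: the lru_cache still keeps the model alive

layer_names.cache_clear()
gc.collect()
print(ref() is None)  # True: only after clearing the cache is it freed
```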
Merging first to unblock support for the following models.
def is_pp_missing_parameter(name: str, model: torch.nn.Module) -> bool:
    """Check if a parameter is missing in a pipeline parallel model."""
    for missing_layer_name in get_pp_missing_layer_names(model):
        if name.startswith(missing_layer_name):
            return True
    return False
:-) "xx.11".startswith("xx.1")
Sorry for the bug, and thanks for pointing it out so quickly! Please take a look at #6446.
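For readers following along, here is a sketch of one way to close the prefix hole, reusing the helpers from the snippet above; whether #6446 implements exactly this form is not shown on this page. The idea is to require the match to end at a `.` boundary.

```python
def is_pp_missing_parameter(name: str, model: torch.nn.Module) -> bool:
    """Check if a parameter belongs to a layer this pipeline stage does not hold."""
    for missing_layer_name in get_pp_missing_layer_names(model):
        # Match only whole layer names: "layers.1" must not match "layers.11....".
        if name == missing_layer_name or name.startswith(missing_layer_name + "."):
            return True
    return False
```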
The goal is to minimize the lines of code a model has to change in order to support pipeline parallel.
In the model weight loading part, just add:
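The description's code snippet is not preserved in this page dump; judging from the diff discussed above, the weight-loading addition is roughly of this shape (an abbreviated fragment, with `params_dict` and `weights` coming from the surrounding load_weights code):

```python
# Inside the model's load_weights() loop over checkpoint weights:
for name, loaded_weight in weights:
    if is_pp_missing_parameter(name, self):
        continue  # this stage does not own the layer, so skip its weights
    param = params_dict[name]
    ...
```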
(The benefit is that we don't need to introduce the extra indentation of the try/except code.)
In the layer construction part, just add:
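Again the snippet itself is missing from this dump; the idea is that a single helper builds only the layers owned by the current pipeline stage. Below is a self-contained sketch under assumed names (`make_layers`, `get_pp_indices`, the even partitioning, and the `nn.Identity` placeholders are illustrative assumptions, not necessarily the PR's exact API):

```python
from typing import Callable, Tuple

import torch.nn as nn


def get_pp_indices(num_layers: int, pp_rank: int, pp_size: int) -> Tuple[int, int]:
    # Assumed even partitioning of layers across pipeline stages.
    per_stage = num_layers // pp_size
    start = pp_rank * per_stage
    end = num_layers if pp_rank == pp_size - 1 else start + per_stage
    return start, end


def make_layers(num_layers: int, layer_fn: Callable[[], nn.Module],
                pp_rank: int, pp_size: int) -> Tuple[int, int, nn.ModuleList]:
    """Build only the layers owned by this stage; other indices become
    placeholders so global layer numbering stays intact."""
    start, end = get_pp_indices(num_layers, pp_rank, pp_size)
    layers = nn.ModuleList(
        [nn.Identity() for _ in range(start)] +
        [layer_fn() for _ in range(start, end)] +
        [nn.Identity() for _ in range(end, num_layers)])
    return start, end, layers


# Example: stage 1 of a 2-stage pipeline over 8 layers owns layers 4..7.
start, end, layers = make_layers(8, lambda: nn.Linear(16, 16), pp_rank=1, pp_size=2)
print(start, end, len(layers))  # 4 8 8
```

In a model's __init__, this replaces the manual nn.ModuleList comprehension with a single call, which is where the line-count saving comes from.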
Hopefully, with these changes, adding pipeline parallel support to a model will be easier, and reviewing it will be easier as well.