Simplify Tensor Parallel implementation with PyTorch TP #34184
Do we want to allow this for external use by removing the leading underscore, so that users can define TP plan tweaks from config.json? Given that, shall we also allow providing a custom TP plan as input to LlamaConfig() that overrides the default?
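For concreteness, the kind of override being asked about might look like the sketch below. The plan format (module-name patterns mapped to style strings) follows the built-in plans in this PR; accepting base_model_tp_plan as a plain constructor kwarg is the suggestion under discussion, not existing behaviour.

```python
from transformers import LlamaConfig

# Hypothetical user-supplied tweak of the default TP plan: keys are
# module-name patterns, values are tensor-parallel style strings.
custom_tp_plan = {
    "layers.*.self_attn.q_proj": "colwise",
    "layers.*.self_attn.k_proj": "colwise",
    "layers.*.self_attn.v_proj": "colwise",
    "layers.*.self_attn.o_proj": "rowwise",
}

# Assumes the config accepts base_model_tp_plan as a regular kwarg
# (the suggestion in this thread), overriding the model's default plan.
config = LlamaConfig(base_model_tp_plan=custom_tp_plan)
```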
Yeah, good idea. We can make this public once we prove things work.
Yep, base_model_tp_plan should be supported as input to the PretrainedConfig!
So, this variable base_model_tp_plan has to be added to PretrainedConfig (transformers/src/transformers/configuration_utils.py, line 50 in a769ed4), with {} as the default, which I believe is the best possible default for any config subclass to inherit.
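A rough sketch of that suggestion (the attribute name comes from this thread; the surrounding class body is illustrative, not the actual configuration_utils.py):

```python
class PretrainedConfig:
    # Illustrative sketch only. An empty plan as the class-level default
    # means every config subclass inherits a well-defined (if inactive)
    # tensor-parallel plan; model configs such as LlamaConfig override it.
    # Note: a class-level dict is shared and mutable, so the real
    # implementation may prefer None and normalise to {} per instance.
    base_model_tp_plan = {}
```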
Thanks @kmehant @ArthurZucker for the suggestion. I moved base_model_tp_plan to PretrainedConfig in the latest commit.
A mapping from TP style to the correct function might be better.
We are also thinking of defining a TensorParallelConfig; your feedback is welcome here, as I don't know the variety of classes / args that might be used!
Thanks for the comment! Indeed, a mapping style would look better.
The only caveat is that the returned value here is an object rather than a constant or class (see the () behind ColwiseParallel). If we prepare a map, we may always be returning the same object -- the parallelize_module API may be able to support that, I guess; I am just not sure there is a contract guaranteeing it today.
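For reference, the mapping under discussion could be as simple as the sketch below (helper name and structure are illustrative); whether a single shared style instance is safe is exactly the caveat raised here and addressed in the next comment.

```python
from torch.distributed.tensor.parallel import ColwiseParallel, RowwiseParallel

# One pre-built ParallelStyle object per style string. The caveat above is
# that every lookup returns the same instance rather than a fresh object.
_TP_STYLE_MAP = {
    "colwise": ColwiseParallel(),
    "rowwise": RowwiseParallel(),
}

def translate_to_torch_parallel_style(style: str):
    # Hypothetical helper: resolve a plan entry such as "colwise" to the
    # torch.distributed.tensor.parallel style object that implements it.
    return _TP_STYLE_MAP[style]
```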
Pitching in :)
We should be able to use the same object, since parallelize_module applies the required parallel operation to the module and returns a new copy - https://github.com/pytorch/pytorch/blob/86d4b7d60b264cae5a04a1b20719bcd7a5752a4c/torch/distributed/tensor/parallel/api.py#L95
I have also tested this empirically while benchmarking (#34194).
Thanks!
Sounds good!
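A minimal sketch of the reuse described above, assuming a toy two-layer model and an already-initialised process group (module paths, sizes, and mesh setup are illustrative):

```python
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import ColwiseParallel, parallelize_module

# Toy model; the plan below refers to its submodules "0" and "1".
model = nn.Sequential(nn.Linear(16, 16), nn.Linear(16, 16))

# Assumes torch.distributed is already initialised (e.g. launched via torchrun)
# with 2 ranks available.
mesh = init_device_mesh("cuda", (2,))

# The same ColwiseParallel instance is deliberately reused for both entries:
# per the linked api.py, parallelize_module applies the style to each module
# without keeping or mutating the style object, so sharing it appears safe.
shared_colwise = ColwiseParallel()
parallelize_module(model, mesh, {"0": shared_colwise, "1": shared_colwise})
```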