universal-ckp: support llama model #287

Merged 1 commit on Nov 16, 2023

mosheisland

Modify the universal checkpoint parameter patterns based on the specific model configuration.
This commit adds support for the Llama family of models.

This commit depends on PR microsoft/DeepSpeed#4666, which adds support for the SwiGLU linear parameter.
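
To illustrate the idea of choosing parameter patterns from the model configuration, here is a minimal Python sketch. It is not the code from this PR or from DeepSpeed: the `ModelConfig` flags, the pattern category names, and the Megatron-style parameter names are all hypothetical. It also assumes that a SwiGLU projection is stored in each tensor-parallel shard as two sub-parameters concatenated along dim 0, which is why such a weight would need its own category rather than a plain concatenation of shards.

```python
import re
from dataclasses import dataclass


@dataclass
class ModelConfig:
    # Hypothetical flags for illustration; not the real training arguments.
    swiglu: bool = False
    rmsnorm: bool = False


def build_patterns(cfg):
    """Return illustrative regex lists keyed by how a parameter is handled."""
    patterns = {
        "tp_replicated": [r".*layernorm\.weight$"],
        "row_parallel": [
            r".*dense_4h_to_h\.weight$",
            r".*self_attention\.dense\.weight$",
        ],
        "two_sub_params_cat_dim0": [],
    }
    if not cfg.rmsnorm:
        # Assumption: only non-RMSNorm layers carry a bias term.
        patterns["tp_replicated"].append(r".*layernorm\.bias$")
    if cfg.swiglu:
        # Assumption: with SwiGLU the projection fuses two sub-parameters.
        patterns["two_sub_params_cat_dim0"].append(r".*dense_h_to_4h\.weight$")
    return patterns


def classify(name, patterns):
    """Return the first category whose regex matches the parameter name."""
    for kind, regexes in patterns.items():
        if any(re.fullmatch(rx, name) for rx in regexes):
            return kind
    return "default"


def merge_fused_shards(shards):
    """Merge TP shards laid out as [gate_i; up_i] into one [gate; up] list.

    Each shard is a list of rows; the first half is assumed to belong to
    the first sub-parameter and the second half to the second.
    """
    gate, up = [], []
    for shard in shards:
        half = len(shard) // 2
        gate.extend(shard[:half])
        up.extend(shard[half:])
    return gate + up
```

With these assumptions, the same parameter name lands in a different category depending on the configuration, and the fused weight is merged per sub-parameter instead of shard by shard.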

The following TensorBoard capture shows training of a tiny Llama model with DP=TP=PP=2 for 200 steps.
In addition, a checkpoint is taken at step 100 and converted to a universal checkpoint.
Several other 3D configurations are then trained from this universal checkpoint for the remaining 100 steps (until step 200).
Both the training and validation curves of the original 3D configuration (2/2/2) match those of the configurations that load from the universal checkpoint.

[Image: universal-ckp-llama]

Modify universal checkpoint parameter patterns based on the specific model
configuration. This commit adds support for llama family of models.

Signed-off-by: Moshe Island <[email protected]>
@mosheisland
Author

@tjruwase, now that microsoft/DeepSpeed#4666 is merged, can you please merge this PR?

tjruwase merged commit 2348eed into microsoft:main on Nov 16, 2023
1 check passed
3 participants