forked from microsoft/Megatron-DeepSpeed
Merge with latest upstream #5 (Open)

Quentin-Anthony wants to merge 78 commits into Zyphra:main from microsoft:main (base: main).
Conversation
* universal-ckp: fix gpt model param names
* universal-ckp: reconfigure model parameter rng tracker. When loading from a universal checkpoint with a different model-parallel configuration, the loaded tensor-parallel RNG tracker states are incorrect. In this case, we reconfigure the tensor-parallel RNG tracker states with new seed values, giving each TP rank a unique seed. We add an offset equal to the iteration to the base seed, so that repeated loads from a universal checkpoint use a different random sequence on each run. This commit requires a corresponding change in the DeepSpeed repo.
* universal-ckp: remove embedding norm patterns. Embedding norm patterns originate from Bloom but are not present in vanilla GPT, so remove them.

Signed-off-by: Moshe Island <[email protected]>
Co-authored-by: Moshe Island <[email protected]>
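The reseeding scheme described in that commit (a unique seed per TP rank, offset by the training iteration) can be sketched as below. The function name and the rank-offset constant are illustrative assumptions, not the repo's actual API:

```python
# Hypothetical sketch of the reseeding scheme from the universal-ckp commit:
# each tensor-parallel rank gets a unique seed, offset by the iteration so
# that repeated loads from a universal checkpoint draw fresh random sequences.
# Names and the constant offset are illustrative, not Megatron's actual API.
TP_RANK_OFFSET = 2718  # arbitrary illustrative constant separating TP ranks

def reconfigure_tp_rng_seed(base_seed: int, iteration: int, tp_rank: int) -> int:
    # Offset by iteration: loading again at a later iteration reseeds differently.
    seed = base_seed + iteration
    # Unique per-rank seed: dropout masks then differ across TP ranks,
    # mirroring Megatron's model-parallel RNG tracker behavior.
    return seed + TP_RANK_OFFSET + tp_rank
```

Each TP rank would feed the returned value into its RNG tracker state in place of the (now invalid) checkpointed state.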
…ackage (#288) If the package is installed with "pip install .", megatron/model/vision does not appear under python/dist-packages/megatron; to fix this, add an __init__.py. Signed-off-by: yuanwu <[email protected]>
#289) Signed-off-by: yuanwu <[email protected]>
Signed-off-by: yuanwu <[email protected]>
Modify universal checkpoint parameter patterns based on the specific model configuration. This commit adds support for the Llama family of models. Signed-off-by: Moshe Island <[email protected]> Co-authored-by: Moshe Island <[email protected]>
* Extend universal checkpoint support for ds sequence parallelism (SP) and ZeRO stage 2
* Extend universal checkpoint support for ds sequence parallelism (SP) and ZeRO stage 2
* Extend README and batch scripts discussion
* Extend README and batch scripts discussion
* Extend README and batch scripts discussion
* Remove debug statement
* Script no pipeline parallel for ZeRO stage 2
* Revert "Modify the args_default usage (#250)". This reverts commit 3095a51.
* Add the external arguments: add external_arguments for passing arguments from a function call.

Signed-off-by: yuanwu <[email protected]>
* alpaca HF weight finetune: clean up and iterate (arg fix, refine weight converter, don't cat when dim=0, formatting)
* add finetune script
* add condition for the no-padded-token case
* add reference

Co-authored-by: Conglong Li <[email protected]>
* Fixed incorrect argument supplied to deepspeed init * Added suggestion to make fix backwards compatible
This PR updates the Universal Checkpointing README with instructions on how to download the GPT dataset and cleans up a few nits in the corresponding bash scripts.
)
* Clean up UC scripts and update UC README
* Revert LOAD_TP change
* Update parallelism degrees
* UC Matplotlib generation script
* Add matplotlib code
* Script rename
* Source label names using regex
* Update plot gen script
* Revert 3D parallelism change
* regex matches to py variables
* Move location of script
* Update regex to search for multi-digit parallelism degrees
* Create ABC class for analyzer and remove UC-specific analysis elements
* Move args to separate folder, add sns switch
* add bash script for UC analysis
* Change name of script
* Move UC-specific label name to class
* Rename script
* clean up script
* Update analyzer return
* Update bash script
* remove log_dir
* Address PR comments
This PR updates the Megatron type check to check against the accelerator specific dtype instead of the class. The change is necessary to account for warning fixes in microsoft/DeepSpeed#5018.
…urther on device (#411) * improve performance by keeping attention_mask on device and running ops further on the device * add copyrights
* improve RoPE perf by using cached sin/cos tensors * add copyrights
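The cached sin/cos optimization in that commit amounts to computing the RoPE tables once per (sequence length, head dim) and reusing them on every forward pass. A minimal dependency-free sketch; names are hypothetical, and the actual implementation builds torch tensors rather than Python lists:

```python
import math

# Sketch of caching RoPE sin/cos tables keyed by (seq_len, dim) so they are
# computed once and reused, instead of being rebuilt on every forward pass.
# Pure Python for illustration; the real code operates on torch tensors.
_rope_cache = {}

def get_rope_sin_cos(seq_len, dim, base=10000.0):
    key = (seq_len, dim)
    if key not in _rope_cache:
        # Standard RoPE inverse frequencies: one per pair of dimensions.
        inv_freq = [base ** (-(2.0 * i) / dim) for i in range(dim // 2)]
        # Angle table; the half-dim frequency row is duplicated to full dim.
        angles = [[t * f for f in inv_freq] * 2 for t in range(seq_len)]
        sin = [[math.sin(a) for a in row] for row in angles]
        cos = [[math.cos(a) for a in row] for row in angles]
        _rope_cache[key] = (sin, cos)
    return _rope_cache[key]
```

Subsequent calls with the same shape hit the cache, so the trig work is paid once per shape rather than once per step.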
* Extend test utilities to support more accelerators * Add Intel Copyright
* Update arguments.py * Update training.py * Create profiler.py * add copyrights * Update profiler.py * add copyrights * Update help * add copyrights
* Refine wandb logging function * Address comments * enable user to specify wandb local save dir * Update and fix comments * Update
…412) * Update arguments.py * Update training.py * Update utils.py * add copyrights * add copyrights * add copyrights * Update arguments.py help * Update arguments.py * Update training.py * Update utils.py * Update arguments.py
…rocessing (#421) * Update arguments.py * Update tokenizer.py * Update preprocess_data.py
* Update module.py * Update preprocess_data.py * add copyrights * add copyrights * Update tokenizer.py * add copyrights
This PR adds a Llama universal checkpointing example to examples_deepspeed/universal_checkpointing. It also includes changes to the README, some minor changes, and an update to the TensorBoard analysis script.
…sing flash_attn_cuda in sequence parallel (#406) Co-authored-by: Jinghan Yao <[email protected]>
…on for supporting batch size larger than 1 (#433) Co-authored-by: Jinghan Yao <[email protected]>
* add support for converting checkpoints from HF to MDS * Fix PP issue * update
* fix TFLOPs calculation when GQA is used; we observe correct TFLOPs after this fix. When GQA is not used, the huge difference in TFLOPs is resolved with selective recompute. Some other minor differences will also be observed because logits MACs are now included. * add copyrights
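As a rough illustration of why GQA changes the count: with grouped-query attention only the K/V projections shrink (by num_kv_heads/num_heads), while the Q and output projections, the attention score matmuls, the FFN, and the logits matmul are unchanged. A hedged per-token estimate under the usual 2-FLOPs-per-MAC accounting; the names and exact breakdown are assumptions, not the repo's actual formula:

```python
# Rough GQA-aware forward-FLOPs-per-token estimate for a decoder stack.
# Assumptions: 2 FLOPs per MAC, no layernorm/softmax terms; training FLOPs
# are then commonly approximated as 3x this forward count. Not the repo's
# actual calculation; for intuition only.
def fwd_flops_per_token(n_layers, hidden, n_heads, n_kv_heads, ffn_hidden,
                        seq_len, vocab_size):
    kv_ratio = n_kv_heads / n_heads           # GQA shrinks only K/V projections
    per_layer = (
        2 * hidden * hidden                   # Q projection
        + 2 * hidden * hidden                 # attention output projection
        + 2 * 2 * hidden * hidden * kv_ratio  # K and V projections
        + 2 * 2 * hidden * seq_len            # QK^T and scores*V matmuls
        + 2 * 2 * hidden * ffn_hidden         # FFN up- and down-projections
    )
    logits = 2 * hidden * vocab_size          # the logits MACs the commit adds
    return n_layers * per_layer + logits
```

Omitting the kv_ratio term counts a GQA model as if it had full-size K/V projections, overstating its useful work, which is the inflation the fix removes.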
Co-authored-by: Logan Adams <[email protected]>
…l divided the gradient (#428)
* pass batch_dim_idx to deepspeed sequence parallel distributed attention for supporting batch size larger than 1
* add FPDT support; add Ulysses rotary position embedding support
* add FPDT support; add Ulysses rotary position embedding support
* add FPDT support; add Ulysses rotary position embedding support
* add FPDT support; add Ulysses rotary position embedding support
* remove unnecessary files
* set the warmup length to be the FPDT chunk size if enabled

Co-authored-by: Jinghan Yao <[email protected]>
Co-authored-by: Jinghan Yao <[email protected]>
No description provided.