Merge with latest upstream #5

Open
wants to merge 78 commits into base: main
Conversation

Quentin-Anthony
Collaborator

No description provided.

zhangsheng377 and others added 30 commits November 8, 2023 18:12
* universal-ckp: fix gpt model param names

Signed-off-by: Moshe Island <[email protected]>

* universal-ckp: reconfigure model parameter rng tracker

When loading from universal checkpoint with a different model parameter
configuration, the loaded tensor parallel RNG tracker states are incorrect.
In this case, we reconfigure the tensor parallel RNG tracker states with new
seed values (each tp rank with a unique seed).
We add an offset equal to the iteration to the base seed. This ensures that when
we load multiple times from a universal checkpoint, each run uses a different
random sequence.

This commit requires a counterpart change in the DeepSpeed repo.

Signed-off-by: Moshe Island <[email protected]>
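
A minimal sketch of the reseeding idea described above, assuming a Megatron-style RNG tracker with reset()/add(); the function name and the exact offset arithmetic are illustrative, not the repository's definitive implementation:

```python
# Hedged sketch (illustrative names): reseed the tensor-parallel RNG tracker
# after loading a universal checkpoint with a different TP configuration.
def reconfigure_tp_rng_tracker(tracker, base_seed, iteration, tp_rank):
    # Offset the base seed by the iteration so that repeated loads from the
    # same universal checkpoint draw different random sequences, and by the
    # TP rank so that each tensor-parallel rank gets a unique seed.
    seed = base_seed + iteration + tp_rank
    tracker.reset()
    tracker.add('model-parallel-rng', seed)
    return seed
```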

* universal-ckp: remove embedding norm patterns

Embedding norm patterns originate from BLOOM but are not present in vanilla GPT.
Therefore, remove them.

Signed-off-by: Moshe Island <[email protected]>

---------

Signed-off-by: Moshe Island <[email protected]>
Co-authored-by: Moshe Island <[email protected]>
…ackage (#288)

When installing with "pip install .", megatron/model/vision does not appear
under python/dist-packages/megatron. To fix this, add an __init__.py.

Signed-off-by: yuanwu <[email protected]>
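
A small illustration of why the missing __init__.py matters, assuming setup.py uses setuptools' find_packages(), which only collects directories that contain an __init__.py:

```python
# Illustrative check, run from the repository root; not part of the repo itself.
from setuptools import find_packages

packages = find_packages()
# Without megatron/model/vision/__init__.py, "megatron.model.vision" is absent
# from this list and is therefore not installed by "pip install .".
print("megatron.model.vision" in packages)
```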
Modify universal checkpoint parameter patterns based on the specific model
configuration. This commit adds support for the Llama family of models.

Signed-off-by: Moshe Island <[email protected]>
Co-authored-by: Moshe Island <[email protected]>
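
A hedged sketch of configuration-dependent pattern selection; the helper name and the pattern strings below are illustrative only, not the repository's actual list:

```python
# Illustrative only: choose universal-checkpoint parameter patterns based on
# the model configuration instead of hard-coding a single GPT layout.
def select_param_patterns(args):
    patterns = {
        # Tensors partitioned along the TP dimension in a GPT-style model.
        "column_parallel": [r".*self_attention\.query_key_value\..*",
                            r".*mlp\.dense_h_to_4h\..*"],
        "row_parallel": [r".*self_attention\.dense\..*",
                         r".*mlp\.dense_4h_to_h\..*"],
    }
    if getattr(args, "swiglu", False):
        # Llama-family (gated MLP): the fused h->4h weight holds two halves
        # that must be split before reshaping and re-concatenated afterwards.
        patterns["gated"] = [r".*mlp\.dense_h_to_4h\..*"]
    return patterns
```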
* Extend universal checkpoint support for ds sequence parallelism (SP) and ZeRO stage 2

* Extend universal checkpoint support for ds sequence parallelism (SP) and ZeRO stage 2

* Extend README and batch scripts discussion

* Extend README and batch scripts discussion

* Extend README and batch scripts discussion

* Remove debug statement

* Script no pipeline parallel for ZeRO stage 2
* Revert "Modify the args_default usage (#250)"

This reverts commit 3095a51.

* Add the external arguments

Add external_arguments for passing arguments from a function call.

Signed-off-by: yuanwu <[email protected]>

---------

Signed-off-by: yuanwu <[email protected]>
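
A hedged sketch of the external-arguments idea: let a caller pass arguments programmatically instead of relying only on sys.argv. The function and parameter names are illustrative:

```python
import argparse

def parse_args(external_arguments=None):
    parser = argparse.ArgumentParser(description="Megatron-DeepSpeed arguments")
    parser.add_argument("--micro-batch-size", type=int, default=1)
    # When called from another program, external_arguments (a list of strings)
    # is parsed instead of the process command line.
    return parser.parse_args(args=external_arguments)

# Usage from a function call rather than the CLI:
args = parse_args(external_arguments=["--micro-batch-size", "4"])
```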
* alpaca hf weight finetune

clean up

update

update

update

update

update

update

update

arg fix

update

clean up

update

update

update

refine weight converter

don't cat when dim=0

format

update

update

update

* add finetune script

* add condition for no padded token case

* add reference

---------

Co-authored-by: Conglong Li <[email protected]>
* Fixed incorrect argument supplied to deepspeed init

* Added suggestion to make fix backwards compatible
This PR updates the Universal Checkpointing README with instructions on how to download the GPT dataset and cleans up a few nits in the corresponding bash scripts.

* Clean up UC scripts and update UC README

* Revert LOAD_TP change

* Update parallelism degrees

* UC Matplotlib generation script

* Add matplotlib code

* Script rename

* Source label names using regex

* Update plot gen script

* Revert 3D parallelism change

* regex matches to py variables

* Move location of script

* Update regex to search for multi-digit parallelism degrees

* Create ABC class for analyzer and remove UC specific analysis elements

* Move args to separate folder, add sns switch

* add bash script for UC analysis

* Change name of script

* Move UC specific label name to class

* Rename script

* clean up script

* Update analyzer return

* Update bash script

* remove log_dir

* Address PR comments
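
A hedged sketch of the regex-based label sourcing mentioned in the script commits above, assuming run labels of the form "uc_tp2_pp2_dp16" (the naming scheme is an assumption) and supporting multi-digit parallelism degrees:

```python
import re

def parse_parallelism(label):
    # \d+ matches multi-digit degrees such as dp16 or tp32.
    match = re.search(r"tp(\d+)_pp(\d+)_dp(\d+)", label)
    if match is None:
        return None
    tp, pp, dp = (int(g) for g in match.groups())
    return {"tp": tp, "pp": pp, "dp": dp}

print(parse_parallelism("uc_tp2_pp2_dp16"))  # {'tp': 2, 'pp': 2, 'dp': 16}
```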
This PR updates the Megatron type check to check against the accelerator specific dtype instead of the class. The change is necessary to account for warning fixes in microsoft/DeepSpeed#5018.
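
A hedged sketch of the dtype-based check (the helper name is illustrative): comparing tensor.dtype avoids the legacy torch.cuda.*Tensor class checks that trigger the warnings addressed in microsoft/DeepSpeed#5018.

```python
import torch

def is_reduced_precision(tensor):
    # Old style, class-based and tied to a specific device:
    #   isinstance(tensor, torch.cuda.HalfTensor)
    # New style, dtype-based and accelerator-agnostic:
    return tensor.dtype in (torch.float16, torch.bfloat16)
```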
polisettyvarma and others added 30 commits July 8, 2024 15:58
…urther on device (#411)

* improve performance by keeping attention_mask on device and running ops further on device

* add copyrights
* improve RoPE perf by using cached sin/cos tensors

* add copyrights
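
A hedged sketch of the cached sin/cos idea, not the repository's exact implementation: compute the rotary tables once per sequence length and reuse them on subsequent forward passes.

```python
import torch

class CachedRotaryEmbedding(torch.nn.Module):
    def __init__(self, dim, base=10000):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        self.register_buffer("inv_freq", inv_freq)
        self._seq_len_cached = None
        self._cos_cached = None
        self._sin_cached = None

    def forward(self, seq_len, device, dtype):
        # Recompute only when the cache is empty or the shape/device changed.
        if (self._cos_cached is None or self._seq_len_cached != seq_len
                or self._cos_cached.device != device):
            t = torch.arange(seq_len, device=device, dtype=self.inv_freq.dtype)
            freqs = torch.outer(t, self.inv_freq.to(device))
            emb = torch.cat((freqs, freqs), dim=-1)
            self._cos_cached = emb.cos().to(dtype)
            self._sin_cached = emb.sin().to(dtype)
            self._seq_len_cached = seq_len
        return self._cos_cached, self._sin_cached
```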
* Extend test utilities to support more accelerators

* Add Intel Copyright
* Update arguments.py

* Update training.py

* Create profiler.py

* add copyrights

* Update profiler.py

* add copyrights

* Update help

* add copyrights
* Refine wandb logging function

* Address comments

* enable user to specify wandb local save dir

* Update and fix comments

* Update
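
A hedged sketch of letting the user choose the wandb local save directory; the argument name and helper are illustrative, not the repo's exact logging code.

```python
import os
import wandb

def init_wandb(project, exp_name, save_dir=None):
    # If the user did not specify a directory, fall back to the working dir.
    save_dir = save_dir or os.path.join(os.getcwd(), "wandb_logs")
    os.makedirs(save_dir, exist_ok=True)
    # wandb writes its local run files under <save_dir>/wandb.
    wandb.init(project=project, name=exp_name, dir=save_dir)
```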
…412)

* Update arguments.py

* Update training.py

* Update utils.py

* add copyrights

* add copyrights

* add copyrights

* Update arguments.py help

* Update arguments.py

* Update training.py

* Update utils.py

* Update arguments.py
…rocessing (#421)

* Update arguments.py

* Update tokenizer.py

* Update preprocess_data.py
* Update module.py

* Update preprocess_data.py

* add copyrights

* add copyrights

* Update tokenizer.py

* add copyrights
This PR adds a Llama universal checkpointing example to examples_deepspeed/universal_checkpointing.

It also includes changes to the README, some minor changes, and an update to the TensorBoard analysis script.
…sing flash_attn_cuda in sequence parallel (#406)

Co-authored-by: Jinghan Yao <[email protected]>
…on for supporting batch size larger than 1 (#433)

Co-authored-by: Jinghan Yao <[email protected]>
* add support converting checkpoint from hf to mds

* Fix PP issue

* update
* fix TFLOPs calculation

When GQA is used, we observe correct TFLOPs after this fix.
When GQA is not used, the large difference in TFLOPs is resolved with
selective recompute.
Some other minor differences will also be observed because logits MACs are now included.

* add copyrights
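
A hedged sketch of a per-iteration FLOPs estimate that accounts for GQA (fewer key/value heads) and includes the logits matmul; the coefficients follow the usual 2x-MACs convention and are illustrative, not the repository's exact formula.

```python
def estimate_flops(batch, seq, layers, hidden, heads, kv_heads, vocab):
    head_dim = hidden // heads
    kv_hidden = kv_heads * head_dim                    # smaller K/V projections under GQA
    qkv = 2 * batch * seq * hidden * (hidden + 2 * kv_hidden)
    attn = 2 * 2 * batch * heads * seq * seq * head_dim  # QK^T and attention*V
    out_proj = 2 * batch * seq * hidden * hidden
    mlp = 2 * batch * seq * 2 * hidden * (4 * hidden)     # two h<->4h matmuls
    logits = 2 * batch * seq * hidden * vocab              # logits MACs, now included
    return layers * (qkv + attn + out_proj + mlp) + logits
```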
* pass batch_dim_idx to deepspeed sequence parallel distributed attention for supporting batch size larger than 1

* add FPDT support; add Ulysses rotary position embedding support

* add FPDT support; add Ulysses rotary position embedding support

* add FPDT support; add Ulysses rotary position embedding support

* add FPDT support; add Ulysses rotary position embedding support

* remove unnecessary files

* set the warmup length to be FPDT chunk size if enabled

---------

Co-authored-by: Jinghan Yao <[email protected]>
Co-authored-by: Jinghan Yao <[email protected]>