
Adding the new feature of FPDT #441

Merged: 8 commits into microsoft:main on Dec 5, 2024
Conversation


@YJHMITWEB YJHMITWEB commented Aug 29, 2024

FPDT only works with this version of DeepSpeed.


delock commented Aug 30, 2024

Hi @YJHMITWEB , is FPDT referring to this paper? https://ui.adsabs.harvard.edu/abs/2023JARS...17b6510H/abstract


tohtana commented Aug 30, 2024

@YJHMITWEB Do we need changes in gpt2-merge.txt / gpt2-vocab.json? I'm not sure if we should check them in.

@@ -349,9 +349,12 @@ def _warmup_jit_function():
dtype = torch.float32

# Warmup fused bias+gelu
seq_length = args.seq_length
if args.ds_sequence_parallel_fpdt:
seq_length = 8192

@tohtana tohtana Aug 30, 2024

Can you define this as a separate variable, e.g. "FPDT_SEQ_LEN", and add a comment describing why we need this setting?

@YJHMITWEB (Author)

This is fixed by setting it to ds_sequence_parallel_fpdt_chunk_size when FPDT is enabled.
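The fix described above can be sketched as follows. This is a minimal illustration of the warmup sequence-length selection, not the actual Megatron-DeepSpeed code; the helper function name is made up, while the argument names follow the diff.

```python
# Hypothetical sketch of the warmup seq-length selection discussed above.
# warmup_seq_length() is an illustrative helper, not a real API.
from types import SimpleNamespace

def warmup_seq_length(args):
    """Pick the sequence length used to warm up fused JIT kernels.

    When FPDT (Ulysses-Offload) is enabled, the warmup should use the FPDT
    chunk size rather than the full sequence length, since FPDT processes
    the long sequence chunk by chunk.
    """
    if args.ds_sequence_parallel_fpdt:
        return args.ds_sequence_parallel_fpdt_chunk_size
    return args.seq_length

args = SimpleNamespace(seq_length=4096,
                       ds_sequence_parallel_fpdt=True,
                       ds_sequence_parallel_fpdt_chunk_size=8192)
print(warmup_seq_length(args))  # 8192, since FPDT is enabled
```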

@@ -32,7 +35,9 @@ def forward(self, max_seq_len, offset=0):
emb = torch.cat((freqs, freqs), dim=-1)
# emb [seq_length, .., dim]
from einops import rearrange
return rearrange(emb, 'n d -> n 1 1 d')
base = rearrange(emb, 'n d -> n 1 1 d')

@inkcherry inkcherry Aug 30, 2024


Will this change the output when using --use-rotary-position-embeddings in a Llama-style model?
FYI https://github.com/microsoft/Megatron-DeepSpeed/blob/main/examples_deepspeed/pretrain_llama2_distributed.sh

@YJHMITWEB (Author)

We have tested both GPT and Llama models; FPDT works well with both.
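For context on the rearrange in the diff above, here is a minimal numpy sketch (numpy standing in for torch) of the rotary-embedding shape change: the concatenated frequencies go from [seq, dim] to [seq, 1, 1, dim] so they broadcast over batch and head dimensions. `rearrange(emb, 'n d -> n 1 1 d')` is equivalent to `emb[:, None, None, :]`. The function below is illustrative, not the actual module.

```python
import numpy as np

def rotary_emb(max_seq_len, dim, base=10000.0, offset=0):
    # Standard rotary-frequency construction, simplified for illustration.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    seq = np.arange(offset, offset + max_seq_len, dtype=np.float64)
    freqs = np.outer(seq, inv_freq)              # [seq, dim/2]
    emb = np.concatenate((freqs, freqs), axis=-1)  # [seq, dim]
    # Equivalent to rearrange(emb, 'n d -> n 1 1 d'):
    return emb[:, None, None, :]                 # [seq, 1, 1, dim]

print(rotary_emb(16, 8).shape)  # (16, 1, 1, 8)
```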

@samadejacobs

> Hi @YJHMITWEB , is FPDT referring to this paper? https://ui.adsabs.harvard.edu/abs/2023JARS...17b6510H/abstract

@delock, no, FPDT refers to this paper (https://arxiv.org/abs/2408.16978), aka Ulysses-Offload.

@YJHMITWEB (Author)

> Hi @YJHMITWEB , is FPDT referring to this paper? https://ui.adsabs.harvard.edu/abs/2023JARS...17b6510H/abstract
>
> @delock, no, FPDT refers to this paper, aka Ulysses-Offload

Thanks @samadejacobs for pointing that out.

@YJHMITWEB (Author)

@microsoft-github-policy-service agree


github-merge-queue bot pushed a commit to microsoft/DeepSpeed that referenced this pull request Nov 26, 2024
[FPDT](https://arxiv.org/abs/2408.16978) can only be used with [this
version](microsoft/Megatron-DeepSpeed#441) of
Megatron-DeepSpeed.

---------

Co-authored-by: Jinghan Yao <[email protected]>
Co-authored-by: Sam Ade Jacobs <[email protected]>
Co-authored-by: Jinghan Yao <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Jinghan Yao <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Masahiro Tanaka <[email protected]>
Co-authored-by: Masahiro Tanaka <[email protected]>
github-merge-queue bot pushed a commit to microsoft/DeepSpeed that referenced this pull request Dec 2, 2024
if self.sequence_parallel or not self.enable_ds_sequence_parallel:
    seq_len, bs = mixed_x_layer.shape[0], mixed_x_layer.shape[1]
    key_layer = mixed_x_layer[:, :, self.projection_size:self.projection_size+self.kv_projection_size].reshape(seq_len, bs, -1, self.head_dim)
    value_layer = mixed_x_layer[:, :, self.projection_size+self.kv_projection_size:].reshape(seq_len, bs, -1, self.head_dim)


It might be worthwhile to keep the "split_tensor" implementation as the default for the non-FPDT scenario.
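The q/k/v split shown in the snippet above can be illustrated with a small numpy sketch (numpy in place of torch): the fused projection output of shape [seq, batch, q_proj + 2 * kv_proj] is sliced and reshaped into per-head tensors. The variable names mirror the diff; the sizes are made up for the example.

```python
import numpy as np

seq_len, bs, head_dim = 8, 2, 4
num_q_heads, num_kv_heads = 4, 2
projection_size = num_q_heads * head_dim      # query projection width: 16
kv_projection_size = num_kv_heads * head_dim  # key/value projection width: 8

# Fused QKV projection output: [seq, batch, q_proj + 2 * kv_proj]
mixed_x_layer = np.random.randn(seq_len, bs,
                                projection_size + 2 * kv_projection_size)

# Slice out q, k, v and reshape to [seq, batch, heads, head_dim].
query_layer = mixed_x_layer[:, :, :projection_size] \
    .reshape(seq_len, bs, -1, head_dim)
key_layer = mixed_x_layer[:, :, projection_size:projection_size + kv_projection_size] \
    .reshape(seq_len, bs, -1, head_dim)
value_layer = mixed_x_layer[:, :, projection_size + kv_projection_size:] \
    .reshape(seq_len, bs, -1, head_dim)

print(query_layer.shape, key_layer.shape, value_layer.shape)
# (8, 2, 4, 4) (8, 2, 2, 4) (8, 2, 2, 4)
```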

github-merge-queue bot pushed a commit to microsoft/DeepSpeed that referenced this pull request Dec 3, 2024
loadams added a commit to microsoft/DeepSpeed that referenced this pull request Dec 4, 2024
@samadejacobs samadejacobs merged commit 676a482 into microsoft:main Dec 5, 2024
5 checks passed