
Fix gather and reduce scatter ops on sequence dimension #1260

Merged

Conversation

@bclyang bclyang commented Aug 17, 2024

Fix an issue where the gather and reduce-scatter operations are incorrect when seq_dim != 0. This happens because reduce_scatter_tensor and gather_tensor only work correctly across the first dimension of the tensor. This PR switches to using reduce_scatter and gather directly. It also adds support for no_weight_tying=False when sequence_parallel=True.
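A minimal single-process sketch (no real communication; names are illustrative, not the PR's actual code) of why an op that only splits/concatenates along dim 0 goes wrong when the shard dimension is the sequence dimension:

```python
def gather_dim0(chunks):
    # Fused-tensor collective semantics (e.g. reduce_scatter_tensor):
    # per-rank chunks are concatenated along the FIRST dimension only.
    return [row for chunk in chunks for row in chunk]

def gather_along_dim(chunks, dim):
    # Explicit per-rank gather lets us concatenate along any dimension.
    if dim == 0:
        return gather_dim0(chunks)
    # dim == 1 in this 2-D sketch: stitch each row together across ranks.
    return [sum((chunk[i] for chunk in chunks), []) for i in range(len(chunks[0]))]

# Two "ranks", each holding a [batch=2, seq=2] shard split on the sequence dim (dim 1).
rank0 = [[1, 2], [5, 6]]
rank1 = [[3, 4], [7, 8]]

full = gather_along_dim([rank0, rank1], dim=1)
# -> [[1, 2, 3, 4], [5, 6, 7, 8]]  (correct: each batch row's sequence reassembled)

wrong = gather_dim0([rank0, rank1])
# -> [[1, 2], [5, 6], [3, 4], [7, 8]]  (dim-0 concat scrambles batch and sequence)
```

The same mismatch applies on the reduce-scatter side, which is why switching to the per-tensor-list collectives (which accept an explicit split along seq_dim) fixes convergence.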

Tests

After this fix, the 410M test config converges correctly with both sequence_parallel=True and sequence_parallel=False on EnWiki8 on 2 GPUs without weight tying.

Memory consumption (as logged on WandB) is lower with sequence parallelism enabled.

Works with weight tying.

Works with pipe_parallel_size=0 and pipe_parallel_size=1.

Works with use_qk_layernorm=True and use_qk_layernorm=False.

Works with MoE with moe_type="deepspeed" and moe_num_experts=2, but I have not tested any other settings, including moe_type="megablocks", moe_expert_parallel_size > 1, or enable_expert_tensor_parallelism=True. There are a lot of settings here, so for now I recommend not supporting them.

@bclyang bclyang changed the base branch from main to 812-megatron-seq-parallel August 17, 2024 23:28

@haileyschoelkopf haileyschoelkopf left a comment


Great finds!

@haileyschoelkopf haileyschoelkopf merged commit b0d9398 into 812-megatron-seq-parallel Aug 19, 2024
2 checks passed
@haileyschoelkopf haileyschoelkopf deleted the fix-seq-dim-reducegatherdactter branch August 19, 2024 14:00