Add megablocks dropless MoE #1192

yang · 2024-03-22T00:49:50Z

This initial version focuses on getting megablocks integrated and working with DeepSpeed (DS) parallelism. Megablocks experts now work within the existing parallelism scheme, which keeps its full degrees of freedom, including expert, expert-data, and tensor-expert-data parallelism.

Tested on 8xA100 for convergence and expert balancing. Testing also uncovered weight initialization issues (to be fixed later).
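
For readers who want a concrete picture of an expert-balancing check, here is a minimal sketch. It is not the code used for the testing above: `expert_load`, `imbalance`, and the sample routing assignments are hypothetical names and data chosen for illustration, and it assumes top-1 routing where each token is assigned to exactly one expert.

```python
from collections import Counter


def expert_load(assignments, num_experts):
    """Return the fraction of routed tokens that each expert received."""
    if not assignments:
        raise ValueError("assignments must contain at least one token")
    counts = Counter(assignments)
    total = len(assignments)
    return [counts.get(e, 0) / total for e in range(num_experts)]


def imbalance(load):
    """Ratio of the busiest expert's load to the ideal uniform load.

    1.0 means perfectly balanced; larger values mean more skew.
    """
    ideal = 1.0 / len(load)
    return max(load) / ideal


if __name__ == "__main__":
    # Hypothetical top-1 routing decisions for 8 tokens over 4 experts.
    assignments = [0, 1, 2, 3, 0, 1, 2, 3]
    load = expert_load(assignments, num_experts=4)
    print("per-expert load:", load)
    print("imbalance:", imbalance(load))
```

In a real run the assignments would come from the router's output for a batch, and the imbalance would be tracked over training steps rather than computed once.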

Design document and worklog that accompanied this project: https://yaaang.notion.site/gpt-neox-MoE-design-doc-cc8586eb53144a5987b63f510ced021c

As for where this fits into the larger arc of work, the next PRs (I don't have permission to submit stacked PRs) are for:

- improved expert initialization, as we discussed
- integration tests that automate the verification I was showing earlier around convergence and expert + router gradients
- making it work with DS pipeline parallelism
- merging with Colin's code and doing the megablocks code fork

Quentin-Anthony

Tested and working for MoE on my end. No comments.

yang requested a review from Quentin-Anthony as a code owner March 22, 2024 00:49

yang force-pushed the mbmoe branch 2 times, most recently from 37f19bd to 1013ddd Compare April 24, 2024 17:58

Add megablocks dropless MoE

86b7722

yang force-pushed the mbmoe branch from 1013ddd to 86b7722 Compare April 24, 2024 18:03

pre-commit

b1a1ae1

Quentin-Anthony approved these changes May 4, 2024

Quentin-Anthony merged commit 916c883 into EleutherAI:main May 4, 2024
2 of 6 checks passed

jahatef mentioned this pull request May 15, 2024

Run document update again #1216

Merged
