Fix glu activation #148
Conversation
I will try to read the paper soon, but until then I defer to Jake on the proposed functionality change.
I made one proposal on the code changes; the rest looks good.
Thank you, Thomas!
Thank you for fixing this! I realize that GLU unit tests only perform checks on individual tensors and their computed values, not the whole training pipeline, so this fix was necessary. Looks good to me!
Hmm, I think we got a new PyTorch, so apex needs a manual rebuild and a new image, but I pushed a fix and re-ran the test suite. Good to merge now.
The test concerning GLU was being bypassed because "--no-bias-gelu-fusion" was missing; with fusion enabled, the whole activation path is skipped. See https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/main/megatron/model/transformer.py#L101
The reason I came up with that is that GLU activations divide the last dimension by two, which is not possible in the current MLP mechanism. In fact, https://arxiv.org/abs/2002.05202 changes the hidden size: instead of two weight matrices you have three (two of which are merged together in the implementation). A sketch of this structure is below.
cc @jaketae
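For illustration only, here is a minimal, hypothetical PyTorch sketch of the structure described above; it is not the Megatron-DeepSpeed MLP, and the layer names (`dense_h_to_4h`, `dense_4h_to_h`) and the choice of GELU as the gating nonlinearity are assumptions. The first projection outputs twice the feed-forward width so the activation can split the last dimension in half and gate one half with the other, as in https://arxiv.org/abs/2002.05202.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GLUMLP(nn.Module):
    """Sketch of a GLU-style MLP: two of the three GLU matrices are merged
    into a single projection, so the activation must halve the last dim."""

    def __init__(self, hidden_size: int, ffn_hidden_size: int):
        super().__init__()
        # One matrix holding both the "value" and "gate" projections.
        self.dense_h_to_4h = nn.Linear(hidden_size, 2 * ffn_hidden_size)
        self.dense_4h_to_h = nn.Linear(ffn_hidden_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        projected = self.dense_h_to_4h(x)
        # GLU-style activation: split the last dimension in two and gate
        # one half with the activated other half.
        value, gate = projected.chunk(2, dim=-1)
        return self.dense_4h_to_h(value * F.gelu(gate))


# Usage example with arbitrary sizes.
mlp = GLUMLP(hidden_size=1024, ffn_hidden_size=4096)
out = mlp(torch.randn(2, 16, 1024))
assert out.shape == (2, 16, 1024)
```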