Fix GPT-NeoX-20B past handling, attention computation #17811
Conversation
Force-pushed from f6c9561 to a84811d
Thanks for the fix, I can confirm that with this PR I get the same generations in float32 and float16 for EleutherAI/gpt-neox-20b (whereas before, I got either a garbage generation or NaNs in float16).
The cleanup in the config LGTM; thanks for making the docstring match the defaults. The two attributes you removed are not used anywhere.
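For reference, a minimal sketch of the kind of comparison described above: generate greedily with the same prompt in float32 and float16 and check that the decoded outputs match. The prompt, generation settings, and use of `AutoModelForCausalLM` here are illustrative assumptions, not taken from the PR.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch only; the prompt and generation settings are assumptions.
model_name = "EleutherAI/gpt-neox-20b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer("GPT-NeoX-20B is a", return_tensors="pt")

texts = {}
for dtype in (torch.float32, torch.float16):
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=dtype)
    out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    texts[dtype] = tokenizer.decode(out[0], skip_special_tokens=True)

# With the fix, both dtypes should produce the same greedy continuation.
print(texts[torch.float32] == texts[torch.float16])
```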
The documentation is not available anymore as the PR was closed or merged.
There are a few equivalence tests failing with this PR; could you dive into them? Let us know if you need any help!
@@ -635,7 +648,7 @@ def prepare_inputs_for_generation(self, input_ids, past=None, attention_mask=None
             attention_mask = input_ids.new_ones(input_shape)

         # cut decoder_input_ids if past is used
-        if past is not None:
+        if past is not None and past[0] is not None:
Out of curiosity, why is the second condition needed here? The `past[0] is not None` part?
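For what it's worth, a hedged illustration of what the extra check changes: with only `past is not None`, a `past` made of `None` placeholders (no real key/value cache yet) would already trigger truncation of the prompt to its last token. The function below is a hypothetical stand-in used for illustration, not the library code.

```python
# Hypothetical stand-in for the relevant part of prepare_inputs_for_generation.
def cut_prompt_if_cached(input_ids, past=None):
    # Only drop everything but the last token when a *populated* cache exists.
    if past is not None and past[0] is not None:
        input_ids = [ids[-1:] for ids in input_ids]
    return input_ids

print(cut_prompt_if_cached([[1, 2, 3]]))                      # no cache -> full prompt
print(cut_prompt_if_cached([[1, 2, 3]], past=(None, None)))   # placeholders -> full prompt
print(cut_prompt_if_cached([[1, 2, 3]], past=(("k", "v"),)))  # real cache -> last token only
```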
@@ -38,32 +38,28 @@ class GPTNeoXConfig(PretrainedConfig):

     Args:
Thanks for cleaning this up!
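One quick way to double-check that the docstring now matches the defaults is to instantiate the config and print the attributes in question. A hedged sketch; the particular attribute names listed are assumptions about what the docstring covers.

```python
from transformers import GPTNeoXConfig

# Instantiate with defaults and print a few attributes to compare against the docstring.
config = GPTNeoXConfig()
for name in ("hidden_size", "num_hidden_layers", "num_attention_heads", "rotary_pct"):
    print(name, getattr(config, name, "<not present>"))
```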
Thanks for fixing, @zphang!
Force-pushed from ce7c60e to a84811d
I've run the tests locally and they pass, so I can't seem to reproduce the test errors. Can someone else give them a try?
The tests pass on GPU but not on CPU on my side, so running them on CPU reproduces the failure.
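The exact command was not preserved in the thread; a hedged way to reproduce on CPU could be to hide the GPUs before running the model's test file. The test path below is an assumption about the repository layout, not quoted from the comment.

```python
import os

# Hide all GPUs so PyTorch falls back to CPU, then run the GPT-NeoX model tests.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import pytest

pytest.main(["-v", "tests/models/gpt_neox/test_modeling_gpt_neox.py"])
```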
Force-pushed from c946908 to d2e9de9 (…ly avoid NaN, update docs)
Thanks again! Nice to be able to use GPT-Neo-X in float16 for generations :-)
What does this PR do?
Fixes #17632 and (hopefully) #17452.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@sgugger