Inconsistency between GPTNeo and GPT2 config classes #12183
Comments
Seconding this. Last month I swapped out GPT-2 for GPT-Neo in a project, and these differences made it more difficult to adapt my existing code.
Hi @leogao2 and @nostalgebraist, thanks for opening an issue! You're correct that the current implementation prevents a few use-cases. Namely, this is authorized:

```python
from transformers import GPT2Config

config = GPT2Config()
config.hidden_size
```

But these are not:

```python
from transformers import GPT2Config

config = GPT2Config()
config.hidden_size = 4
# Fails

config = GPT2Config(hidden_size=4)
# Fails
```

Unfortunately, we can't just rename arguments, as this would break both checkpoints on the hub and local checkpoints. We're thinking of a way to enable this with a convention set across configurations: it would allow getting and setting the attributes you mention. Let us explore a bit and we'll come back to you. cc @patil-suraj @patrickvonplaten @sgugger
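For illustration, here is a minimal sketch of how such a convention could look: a class-level alias map resolved through `__setattr__`/`__getattr__`, so canonical names like `hidden_size` read and write the legacy serialized names like `n_embd`. The class name and `ATTRIBUTE_MAP` below are hypothetical, not the actual transformers implementation:

```python
class GPT2ConfigSketch:
    # Canonical attribute name -> legacy name used in serialized checkpoints.
    ATTRIBUTE_MAP = {
        "hidden_size": "n_embd",
        "num_hidden_layers": "n_layer",
        "num_attention_heads": "n_head",
        "max_position_embeddings": "n_positions",
    }

    def __init__(self, **kwargs):
        # Defaults keep the legacy names, so existing checkpoints still load.
        self.n_embd = 768
        self.n_layer = 12
        self.n_head = 12
        self.n_positions = 1024
        # Aliased kwargs (e.g. hidden_size=4) are redirected by __setattr__.
        for key, value in kwargs.items():
            setattr(self, key, value)

    def __setattr__(self, key, value):
        # Writes to a canonical name land on the legacy attribute.
        super().__setattr__(self.ATTRIBUTE_MAP.get(key, key), value)

    def __getattr__(self, key):
        # Only called when normal lookup fails, i.e. for canonical aliases.
        if key in self.ATTRIBUTE_MAP:
            return getattr(self, self.ATTRIBUTE_MAP[key])
        raise AttributeError(key)
```

With such a map, `GPT2ConfigSketch(hidden_size=4)`, `config.hidden_size = 4`, and `config.hidden_size` would all work, while `n_embd` remains the name stored in checkpoints.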
This still needs to be addressed.
Is there any progress on this?
Hey @leogao2! Yes, a proposal is available here: nreimers@2198ee7, but there are still a few rough edges to polish. We'll try to have it merged in the next few weeks and will let you know.
This was fixed in #13026, which will be in the next release alongside GPT-J. Thank you for opening an issue!
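With a release that includes #13026, the snippets that failed earlier in the thread should work; a quick sanity check, assuming the attribute aliases land as described:

```python
from transformers import GPT2Config

# Both spellings now refer to the same underlying value.
config = GPT2Config(hidden_size=4)
print(config.hidden_size)  # 4
print(config.n_embd)       # 4

config.hidden_size = 8
print(config.n_embd)       # 8
```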
The config classes for GPTNeo and GPT2 have a number of naming differences that are seemingly unnecessary. This makes it harder for downstream users to write code that depends on accessing these attributes. See below:

| GPTNeoConfig | GPT2Config |
| --- | --- |
| max_position_embeddings | n_positions |
| hidden_size | n_embd |
| num_layers | n_layer |
| num_heads | n_head |
| intermediate_size | n_inner |
| resid_dropout | resid_pdrop |
| embed_dropout | embd_pdrop |
| attention_dropout | attn_pdrop |

It seems that max_position_embeddings, hidden_size, num_layers, num_heads, intermediate_size, resid_dropout, embed_dropout, and attention_dropout should be renamed for consistency with the GPT2 config class.
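For concreteness, a minimal snippet showing the mismatch, using the attribute names as they stood when this issue was filed:

```python
from transformers import GPT2Config, GPTNeoConfig

gpt2_config = GPT2Config()
neo_config = GPTNeoConfig()

# The same hyperparameter is exposed under different names:
print(neo_config.hidden_size)  # GPT-Neo: hidden_size
print(gpt2_config.n_embd)      # GPT-2: n_embd

# Code written against the GPT-Neo names cannot configure GPT-2:
# GPT2Config(hidden_size=4) fails, as shown in the comments above.
```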
Who can help
@LysandreJik @patil-suraj