Inconsistency between GPTNeo and GPT2 config classes #12183
Comments
Seconding this. Last month I swapped out GPT-2 for GPT-Neo in a project, and these differences made it more difficult to adapt my existing code.
Hi @leogao2 and @nostalgebraist, thanks for opening an issue! You're correct that the current implementation prevents a few use-cases. Namely, this is authorized:

```python
from transformers import GPT2Config

config = GPT2Config()
config.hidden_size
```

But these are not:

```python
from transformers import GPT2Config

config = GPT2Config()
config.hidden_size = 4
# Fails

config = GPT2Config(hidden_size=4)
# Fails
```

Unfortunately, we can't just rename arguments, as this would break both checkpoints on the hub and local checkpoints. We're thinking of a way to enable this with a convention set across configurations: it would allow getting and setting the attributes you mention. Let us explore a bit and we'll come back to you. cc @patil-suraj @patrickvonplaten @sgugger
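For illustration, here is a minimal sketch of how such a convention could look: a class-level alias map resolved through `__setattr__`/`__getattr__`, so canonical names like `hidden_size` read and write the legacy serialized names like `n_embd`. The class name and `ATTRIBUTE_MAP` below are hypothetical, not the actual transformers implementation:

```python
class GPT2ConfigSketch:
    # Canonical attribute name -> legacy name used in serialized checkpoints.
    ATTRIBUTE_MAP = {
        "hidden_size": "n_embd",
        "num_hidden_layers": "n_layer",
        "num_attention_heads": "n_head",
        "max_position_embeddings": "n_positions",
    }

    def __init__(self, **kwargs):
        # Defaults keep the legacy names, so existing checkpoints still load.
        self.n_embd = 768
        self.n_layer = 12
        self.n_head = 12
        self.n_positions = 1024
        # Aliased kwargs (e.g. hidden_size=4) are redirected by __setattr__.
        for key, value in kwargs.items():
            setattr(self, key, value)

    def __setattr__(self, key, value):
        # Writes to a canonical name land on the legacy attribute.
        super().__setattr__(self.ATTRIBUTE_MAP.get(key, key), value)

    def __getattr__(self, key):
        # Only called when normal lookup fails, i.e. for canonical aliases.
        if key in self.ATTRIBUTE_MAP:
            return getattr(self, self.ATTRIBUTE_MAP[key])
        raise AttributeError(key)
```

With such a map, `GPT2ConfigSketch(hidden_size=4)`, `config.hidden_size = 4`, and `config.hidden_size` would all work, while `n_embd` remains the name stored in checkpoints.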
This still needs to be addressed.
Is there any progress on this?
Hey @leogao2! Yes, a proposal is available here: nreimers@2198ee7, but there are still a few rough edges to polish. We'll try to have it merged in the next few weeks and will let you know.
This was fixed in #13026, which will be in the next release alongside GPT-J. Thank you for opening an issue!
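With a release that includes #13026, the snippets that failed earlier in the thread should work; a quick sanity check, assuming the attribute aliases land as described:

```python
from transformers import GPT2Config

# Both spellings now refer to the same underlying value.
config = GPT2Config(hidden_size=4)
print(config.hidden_size)  # 4
print(config.n_embd)       # 4

config.hidden_size = 8
print(config.n_embd)       # 8
```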
The config classes for GPTNeo and GPT2 have a number of naming differences that are seemingly unnecessary. This makes it harder for downstream users to write code that depends on accessing these attributes. See below:

| GPTNeoConfig | GPT2Config |
| --- | --- |
| max_position_embeddings | n_positions |
| hidden_size | n_embd |
| num_layers | n_layer |
| num_heads | n_head |
| intermediate_size | n_inner |
| resid_dropout | resid_pdrop |
| embed_dropout | embd_pdrop |
| attention_dropout | attn_pdrop |

It seems that max_position_embeddings, hidden_size, num_layers, num_heads, intermediate_size, resid_dropout, embed_dropout, and attention_dropout should be renamed for consistency with the GPT2 config class.
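For concreteness, a minimal snippet showing the mismatch, using the attribute names as they stood when this issue was filed:

```python
from transformers import GPT2Config, GPTNeoConfig

gpt2_config = GPT2Config()
neo_config = GPTNeoConfig()

# The same hyperparameter is exposed under different names:
print(neo_config.hidden_size)  # GPT-Neo: hidden_size
print(gpt2_config.n_embd)      # GPT-2: n_embd

# Code written against the GPT-Neo names cannot configure GPT-2:
# GPT2Config(hidden_size=4) fails, as shown in the comments above.
```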
Who can help
@LysandreJik @patil-suraj