GPT Neo past_key_values unexpected behaviour #11787
Comments
I encountered a similar problem when trying to use GPT-Neo with PPLM (https://github.com/uber-research/PPLM). Seems that Neo's `past_key_values` does not follow the same structure as GPT-2's, alternating between (key, value) pairs and single tensors across layers:

```python
inputs = tokenizer(prompt, return_tensors='pt')
outputs = model(**inputs)
past = outputs.past_key_values
for idx, p in enumerate(past):
    print(f'{idx}: {tuple(elem.shape for elem in p)}')
# output
# 0: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 1: (torch.Size([1, 3, 768]),)
# 2: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 3: (torch.Size([1, 3, 768]),)
# 4: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 5: (torch.Size([1, 3, 768]),)
# 6: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 7: (torch.Size([1, 3, 768]),)
# 8: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 9: (torch.Size([1, 3, 768]),)
# 10: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 11: (torch.Size([1, 3, 768]),)
```

GPT-2 correctly returns just the key-value tensors:

```
# 0: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 1: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 2: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 3: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 4: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 5: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 6: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 7: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 8: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 9: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 10: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 11: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
```
After some more testing, the above seems to be because of local attention layers in GPT-Neo's default configuration. When specifying global attention for all layers, every entry of `past_key_values` has the expected (key, value) structure:

```
# 0: (torch.Size([1, 16, 3, 128]), torch.Size([1, 16, 3, 128]))
# 1: (torch.Size([1, 16, 3, 128]), torch.Size([1, 16, 3, 128]))
# 2: (torch.Size([1, 16, 3, 128]), torch.Size([1, 16, 3, 128]))
# 3: (torch.Size([1, 16, 3, 128]), torch.Size([1, 16, 3, 128]))
# 4: (torch.Size([1, 16, 3, 128]), torch.Size([1, 16, 3, 128]))
# ...
```

I do think the documentation for `past_key_values` should be updated to reflect this.
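As a minimal sketch of such a configuration (the checkpoint name and the way the override is applied are my assumptions, not something stated in the thread):

```python
from transformers import GPTNeoConfig, GPTNeoForCausalLM

# Sketch: make every layer use global attention so that each entry of
# past_key_values is a (key, value) pair.
config = GPTNeoConfig.from_pretrained("EleutherAI/gpt-neo-1.3B")
config.attention_types = [[["global"], config.num_layers]]
# attention_layers is derived from attention_types when the config is built,
# so it may need to be kept in sync when overriding after the fact.
config.attention_layers = ["global"] * config.num_layers

model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B", config=config)
```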
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi @patil-suraj, just checking if there is any progress on this issue or pull request #11630? That PR seems to fix the problem related to my use case.
The different shape for local attention layers is because of the folding going on in the current implementation.
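For illustration only (the sizes and names below are made up, not taken from the implementation), folding here means reshaping the sequence axis into fixed-size blocks so attention can be computed within each local window:

```python
import torch

# Toy illustration of folding: split the sequence dimension into
# (num_blocks, window_size) blocks so each block only attends locally.
batch_size, seq_len, hidden_size = 1, 256, 768
window_size = 64

hidden_states = torch.randn(batch_size, seq_len, hidden_size)
folded = hidden_states.view(batch_size, seq_len // window_size, window_size, hidden_size)
print(folded.shape)  # torch.Size([1, 4, 64, 768])
```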
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
I have been successfully using the GPT2LMHeadModel module for text generation for some time, and I recently tried to reuse the code to generate with GPTNeoForCausalLM. Though the documentation of the two models appears identical, I get the error `ValueError: not enough values to unpack (expected 2, got 1)` for the line

```python
output, past = self.model(context, past_key_values=past, use_cache=True).values()
```

(which works fine for GPT2). Is this a bug, or has the documentation been copied incorrectly? Would appreciate any tips for fixing.
Many thanks
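Not from the thread, but a guess at the mechanism behind that error, consistent with the shapes reported above: GPT-Neo's local attention layers cache a single tensor rather than a (key, value) pair, and unpacking such a 1-tuple into two variables raises exactly this ValueError. A minimal illustration:

```python
import torch

# Illustration only: a GPT-Neo local-attention layer's cached entry is a
# 1-tuple of hidden states (shape seen above), not a (key, value) pair.
layer_past = (torch.randn(1, 3, 768),)

# GPT-2-style unpacking then fails:
# past_key, past_value = layer_past
# ValueError: not enough values to unpack (expected 2, got 1)
```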