
GPT Neo past_key_values unexpected behaviour #11787

Closed
edwinagnew opened this issue May 20, 2021 · 6 comments · Fixed by #13491
Labels: WIP

Comments

@edwinagnew

I have been successfully using the GPT2LMHeadModel module for text generation for some time, and I recently tried to reuse the code to generate with GPTNeoForCausalLM. Though the documentation appears identical, I get the error "ValueError: not enough values to unpack (expected 2, got 1)" for the line output, past = self.model(context, past_key_values=past, use_cache=True).values() (which works fine for GPT-2).

Is this a bug or has the documentation been copied incorrectly? Would appreciate any tips for fixing.

Many thanks
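
For context, here is a minimal sketch of the kind of incremental-decoding loop described above (the gpt2 checkpoint, the prompt, and the greedy token choice are illustrative assumptions, not taken from the report); the unpacking line is the one that works for GPT-2 but reportedly fails for GPT-Neo:

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

context = tokenizer("The quick brown fox", return_tensors="pt").input_ids
past, generated = None, []

with torch.no_grad():
    for _ in range(10):
        # First step feeds the whole prompt; later steps feed only the newest token,
        # with past_key_values carrying the cached context.
        output, past = model(context, past_key_values=past, use_cache=True).values()
        next_token = output[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy choice
        generated.append(next_token.item())
        context = next_token

print(tokenizer.decode(generated))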

@edwinagnew edwinagnew changed the title from "GPT Neo past_key_vlaues unexpected behaviour" to "GPT Neo past_key_values unexpected behaviour" on May 20, 2021
@Express50

I encountered a similar problem when trying to use GPT-Neo with PPLM (https://github.com/uber-research/PPLM). It seems that GPT-Neo's past_key_values returns and consumes key-value tensors as well as (I'm guessing) feed-forward tensors:

from transformers import AutoTokenizer, GPTNeoForCausalLM

# Checkpoint/prompt assumed; the shapes below match the 125M model and a 3-token prompt
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

inputs = tokenizer("Hello world!", return_tensors='pt')
outputs = model(**inputs)
past = outputs.past_key_values

for idx, p in enumerate(past):
    print(f'{idx}: {tuple(elem.shape for elem in p)}')

# output
# 0: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 1: (torch.Size([1, 3, 768]),)
# 2: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 3: (torch.Size([1, 3, 768]),)
# 4: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 5: (torch.Size([1, 3, 768]),)
# 6: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 7: (torch.Size([1, 3, 768]),)
# 8: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 9: (torch.Size([1, 3, 768]),)
# 10: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 11: (torch.Size([1, 3, 768]),)

GPT-2 correctly returns just the key-value tensors:

# 0: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 1: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 2: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 3: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 4: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 5: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 6: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 7: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 8: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 9: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 10: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))
# 11: (torch.Size([1, 12, 3, 64]), torch.Size([1, 12, 3, 64]))

@Express50

After some more testing, the above seems to be caused by the local attention layers in GPT-Neo's default configuration. When specifying config = GPTNeoConfig(attention_types=[[["global"], 24]]), I get past_key_values shaped like GPT-2's:

# 0: (torch.Size([1, 16, 3, 128]), torch.Size([1, 16, 3, 128]))
# 1: (torch.Size([1, 16, 3, 128]), torch.Size([1, 16, 3, 128])) 
# 2: (torch.Size([1, 16, 3, 128]), torch.Size([1, 16, 3, 128])) 
# 3: (torch.Size([1, 16, 3, 128]), torch.Size([1, 16, 3, 128])) 
# 4: (torch.Size([1, 16, 3, 128]), torch.Size([1, 16, 3, 128])) 
# ...
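
For reference, a minimal sketch that reproduces this check with a randomly initialized model (using the default config sizes and the gpt2 tokenizer are assumptions on my part, not taken from the comment above):

import torch
from transformers import AutoTokenizer, GPTNeoConfig, GPTNeoForCausalLM

# Default GPT-Neo config sizes (24 layers, 16 heads, hidden size 2048), but with
# every layer forced to global attention instead of alternating global/local.
config = GPTNeoConfig(attention_types=[[["global"], 24]])
model = GPTNeoForCausalLM(config).eval()  # randomly initialized; only the cache shapes matter here

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # GPT-Neo uses the GPT-2 BPE vocabulary
inputs = tokenizer("Hello world!", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, use_cache=True)

for idx, layer_past in enumerate(outputs.past_key_values):
    print(f"{idx}: {tuple(t.shape for t in layer_past)}")
# Every layer now yields a (key, value) pair of shape
# (batch_size, num_heads, sequence_length, head_dim), matching GPT-2.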

I do think the documentation for past_key_values should be updated since it currently says: "with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head)"

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@Express50

Hi @patil-suraj, just checking whether there has been any progress on this issue or on pull request #11630? That PR seems to fix the problem in my use case.

@finetunej

The different shape for the local attention layers is due to the sequence folding done in the current local attention implementation.
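
For intuition, here is a toy sketch of what that folding looks like (the fold_into_blocks helper, the block length, and the shapes are hypothetical illustrations, not the actual transformers code):

import torch

# Local attention reshapes the sequence into fixed-size blocks so each token only
# attends within a local window. Because of this folding, the layer caches plain
# hidden states of shape (batch, seq_len, hidden) instead of per-head (key, value)
# tensors of shape (batch, num_heads, seq_len, head_dim), hence the differing cache entries.
def fold_into_blocks(hidden_states: torch.Tensor, block_length: int) -> torch.Tensor:
    batch, seq_len, hidden = hidden_states.shape
    assert seq_len % block_length == 0, "pad the sequence to a multiple of block_length"
    return hidden_states.view(batch, seq_len // block_length, block_length, hidden)

x = torch.randn(1, 256, 768)          # (batch, seq_len, hidden)
print(fold_into_blocks(x, 64).shape)  # torch.Size([1, 4, 64, 768])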

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@LysandreJik LysandreJik reopened this Aug 30, 2021
@github-actions github-actions bot closed this as completed Sep 8, 2021
@patil-suraj patil-suraj reopened this Sep 9, 2021
@patil-suraj patil-suraj added the WIP label Sep 9, 2021