Ensure Consistency Between GPTConfig.block_size and Sequence Length T #72

Benetti-Hub · 2024-08-09T13:19:07Z

First and foremost, I want to express my appreciation for this tutorial. It's incredibly insightful and well-structured.

I'm submitting this PR because I noticed a potential issue related to GPTConfig.block_size not being enforced to match the sequence length T.

If I understand correctly, this discrepancy could lead to unexpected model behavior during inference if T is lower than GPTConfig.block_size. (Note that an assertion error is already raised when T exceeds GPTConfig.block_size, as seen here.)
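To make the two conditions concrete, here is a minimal sketch, not the repository's code. The `GPTConfig` below is a stand-in dataclass, the `block_size` value is assumed for illustration, and the helper names `fits_block_size` and `matches_block_size` are hypothetical. The first helper mirrors the upper-bound check mentioned above; the second is the stricter equality this PR is concerned with.

```python
from dataclasses import dataclass


@dataclass
class GPTConfig:
    # Stand-in for the tutorial's config; the value is assumed for illustration.
    block_size: int = 1024


def fits_block_size(T: int, config: GPTConfig) -> bool:
    """Upper-bound check: the sequence must not be longer than block_size."""
    return T <= config.block_size


def matches_block_size(T: int, config: GPTConfig) -> bool:
    """Stricter check (hypothetical): the sequence length equals block_size."""
    return T == config.block_size


config = GPTConfig()
for T in (512, 1024, 2048):
    print(T, fits_block_size(T, config), matches_block_size(T, config))
```

With these assumed values, T = 512 passes the upper-bound check but not the equality check, which is the case being discussed.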

Thank you for considering this change. Please let me know if any further adjustments are needed.

zhaziqwe · 2024-08-18T14:39:50Z

I have similar doubts about this problem. I'm not sure what happens to the causal mask once flash attention is used.
    # q * k^t / sqrt(hs)
    # att = (q @ k.transpose(-2, -1)) * (1.0 / math.sqrt(k.size(-1)))
    # att = att.masked_fill(self.bias[:, :, :T, :T] == 0, float('-inf'))
    # att = F.softmax(att, dim=-1)
    # y = att @ v
    y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
If it is not used, I think T does not need to be strictly equal to block_size.
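One way to reason about the masking step is to compare the two masks directly. In the commented-out path, `self.bias[:, :, :T, :T]` takes the top-left T x T corner of the stored mask. With `is_causal=True` and q and k of the same length T, the PyTorch documentation describes the mask as a T x T lower-triangular matrix. The pure-Python sketch below uses nested lists in place of tensors and assumes `bias` is a lower-triangular matrix of ones of size block_size x block_size. The helper names and the small sizes are hypothetical.

```python
def tril_mask(n):
    """Return an n x n lower-triangular 0/1 mask as nested lists."""
    return [[1 if col <= row else 0 for col in range(n)] for row in range(n)]


def top_left(mask, t):
    """Take the top-left t x t corner, mirroring bias[:, :, :T, :T]."""
    return [row[:t] for row in mask[:t]]


block_size = 8  # assumed small value for illustration
T = 5           # a sequence shorter than block_size

bias = tril_mask(block_size)   # stand-in for the registered bias buffer
sliced = top_left(bias, T)     # mask used by the commented-out path
causal = tril_mask(T)          # T x T causal mask built directly from T

masks_match = sliced == causal
print(masks_match)
```

Under these assumptions the two masks are identical for any T up to block_size, so the masking itself does not appear to require T to equal block_size. This says nothing about other parts of the model, such as position embeddings.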

align T with GPTConfig

c5223aa
