
xformers error when fine-tuning open_llama_3B with memory_efficient_attention #88

Open
EliverQ opened this issue Aug 13, 2023 · 5 comments


@EliverQ

EliverQ commented Aug 13, 2023

Hi, I'm confused by this error when using memory_efficient_attention. It seems the embed dimension per head you chose isn't compatible with any xformers operator?

NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(4, 512, 32, 100) (torch.bfloat16)
     key         : shape=(4, 512, 32, 100) (torch.bfloat16)
     value       : shape=(4, 512, 32, 100) (torch.bfloat16)
     attn_bias   : <class 'xformers.ops.fmha.attn_bias.LowerTriangularMask'>
     p           : 0.1
`flshattF` is not supported because:
    query.shape[-1] % 8 != 0
`tritonflashattF` is not supported because:
    dropout > 0.0
    query.shape[-1] % 8 != 0
    key.shape[-1] % 8 != 0
    value.shape[-1] % 8 != 0
`cutlassF` is not supported because:
    query.shape[-1] % 8 != 0
    value.shape[-1] % 8 != 0
`smallkF` is not supported because:
    dtype=torch.bfloat16 (supported: {torch.float32})
    max(query.shape[-1] != value.shape[-1]) > 32
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.LowerTriangularMask'>
    unsupported embed per head: 100

I'd appreciate it if you could help me.
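For reference, here is a minimal sketch that exercises the same call with the shapes and dtype from the log above (assuming xformers and a CUDA GPU; on builds where the fast kernels require the head dimension to be a multiple of 8, this raises the same NotImplementedError):

```python
import torch
import xformers.ops as xops
from xformers.ops.fmha.attn_bias import LowerTriangularMask

# Shapes from the log: (batch, seq_len, n_heads, head_dim) = (4, 512, 32, 100)
q = torch.randn(4, 512, 32, 100, dtype=torch.bfloat16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# head_dim = 100 is not a multiple of 8, so flshattF / tritonflashattF / cutlassF
# all reject these inputs on the xformers build from the log and dispatch fails.
out = xops.memory_efficient_attention(q, k, v, attn_bias=LowerTriangularMask(), p=0.1)
```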

@EliverQ
Author

EliverQ commented Aug 13, 2023

By the way, I thought the problem might be the dtype I'm using (bf16), but the dtype in your config is fp16 and it still doesn't work.

@young-geng
Contributor

For the 3B model, since there's no official LLaMA 3B, we defined the model size ourselves, and it might not agree with the 3B model sizes in other implementations.
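For illustration, the unusual head dimension falls directly out of the OpenLLaMA 3B config (values assumed here to be hidden_size 3200 with 32 attention heads, matching the shapes in the log):

```python
# Assumed OpenLLaMA 3B config values: hidden_size=3200, num_attention_heads=32
hidden_size = 3200
num_attention_heads = 32

head_dim = hidden_size // num_attention_heads
print(head_dim)           # 100
print(head_dim % 8 == 0)  # False -> rejected by xformers' 8-aligned kernels
```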

@EliverQ
Author

EliverQ commented Aug 14, 2023

> For the 3B model, since there's no official LLaMA 3B, we defined the model size ourselves, and it might not agree with the 3B model sizes in other implementations.

But I'm just using the HF code and checkpoint you released, without modifying anything.

@young-geng
Contributor

Hmm, then that might be a bug on the HF side. We've tested it in HF transformers without memory_efficient_attention, and it works as expected.
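A minimal sketch of that sanity check, assuming the released openlm-research/open_llama_3b checkpoint on the Hugging Face Hub and stock transformers attention (no memory_efficient_attention patch):

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

# Checkpoint name assumed; substitute the path you are actually loading from.
model_path = "openlm-research/open_llama_3b"

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16)

# Stock HF attention has no head-dim alignment requirement, so this runs as-is;
# the failure only appears once memory_efficient_attention is patched in.
inputs = tokenizer("Hello, my name is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(output_ids[0]))
```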

@EliverQ
Author

EliverQ commented Aug 14, 2023

Thank you very much! Perhaps I've been using the code incorrectly all along.
