Thank you for sharing this interesting work.
I have a question about Instance-Masked Attention.
The current code does not seem to apply Instance-Masked Attention (return_att_masks = False).
Is this because generation quality is better without Instance-Masked Attention?
Secondly, is Instance-Masked Attention applied during training, or only at inference?
Thank you in advance.
Hi, thank you for your interest. Currently, we have return_att_masks set to False because Flash Attention does not yet support attention masks (check it here). However, if speed and memory usage are not primary concerns for your application, you may set return_att_masks to True. It's worth noting that we had this option enabled during training. Hope it helps!
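For readers wondering what this trade-off looks like in practice, here is a minimal sketch using PyTorch's `torch.nn.functional.scaled_dot_product_attention`. The mask construction and instance assignments below are purely illustrative (not the repository's actual code): passing an attention mask rules out the flash backend, so a slower, more memory-hungry kernel is used instead.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: image-token queries cross-attending to per-instance text tokens.
B, H, Q, K, D = 2, 8, 64, 30, 64   # batch, heads, image tokens, text tokens, head dim
q = torch.randn(B, H, Q, D)
k = torch.randn(B, H, K, D)
v = torch.randn(B, H, K, D)

# return_att_masks=False path: no mask, so the flash / memory-efficient
# kernels can be selected (faster, lower memory).
out_unmasked = F.scaled_dot_product_attention(q, k, v)

# return_att_masks=True path (sketch): a boolean mask so each image token
# attends only to the text tokens of the instance it belongs to.
# The instance assignments here are made up for illustration.
instance_of_query = torch.arange(Q).remainder(3).expand(B, Q)  # instance id per image token
instance_of_key = torch.arange(K).remainder(3).expand(B, K)    # instance id per text token
att_mask = instance_of_query.unsqueeze(-1).eq(instance_of_key.unsqueeze(1))  # (B, Q, K), True = attend
att_mask = att_mask.unsqueeze(1)                                # broadcast over heads -> (B, 1, Q, K)

# Supplying attn_mask disqualifies the flash backend, so PyTorch falls back
# to a slower kernel -- the speed/memory cost mentioned above.
out_masked = F.scaled_dot_product_attention(q, k, v, attn_mask=att_mask)
```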