Self-attn in decoder layers. #16

TsingWei · 2023-04-05T15:47:59Z

I noticed that the paper has a section arguing that DETA does not need self-attention in the decoder. The results there show that performance is better when the decoder's self-attention is replaced by an FFN. Does the final model in the table comparing with other SOTA methods use this setting? I ask because the code appears to hard-code self-attention in the decoder layer:

DETA/models/deformable_transformer.py

Line 328 in dade176

self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout)

jozhang97 · 2023-04-12T01:26:02Z

Thank you for your interest!

> I wonder whether the final version in the table comparing with other SOTA methods uses this setting?

No, our default model still contains self-attention.
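
For readers who want to experiment with the ablation discussed in this thread, here is a minimal sketch of how the query-mixing step of a decoder layer could be made switchable between self-attention and an FFN. This is not DETA's code: the class name `DecoderQueryMixer`, the `use_self_attn` flag, the `d_ffn` width, and the batch-first tensor layout are all assumptions made for illustration. Only the `nn.MultiheadAttention(d_model, n_heads, dropout=dropout)` call mirrors the line quoted above.

```python
import torch
from torch import nn


class DecoderQueryMixer(nn.Module):
    """Hypothetical query-mixing block (illustration only, not from the DETA repo).

    use_self_attn=True  -> queries attend to each other via nn.MultiheadAttention
    use_self_attn=False -> each query passes through a feed-forward network on its own
    """

    def __init__(self, d_model=256, n_heads=8, d_ffn=1024, dropout=0.1,
                 use_self_attn=True):
        super().__init__()
        self.use_self_attn = use_self_attn
        if use_self_attn:
            # Same constructor call as the hard-coded line in the issue.
            self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout)
        else:
            # Assumed replacement: a plain two-layer FFN applied per query.
            self.ffn = nn.Sequential(
                nn.Linear(d_model, d_ffn),
                nn.ReLU(),
                nn.Dropout(dropout),
                nn.Linear(d_ffn, d_model),
            )
        self.dropout = nn.Dropout(dropout)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, tgt, query_pos=None):
        # tgt and query_pos: (batch, num_queries, d_model)
        if self.use_self_attn:
            # Positional embeddings are added to queries and keys, not values.
            q = k = tgt if query_pos is None else tgt + query_pos
            # nn.MultiheadAttention expects (seq, batch, dim) by default.
            out = self.self_attn(
                q.transpose(0, 1), k.transpose(0, 1), tgt.transpose(0, 1)
            )[0].transpose(0, 1)
        else:
            out = self.ffn(tgt)
        # Residual connection followed by layer normalization.
        return self.norm(tgt + self.dropout(out))
```

With `use_self_attn=False` the block creates no attention parameters, and each query's output depends only on that query, which is the property such an ablation would be testing.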
