model and training code for the AR variant #34
Thanks for open-sourcing this amazing project!
I wonder if it is possible to also release the model and training code for the AR baseline.
Thank you in advance!
Comments
To keep this repo clean, we don't plan to release the AR code in it. However, it is very easy to reimplement using the current repo -- almost all hyper-parameters remain the same as MAR. The only differences are the causal attention mask and the teacher-forcing loss.
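For readers looking to try this, here is a minimal sketch of those two changes in PyTorch. The helper names (`causal_mask`, `teacher_forcing_step`, `loss_head`) are illustrative, not from this repo; in MAR the per-token loss head would be the diffusion loss, which stays unchanged:

```python
import torch

def causal_mask(seq_len: int, device=None) -> torch.Tensor:
    # Boolean mask: True above the diagonal = position is blocked,
    # so token i can only attend to tokens 0..i.
    return torch.triu(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=device), diagonal=1
    )

def teacher_forcing_step(decoder, loss_head, tokens):
    # tokens: (B, L, D) ground-truth latent tokens.
    # Teacher forcing: condition on the ground-truth prefix tokens[:, :-1]
    # and predict tokens[:, 1:], training all positions in parallel.
    inputs, targets = tokens[:, :-1], tokens[:, 1:]
    mask = causal_mask(inputs.size(1), device=tokens.device)
    features = decoder(inputs, attn_mask=mask)  # (B, L-1, D) causal features
    return loss_head(features, targets)         # per-token loss (e.g. the diffusion loss)
```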
Hi @LTH14, in the AR variant, is it necessary for the attention mechanism within the MAE encoder to be causal? Alternatively, should we consider removing the MAE encoder altogether in this variant?
In the AR variant, we don't need the MAE encoder. A single causal decoder is enough (similar to GPT).
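Such a GPT-like causal decoder can be sketched with standard PyTorch blocks: plain self-attention layers run under the causal mask above, with no MAE encoder and no cross-attention. The sizes and names here are illustrative assumptions, not the repo's actual config:

```python
import torch
import torch.nn as nn

class CausalDecoder(nn.Module):
    # Decoder-only transformer (GPT-style): a stack of self-attention blocks
    # that becomes causal simply by passing a causal attention mask.
    def __init__(self, dim=768, depth=24, heads=12, max_len=256):
        super().__init__()
        self.pos_emb = nn.Parameter(torch.zeros(1, max_len, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True,
        )
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x, attn_mask=None):
        # x: (B, L, dim); attn_mask: (L, L) boolean mask, True = blocked.
        x = x + self.pos_emb[:, : x.size(1)]
        return self.blocks(x, mask=attn_mask)
```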
Thanks! Do you double the depth of the MAE decoder?
Yes, we keep the total number of parameters unchanged.
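A quick back-of-the-envelope check of why this works: encoder and decoder blocks share the same width in MAR, so folding an N-block encoder plus an N-block decoder into a 2N-block causal decoder leaves the block parameter count unchanged. The sizes below are a MAR-B-like assumption (width 768, 12 + 12 blocks); check the repo for the exact configs:

```python
def transformer_block_params(width: int) -> int:
    # Rough per-block estimate: attention QKV + output projections (4 * d^2)
    # plus a 4d-hidden MLP (8 * d^2); biases and norms are ignored.
    return 12 * width ** 2

width, enc_depth, dec_depth = 768, 12, 12  # illustrative MAR-B-like sizes
mar = (enc_depth + dec_depth) * transformer_block_params(width)
ar = (2 * dec_depth) * transformer_block_params(width)
assert mar == ar  # same width, doubled depth => block params preserved
```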