model and training code for the AR variant #34
Thanks for open-sourcing this amazing project!
I wonder if it is possible to also release the model and training code for the AR baseline.
Thank you in advance!
Comments
To keep this repo clean, we don't plan to release the AR code in it. However, it is very easy to reimplement using the current repo -- almost all hyper-parameters remain the same as MAR. The only differences are the causal attention mask and the teacher-forcing loss.
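For readers looking to try this, here is a minimal sketch of those two changes in PyTorch. The helper names (`causal_mask`, `teacher_forcing_step`, `loss_head`) are illustrative, not from this repo; in MAR the per-token loss head would be the diffusion loss, which stays unchanged:

```python
import torch

def causal_mask(seq_len: int, device=None) -> torch.Tensor:
    # Boolean mask: True above the diagonal = position is blocked,
    # so token i can only attend to tokens 0..i.
    return torch.triu(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=device), diagonal=1
    )

def teacher_forcing_step(decoder, loss_head, tokens):
    # tokens: (B, L, D) ground-truth latent tokens.
    # Teacher forcing: condition on the ground-truth prefix tokens[:, :-1]
    # and predict tokens[:, 1:], training all positions in parallel.
    inputs, targets = tokens[:, :-1], tokens[:, 1:]
    mask = causal_mask(inputs.size(1), device=tokens.device)
    features = decoder(inputs, attn_mask=mask)  # (B, L-1, D) causal features
    return loss_head(features, targets)         # per-token loss (e.g. the diffusion loss)
```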
Hi @LTH14, in the AR variant, is it necessary for the attention mechanism within the MAE encoder to be causal? Alternatively, should we consider removing the MAE encoder altogether in this variant?
In the AR variant, we don't need the MAE encoder. A single causal decoder is enough (similar to GPT).
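Such a GPT-like causal decoder can be sketched with standard PyTorch blocks: plain self-attention layers run under the causal mask above, with no MAE encoder and no cross-attention. The sizes and names here are illustrative assumptions, not the repo's actual config:

```python
import torch
import torch.nn as nn

class CausalDecoder(nn.Module):
    # Decoder-only transformer (GPT-style): a stack of self-attention blocks
    # that becomes causal simply by passing a causal attention mask.
    def __init__(self, dim=768, depth=24, heads=12, max_len=256):
        super().__init__()
        self.pos_emb = nn.Parameter(torch.zeros(1, max_len, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True,
        )
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x, attn_mask=None):
        # x: (B, L, dim); attn_mask: (L, L) boolean mask, True = blocked.
        x = x + self.pos_emb[:, : x.size(1)]
        return self.blocks(x, mask=attn_mask)
```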
Thanks! Do you double the depth of the MAE decoder?
Yes, we keep the total number of parameters unchanged.
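A quick back-of-the-envelope check of why this works: encoder and decoder blocks share the same width in MAR, so folding an N-block encoder plus an N-block decoder into a 2N-block causal decoder leaves the block parameter count unchanged. The sizes below are a MAR-B-like assumption (width 768, 12 + 12 blocks); check the repo for the exact configs:

```python
def transformer_block_params(width: int) -> int:
    # Rough per-block estimate: attention QKV + output projections (4 * d^2)
    # plus a 4d-hidden MLP (8 * d^2); biases and norms are ignored.
    return 12 * width ** 2

width, enc_depth, dec_depth = 768, 12, 12  # illustrative MAR-B-like sizes
mar = (enc_depth + dec_depth) * transformer_block_params(width)
ar = (2 * dec_depth) * transformer_block_params(width)
assert mar == ar  # same width, doubled depth => block params preserved
```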