Mismatch between Figure 3a and Equation 5 in paper #2

Open
krasserm opened this issue Nov 2, 2022 · 1 comment
Comments

@krasserm

krasserm commented Nov 2, 2022

Thank you for the very interesting paper and for planning to release the code. Since no code has been released yet (as of opening this issue), I have an implementation-related question. The lightweight transformer layer $\theta$ is defined in Equation 5 as

$U' = \text{SA}(U) + \text{LN}(U)$
$\hat{U} = \text{FFN}(\text{LN}(U')) + \text{LN}(U')$

whereas Figure 3a looks more like

$U' = \text{LN}(\text{SA}(U) + U)$
$\hat{U} = \text{LN}(\text{FFN}(U') + U')$

Which one is correct, i.e., which one is used in the implementation?
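To make the difference concrete, here is a minimal pure-Python sketch of the two readings above. It is only an illustration, not the authors' implementation: `layer_eq5`, `layer_fig3a`, `toy_sa` and `toy_ffn` are hypothetical names, the layer norm has no learned scale or shift, and the toy sublayers act on a single token vector (a real SA would mix information across tokens).

```python
import math
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def layer_norm(u: Vector, eps: float = 1e-5) -> Vector:
    """Normalize one vector to zero mean and unit variance (no learned parameters)."""
    mean = sum(u) / len(u)
    var = sum((x - mean) ** 2 for x in u) / len(u)
    return [(x - mean) / math.sqrt(var + eps) for x in u]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum, used for the residual connections."""
    return [x + y for x, y in zip(a, b)]


def layer_eq5(u: Vector, sa: Sublayer, ffn: Sublayer) -> Vector:
    """Equation 5 as written: the skip paths carry LN(U) and LN(U')."""
    u1 = add(sa(u), layer_norm(u))   # U' = SA(U) + LN(U)
    h = layer_norm(u1)               # LN(U')
    return add(ffn(h), h)            # FFN(LN(U')) + LN(U')


def layer_fig3a(u: Vector, sa: Sublayer, ffn: Sublayer) -> Vector:
    """Figure 3a as read in this issue: LN applied after each residual sum."""
    u1 = layer_norm(add(sa(u), u))       # U' = LN(SA(U) + U)
    return layer_norm(add(ffn(u1), u1))  # LN(FFN(U') + U')


# Placeholder sublayers (assumptions for illustration only).
def toy_sa(u: Vector) -> Vector:
    return [0.5 * x for x in u]


def toy_ffn(u: Vector) -> Vector:
    return [max(0.0, x) for x in u]


if __name__ == "__main__":
    u = [1.0, 2.0, 4.0]
    print("Equation 5:", layer_eq5(u, toy_sa, toy_ffn))
    print("Figure 3a: ", layer_fig3a(u, toy_sa, toy_ffn))
```

One observable difference in this sketch: the Figure 3a variant always ends in a layer norm, so its output is normalized, whereas the Equation 5 variant adds the FFN output after the last normalization, so its output is not.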

@andydelworth

I am also very interested in the answer to this question.
