Reasoning / Intuition behind Model Averaging #715
Is there any reason or intuition for why the models are averaged in checkpoint.py using the formula below, instead of as an ordinary weighted sum with non-negative weights?
(I just copied the equation from icefall/icefall/checkpoint.py, line 400 at commit 1d5c03f; batch_idx is written as $b$ below, and $m$ refers to model params.)

$$m_{\text{avg}} = \frac{b_{\text{end}}\, m_{\text{end}} - b_{\text{start}}\, m_{\text{start}}}{b_{\text{end}} - b_{\text{start}}}$$

It seemed quite counterintuitive to me that the earlier model is subtracted from the later one, even though the weights for $m_{\text{end}}$ and $m_{\text{start}}$ do sum to 1.
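For concreteness, the only reading under which I could make sense of the subtraction is the following (assuming, and I am not sure this is what checkpoint.py actually tracks, that $m_t$ is the running average of the per-batch parameter values $p_1, \dots, p_t$ over the first $t$ training batches):

$$\frac{b_{\text{end}}\, m_{\text{end}} - b_{\text{start}}\, m_{\text{start}}}{b_{\text{end}} - b_{\text{start}}} = \frac{\sum_{i=1}^{b_{\text{end}}} p_i - \sum_{i=1}^{b_{\text{start}}} p_i}{b_{\text{end}} - b_{\text{start}}} = \frac{1}{b_{\text{end}} - b_{\text{start}}} \sum_{i=b_{\text{start}}+1}^{b_{\text{end}}} p_i$$

Under that reading, the negative weight on $m_{\text{start}}$ simply cancels the early batches that $m_{\text{end}}$ has already absorbed, leaving a uniform average over batches $b_{\text{start}}+1$ through $b_{\text{end}}$.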
Is this actually an established model averaging method? Or was there any GitHub discussion regarding it?
Thanks!
Comments

Please have a look at …

I see. So in effect, Icefall is already averaging over every mini-batch (approximated by sampling every …). Thank you so much for pointing me in the right direction and for connecting the dots for me.
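A minimal numeric sketch of that reading (my own toy code, not icefall's actual implementation; the scalar "parameters", the interval endpoints, and the sampling period are all made up for illustration):

```python
import random

# Toy sketch: treat the "model" as a single scalar parameter per training
# batch, keep a running average m_t over the first t batches, and check
# that the subtraction formula above recovers the plain average over an
# interval of batches.

random.seed(0)
p = [random.random() for _ in range(1000)]  # parameter value after each batch

def running_avg(t):
    # m_t = (1/t) * (p_1 + ... + p_t)
    return sum(p[:t]) / t

b_start, b_end = 300, 1000
m_start, m_end = running_avg(b_start), running_avg(b_end)

# The weights b_end/(b_end - b_start) and -b_start/(b_end - b_start)
# sum to 1, but the one on m_start is negative: that is the subtraction.
avg_formula = (b_end * m_end - b_start * m_start) / (b_end - b_start)

# Direct uniform average over batches b_start+1 .. b_end.
avg_direct = sum(p[b_start:b_end]) / (b_end - b_start)

assert abs(avg_formula - avg_direct) < 1e-9

# The reply above says the per-batch running average is approximated by
# sampling every few batches; an online-mean update over every k-th batch
# does exactly that (k = 10 is an arbitrary choice for illustration).
def sampled_running_avg(t, k=10):
    m, n = 0.0, 0
    for i in range(k - 1, t, k):   # batches k, 2k, 3k, ...
        n += 1
        m += (p[i] - m) / n        # standard online mean update
    return m

print(f"exact interval average : {avg_direct:.6f}")
print(f"subtraction formula    : {avg_formula:.6f}")
print(f"m_end sampled every 10 : {sampled_running_avg(b_end):.6f} "
      f"(exact m_end: {m_end:.6f})")
```

The assertion passes because the two quantities are algebraically identical; only the sampled running average introduces any approximation.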