Reasoning / Intuition behind Model Averaging #715

Closed
teowenshen opened this issue Nov 29, 2022 · 3 comments

Comments

@teowenshen
Contributor

Is there any reason or intuition for why the models are averaged in checkpoint.py by using this formula, instead of averaging over a weighted sum?

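$$m_{\text{avg}} = \frac{m_{\text{end}} \cdot b_{\text{end}} - m_{\text{start}} \cdot b_{\text{start}}}{b_{\text{end}} - b_{\text{start}}}$$

(I copied the equation from `(1) avg = (model_end * end - model_start * start) / interval.` in the code, where $b$ refers to batch_idx and $m$ refers to the model params.)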

It seems counterintuitive to me that the earlier model is subtracted from the later model, even though the weights for $m_{\text{end}}$ and $m_{\text{start}}$ do sum to 1.

Is this an established model averaging method? Or was there any GitHub discussion about it?

Thanks!

@csukuangfj
Collaborator

Please have a look at

@yaozengwei
Collaborator

In training, we maintain model_avg, which is the average of all periodically sampled models (model_1, model_2, ..., model_n) from the start:

$$\text{model\_avg}_n = \frac{1}{n} \sum_{i=1}^{n} \text{model}_i$$

During decoding, we want to use the averaged model over the interval [p+1, p+2, ..., q]:

$$\frac{1}{q - p} \sum_{i=p+1}^{q} \text{model}_i = \frac{q \cdot \text{model\_avg}_q - p \cdot \text{model\_avg}_p}{q - p}$$
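
To make the identity concrete, here is a minimal sketch that checks it numerically. It uses plain floats as stand-ins for model parameters; the helper names `running_averages` and `interval_average` and the sample values are made up for illustration and are not taken from icefall.

```python
# Illustrative sketch only: scalar "models" stand in for parameter tensors.

def running_averages(models):
    """Return [avg_1, ..., avg_n], where avg_k is the mean of the first k models."""
    averages = []
    total = 0.0
    for k, m in enumerate(models, start=1):
        total += m
        averages.append(total / k)
    return averages


def interval_average(avg_p, p, avg_q, q):
    """Mean of model_{p+1} .. model_q, recovered from two running averages."""
    return (avg_q * q - avg_p * p) / (q - p)


models = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0]  # model_1 .. model_6 (made-up values)
avgs = running_averages(models)

p, q = 2, 5
recovered = interval_average(avgs[p - 1], p, avgs[q - 1], q)
direct = sum(models[p:q]) / (q - p)  # mean of model_3, model_4, model_5

assert abs(recovered - direct) < 1e-12
```

The subtraction removes the contribution of the first p models from the running sum, so the "negative weight" on the earlier average is just bookkeeping, not a penalty on the earlier model.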

@teowenshen
Contributor Author

I see. So in effect, Icefall is already averaging over every mini-batch (approximated by sampling every params.average_period batches) during training, which is why the averaging for decoding is just a subtraction of the earlier running average.

Thank you so much for pointing me in the right direction and for connecting the dots for me.
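
For readers following along, the incremental averaging described above could be sketched roughly as below. This is an assumption-laden illustration, not icefall's actual code: the function name `update_running_average`, the weighting `average_period / batch_idx`, and the scalar stand-in for model parameters are all hypothetical choices that happen to reproduce a plain mean of the sampled models.

```python
# Hypothetical sketch: fold a periodically sampled model into a running average.

def update_running_average(avg, cur, batch_idx, average_period):
    """Return the updated running average.

    Assumes the function is called once every `average_period` batches, so
    batch_idx // average_period models have been sampled, including `cur`.
    """
    weight_cur = average_period / batch_idx  # equals 1 / (number of samples so far)
    return cur * weight_cur + avg * (1.0 - weight_cur)


average_period = 3
avg = 0.0  # placeholder; overwritten at the first update, where weight_cur == 1
sampled = []

for batch_idx in range(1, 13):
    cur = float(batch_idx) ** 2  # made-up stand-in for the parameters after this batch
    if batch_idx % average_period == 0:
        sampled.append(cur)
        avg = update_running_average(avg, cur, batch_idx, average_period)

expected = sum(sampled) / len(sampled)
assert abs(avg - expected) < 1e-9
```

Because only the running average needs to be stored, any two checkpoints that carry it are enough to recover the average over the interval between them.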
