Reasoning / Intuition behind Model Averaging #715

teowenshen · 2022-11-29T12:30:24Z

Is there any reason or intuition for why the models are averaged in checkpoint.py using the formula below, instead of averaging over a weighted sum?

$$m_{\text{avg}} = \frac{m_{\text{end}} \, b_{\text{end}} - m_{\text{start}} \, b_{\text{start}}}{b_{\text{end}} - b_{\text{start}}}$$

(I just copied the equation from icefall/icefall/checkpoint.py, line 400 in 1d5c03f:

(1) avg = (model_end * end - model_start * start) / interval.

Here $b$ refers to batch_idx and $m$ refers to model params.)

It seemed quite counterintuitive to me that the earlier model is subtracted from a later model, even if indeed the weights for $m_{\text{end}}$ and $m_{\text{start}}$ do sum to 1.
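To make the "weights sum to 1" observation concrete, the quoted formula can be split into two coefficients. This is only a rearrangement, assuming that interval means $b_{\text{end}} - b_{\text{start}}$ (which the sum-to-1 property implies); the symbols $w_{\text{end}}$ and $w_{\text{start}}$ are introduced here for illustration.

```latex
\begin{aligned}
m_{\text{avg}} &= w_{\text{end}} \, m_{\text{end}} + w_{\text{start}} \, m_{\text{start}}, \\
w_{\text{end}} &= \frac{b_{\text{end}}}{b_{\text{end}} - b_{\text{start}}}, \qquad
w_{\text{start}} = -\frac{b_{\text{start}}}{b_{\text{end}} - b_{\text{start}}}, \\
w_{\text{end}} + w_{\text{start}} &= \frac{b_{\text{end}} - b_{\text{start}}}{b_{\text{end}} - b_{\text{start}}} = 1.
\end{aligned}
```

The weights sum to 1, but $w_{\text{start}}$ is negative whenever $b_{\text{start}} > 0$, so this is not an ordinary convex combination of two models.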

Is this actually an established model averaging method? Or was there any GitHub discussion regarding this model averaging method?

Thanks!


csukuangfj · 2022-11-29T12:31:49Z

Please have a look at

yaozengwei · 2022-11-29T12:54:01Z

In training, we maintain model_avg, which is the average of all periodically sampled models (model_1, model_2, ..., model_n) from the start:

$$\text{model\_avg}_n = \frac{1}{n} \sum_{i=1}^{n} \text{model}_i$$

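During decoding, we want to use the averaged model over the interval [p+1, p+2, ..., q]:

$$\frac{1}{q - p} \sum_{i=p+1}^{q} \text{model}_i = \frac{q \cdot \text{model\_avg}_q - p \cdot \text{model\_avg}_p}{q - p}$$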
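A quick numerical check, as a minimal sketch, that the average over model_{p+1}, ..., model_q equals (q * model_avg_q - p * model_avg_p) / (q - p). Each "model" here is a plain list of floats standing in for a parameter tensor, and the helper names `mean_of` and `interval_average` are made up for illustration; they are not taken from icefall.

```python
from typing import List

Params = List[float]  # stand-in for a model's parameter tensor


def mean_of(models: List[Params]) -> Params:
    """Element-wise mean of a non-empty list of parameter vectors."""
    n = len(models)
    return [sum(vals) / n for vals in zip(*models)]


def interval_average(avg_p: Params, p: int, avg_q: Params, q: int) -> Params:
    """Average of model_{p+1}..model_q, recovered from two running averages.

    avg_p is the mean of model_1..model_p, avg_q the mean of model_1..model_q.
    """
    if not 0 <= p < q:
        raise ValueError("need 0 <= p < q")
    return [(q * aq - p * ap) / (q - p) for ap, aq in zip(avg_p, avg_q)]


# Five hypothetical sampled models with two parameters each.
models = [[1.0, 10.0], [2.0, 20.0], [4.0, 40.0], [8.0, 80.0], [16.0, 160.0]]
p, q = 2, 5

avg_p = mean_of(models[:p])  # running average after model_2
avg_q = mean_of(models[:q])  # running average after model_5

recovered = interval_average(avg_p, p, avg_q, q)
direct = mean_of(models[p:q])  # plain mean of model_3..model_5

# Both routes agree up to floating-point rounding.
assert all(abs(r - d) < 1e-9 for r, d in zip(recovered, direct))
```

The point of the subtraction is that only two stored running averages are needed at decoding time, not every sampled model in the interval.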

teowenshen · 2022-11-29T13:23:10Z

I see. So in effect, Icefall is already averaging over every mini-batch (approximated by sampling every params.average_period) along the way during training, which is why the actual averaging itself for decoding is just a subtraction of the earlier models.
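The training-side half of this can be sketched as below. It is an illustration under assumptions, not icefall's actual code: each model is a single float, `average_period` mirrors params.average_period, and `update_running_average` is a hypothetical helper. Because a model is sampled every `average_period` batches, weighting by batch index or by sample count gives the same result.

```python
def update_running_average(
    avg: float, cur: float, batch_idx: int, average_period: int
) -> float:
    """Fold the current model into the running average of sampled models.

    Assumes batch_idx is a positive multiple of average_period, so that
    batch_idx // average_period models have been sampled, including `cur`.
    """
    weight_cur = average_period / batch_idx
    return cur * weight_cur + avg * (1.0 - weight_cur)


average_period = 2
avg = 0.0
snapshots = {}  # batch_idx -> running average stored at that point
sampled = {}    # batch_idx -> model sampled at that point

for batch_idx in range(1, 13):
    cur = float(batch_idx ** 2)  # fake "model" that drifts during training
    if batch_idx % average_period == 0:
        avg = update_running_average(avg, cur, batch_idx, average_period)
        snapshots[batch_idx] = avg
        sampled[batch_idx] = cur

# Decoding-time average over batches (start, end], by subtraction.
start, end = 4, 12
by_subtraction = (snapshots[end] * end - snapshots[start] * start) / (end - start)

# The same quantity computed directly from the models sampled in (start, end].
in_range = [m for b, m in sampled.items() if start < b <= end]
direct = sum(in_range) / len(in_range)

assert abs(by_subtraction - direct) < 1e-9
```

Subtracting the earlier running average removes the contribution of everything sampled up to `start`, leaving only the models from the chosen interval.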

Thank you so much for pointing me in the right direction and for connecting the dots for me.

teowenshen closed this as completed Nov 29, 2022

yaozengwei mentioned this issue Feb 24, 2023

weird loss curve when finetuning with gigaspeech pretrained k2 model #925

Closed

csukuangfj mentioned this issue Apr 13, 2023

the average of the last few models #1000

Closed
