Zipformer with Adam optimizer #1708

Open
wants to merge 1 commit into master

Conversation

@zhu-han commented on Aug 1, 2024

This PR adds a recipe for training Zipformer with the Adam optimizer. The goal is to help people integrate the Zipformer encoder into their own models trained with the Adam optimizer.

To make Zipformer compatible with Adam, the recipe makes several changes compared with the original Zipformer recipe:

  1. Replace ScaledAdam with Adam,
  2. Remove balancer and whitener modules,
  3. Replace all ScaledLinear with nn.Linear,
  4. Replace Eden with the Noam learning rate scheduler,
  5. Replace SwooshR and SwooshL with the Swish activation function,
  6. Add an additional BiasNorm in each module (feedforward, attention and convolution),
  7. Multiply the attention scores by the scaling factor d**-0.5 (a minimal sketch of several of these changes is given after this list).
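To make the list above concrete, here is a minimal PyTorch sketch (not the actual recipe code) of how changes 1, 3, 4, 5, 6 and 7 fit together. The hyperparameters (model_dim, warmup_steps, Adam betas) and helper names (noam_lambda, scaled_attention_scores) are illustrative assumptions, not the values or identifiers used in this recipe; BiasNorm follows the definition in the Zipformer paper, y = x / RMS(x - b) * exp(gamma).

```python
# Minimal illustrative sketch (not the actual recipe code) of the main changes:
# plain Adam, Noam LR schedule, Swish activation, BiasNorm, and d**-0.5
# attention-score scaling. Hyperparameters here are placeholders.
import torch
import torch.nn as nn


class Swish(nn.Module):
    """Swish activation x * sigmoid(x), used instead of SwooshR/SwooshL."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(x)


class BiasNorm(nn.Module):
    """BiasNorm as defined in the Zipformer paper: y = x / RMS(x - b) * exp(gamma)."""
    def __init__(self, num_channels: int, eps: float = 1e-8):
        super().__init__()
        self.bias = nn.Parameter(torch.zeros(num_channels))   # learned channel-wise bias b
        self.log_scale = nn.Parameter(torch.zeros(()))         # learned scalar gamma
        self.eps = eps  # small constant for numerical safety (not part of the definition)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = ((x - self.bias) ** 2).mean(dim=-1, keepdim=True).sqrt()
        return x / (rms + self.eps) * self.log_scale.exp()


def scaled_attention_scores(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Attention scores multiplied by d**-0.5, where d is the per-head dimension."""
    d = q.size(-1)
    return torch.matmul(q, k.transpose(-2, -1)) * d ** -0.5


def noam_lambda(step: int, model_dim: int = 512, warmup_steps: int = 25000) -> float:
    """Noam schedule: linear warmup followed by inverse-square-root decay."""
    step = max(step, 1)
    return model_dim ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# A toy feed-forward block in the spirit of changes 3, 5 and 6:
# plain nn.Linear, Swish activation, and an extra BiasNorm at the output.
model = nn.Sequential(nn.Linear(80, 512), Swish(), nn.Linear(512, 512), BiasNorm(512))

# Change 1: plain Adam (betas/eps here follow the common Noam setup, not this recipe).
optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)
# Change 4: Noam learning-rate schedule applied via LambdaLR.
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_lambda)

for _ in range(10):  # dummy training loop
    optimizer.zero_grad()
    loss = model(torch.randn(4, 80)).pow(2).mean()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

In the actual recipe these pieces live inside the Zipformer encoder layers; the sketch only shows the shape of each change.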

The results are as follows:

  • normal-scaled model, number of model parameters: 65595219, i.e., 65.60 M

| decoding method | test-clean (WER %) | test-other (WER %) | comment |
| --- | --- | --- | --- |
| greedy_search | 2.35 | 5.53 | --epoch 70 --avg 30 |
| modified_beam_search | 2.29 | 5.48 | --epoch 70 --avg 30 |
| fast_beam_search | 2.31 | 5.52 | --epoch 70 --avg 30 |
  • large-scaled model, number of model parameters: 148514478, i.e., 148.5 M

| decoding method | test-clean (WER %) | test-other (WER %) | comment |
| --- | --- | --- | --- |
| greedy_search | 2.27 | 5.25 | --epoch 70 --avg 20 |
| modified_beam_search | 2.23 | 5.17 | --epoch 70 --avg 20 |
| fast_beam_search | 2.24 | 5.20 | --epoch 70 --avg 20 |

Note that Zipformer trained with ScaledAdam still performs better than Zipformer trained with Adam.
