Zipformer with Adam optimizer #1708

Open
wants to merge 1 commit into master

Conversation

@zhu-han commented on Aug 1, 2024

This PR adds a recipe for training Zipformer with the Adam optimizer. The goal is to help people integrate the Zipformer encoder into their own models trained with the Adam optimizer.

To make Zipformer compatible with Adam, the recipe makes several changes compared with the original Zipformer recipe:

  1. Replace ScaledAdam with Adam,
  2. Remove balancer and whitener modules,
  3. Replace all ScaledLinear with nn.Linear,
  4. Replace Eden with the Noam learning rate scheduler,
  5. Replace SwooshR and SwooshL with the Swish activation function,
  6. Add an additional BiasNorm in each module (feedforward, attention and convolution),
  7. Multiply the attention scores by the scaling factor d**-0.5 (a minimal sketch of several of these changes is given after this list).
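To make the list above concrete, here is a minimal PyTorch sketch (not the actual recipe code) of how changes 1, 3, 4, 5, 6 and 7 fit together. The hyperparameters (model_dim, warmup_steps, Adam betas) and helper names (noam_lambda, scaled_attention_scores) are illustrative assumptions, not the values or identifiers used in this recipe; BiasNorm follows the definition in the Zipformer paper, y = x / RMS(x - b) * exp(gamma).

```python
# Minimal illustrative sketch (not the actual recipe code) of the main changes:
# plain Adam, Noam LR schedule, Swish activation, BiasNorm, and d**-0.5
# attention-score scaling. Hyperparameters here are placeholders.
import torch
import torch.nn as nn


class Swish(nn.Module):
    """Swish activation x * sigmoid(x), used instead of SwooshR/SwooshL."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(x)


class BiasNorm(nn.Module):
    """BiasNorm as defined in the Zipformer paper: y = x / RMS(x - b) * exp(gamma)."""
    def __init__(self, num_channels: int, eps: float = 1e-8):
        super().__init__()
        self.bias = nn.Parameter(torch.zeros(num_channels))   # learned channel-wise bias b
        self.log_scale = nn.Parameter(torch.zeros(()))         # learned scalar gamma
        self.eps = eps  # small constant for numerical safety (not part of the definition)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = ((x - self.bias) ** 2).mean(dim=-1, keepdim=True).sqrt()
        return x / (rms + self.eps) * self.log_scale.exp()


def scaled_attention_scores(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Attention scores multiplied by d**-0.5, where d is the per-head dimension."""
    d = q.size(-1)
    return torch.matmul(q, k.transpose(-2, -1)) * d ** -0.5


def noam_lambda(step: int, model_dim: int = 512, warmup_steps: int = 25000) -> float:
    """Noam schedule: linear warmup followed by inverse-square-root decay."""
    step = max(step, 1)
    return model_dim ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# A toy feed-forward block in the spirit of changes 3, 5 and 6:
# plain nn.Linear, Swish activation, and an extra BiasNorm at the output.
model = nn.Sequential(nn.Linear(80, 512), Swish(), nn.Linear(512, 512), BiasNorm(512))

# Change 1: plain Adam (betas/eps here follow the common Noam setup, not this recipe).
optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)
# Change 4: Noam learning-rate schedule applied via LambdaLR.
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_lambda)

for _ in range(10):  # dummy training loop
    optimizer.zero_grad()
    loss = model(torch.randn(4, 80)).pow(2).mean()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

In the actual recipe these pieces live inside the Zipformer encoder layers; the sketch only shows the shape of each change.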

The results are as follows:

  • normal-scaled model, number of model parameters: 65595219, i.e., 65.60 M

| decoding method | test-clean (WER %) | test-other (WER %) | comment |
| --- | --- | --- | --- |
| greedy_search | 2.35 | 5.53 | --epoch 70 --avg 30 |
| modified_beam_search | 2.29 | 5.48 | --epoch 70 --avg 30 |
| fast_beam_search | 2.31 | 5.52 | --epoch 70 --avg 30 |
  • large-scaled model, number of model parameters: 148514478, i.e., 148.5 M

| decoding method | test-clean (WER %) | test-other (WER %) | comment |
| --- | --- | --- | --- |
| greedy_search | 2.27 | 5.25 | --epoch 70 --avg 20 |
| modified_beam_search | 2.23 | 5.17 | --epoch 70 --avg 20 |
| fast_beam_search | 2.24 | 5.20 | --epoch 70 --avg 20 |

Note that Zipformer trained with ScaledAdam still performs better than Zipformer trained with Adam.
