
fine-tuning issue #212

Open
zhoufqing opened this issue Sep 23, 2023 · 1 comment

Comments

@zhoufqing

When I use LJSpeech's 900000.pth.tar as a pre-trained model and fine-tune on my own data, I load the checkpoint with model.load_state_dict(torch.load('./output/ckpt/LJSpeech/900000.pth.tar')), but the subsequent code raises the following error:
RuntimeError: Error(s) in loading state_dict for FastSpeech2:
Missing key(s) in state_dict: "encoder.position_enc", "encoder.src_word_emb.weight", "encoder.layer_stack.0.slf_attn.w_qs.weight", "encoder.layer_stack.0.slf_attn.w_qs.bias", "encoder.layer_stack.0.slf_attn.w_ks.weight", "encoder.layer_stack.0.slf_attn.w_ks.bias", "encoder.layer_stack.0.slf_attn.w_vs.weight", "encoder.layer_stack.0.slf_attn.w_vs.bias", "encoder.layer_stack.0.slf_attn.layer_norm.weight", "encoder.layer_stack.0.slf_attn.layer_norm.bias", "encoder.layer_stack.0.slf_attn.fc.weight", "encoder.layer_stack.0.slf_attn.fc.bias", "encoder.layer_stack.0.pos_ffn.w_1.weight", "encoder.layer_stack.0.pos_ffn.w_1.bias", "encoder.layer_stack.0.pos_ffn.w_2.weight", "encoder.layer_stack.0.pos_ffn.w_2.bias", "encoder.layer_stack.0.pos_ffn.layer_norm.weight", "encoder.layer_stack.0.pos_ffn.layer_norm.bias", "encoder.layer_stack.1.slf_attn.w_qs.weight", "encoder.layer_stack.1.slf_attn.w_qs.bias", "encoder.layer_stack.1.slf_attn.w_ks.weight", "encoder.layer_stack.1.slf_attn.w_ks.bias", "encoder.layer_stack.1.slf_attn.w_vs.weight", "encoder.layer_stack.1.slf_attn.w_vs.bias", "encoder.layer_stack.1.slf_attn.layer_norm.weight", "encoder.layer_stack.1.slf_attn.layer_norm.bias", "encoder.layer_stack.1.slf_attn.fc.weight", "encoder.layer_stack.1.slf_attn.fc.bias", "encoder.layer_stack.1.pos_ffn.w_1.weight", "encoder.layer_stack.1.pos_ffn.w_1.bias", "encoder.layer_stack.1.pos_ffn.w_2.weight", "encoder.layer_stack.1.pos_ffn.w_2.bias", "encoder.layer_stack.1.pos_ffn.layer_norm.weight", "encoder.layer_stack.1.pos_ffn.layer_norm.bias", "encoder.layer_stack.2.slf_attn.w_qs.weight", "encoder.layer_stack.2.slf_attn.w_qs.bias", "encoder.layer_stack.2.slf_attn.w_ks.weight", "encoder.layer_stack.2.slf_attn.w_ks.bias", "encoder.layer_stack.2.slf_attn.w_vs.weight", "encoder.layer_stack.2.slf_attn.w_vs.bias", "encoder.layer_stack.2.slf_attn.layer_norm.weight", "encoder.layer_stack.2.slf_attn.layer_norm.bias", "encoder.layer_stack.2.slf_attn.fc.weight", 
"encoder.layer_stack.2.slf_attn.fc.bias", "encoder.layer_stack.2.pos_ffn.w_1.weight", "encoder.layer_stack.2.pos_ffn.w_1.bias", "encoder.layer_stack.2.pos_ffn.w_2.weight", "encoder.layer_stack.2.pos_ffn.w_2.bias", "encoder.layer_stack.2.pos_ffn.layer_norm.weight", "encoder.layer_stack.2.pos_ffn.layer_norm.bias", "encoder.layer_stack.3.slf_attn.w_qs.weight", "encoder.layer_stack.3.slf_attn.w_qs.bias", "encoder.layer_stack.3.slf_attn.w_ks.weight", "encoder.layer_stack.3.slf_attn.w_ks.bias", "encoder.layer_stack.3.slf_attn.w_vs.weight", "encoder.layer_stack.3.slf_attn.w_vs.bias", "encoder.layer_stack.3.slf_attn.layer_norm.weight", "encoder.layer_stack.3.slf_attn.layer_norm.bias", "encoder.layer_stack.3.slf_attn.fc.weight", "encoder.layer_stack.3.slf_attn.fc.bias", "encoder.layer_stack.3.pos_ffn.w_1.weight", "encoder.layer_stack.3.pos_ffn.w_1.bias", "encoder.layer_stack.3.pos_ffn.w_2.weight", "encoder.layer_stack.3.pos_ffn.w_2.bias", "encoder.layer_stack.3.pos_ffn.layer_norm.weight", "encoder.layer_stack.3.pos_ffn.layer_norm.bias", "variance_adaptor.pitch_bins", "variance_adaptor.energy_bins", "variance_adaptor.duration_predictor.conv_layer.conv1d_1.conv.weight", "variance_adaptor.duration_predictor.conv_layer.conv1d_1.conv.bias", "variance_adaptor.duration_predictor.conv_layer.layer_norm_1.weight", "variance_adaptor.duration_predictor.conv_layer.layer_norm_1.bias", "variance_adaptor.duration_predictor.conv_layer.conv1d_2.conv.weight", "variance_adaptor.duration_predictor.conv_layer.conv1d_2.conv.bias", "variance_adaptor.duration_predictor.conv_layer.layer_norm_2.weight", "variance_adaptor.duration_predictor.conv_layer.layer_norm_2.bias", "variance_adaptor.duration_predictor.linear_layer.weight", "variance_adaptor.duration_predictor.linear_layer.bias", "variance_adaptor.pitch_predictor.conv_layer.conv1d_1.conv.weight", "variance_adaptor.pitch_predictor.conv_layer.conv1d_1.conv.bias", "variance_adaptor.pitch_predictor.conv_layer.layer_norm_1.weight", 
"variance_adaptor.pitch_predictor.conv_layer.layer_norm_1.bias", "variance_adaptor.pitch_predictor.conv_layer.conv1d_2.conv.weight", "variance_adaptor.pitch_predictor.conv_layer.conv1d_2.conv.bias", "variance_adaptor.pitch_predictor.conv_layer.layer_norm_2.weight", "variance_adaptor.pitch_predictor.conv_layer.layer_norm_2.bias", "variance_adaptor.pitch_predictor.linear_layer.weight", "variance_adaptor.pitch_predictor.linear_layer.bias", "variance_adaptor.energy_predictor.conv_layer.conv1d_1.conv.weight", "variance_adaptor.energy_predictor.conv_layer.conv1d_1.conv.bias", "variance_adaptor.energy_predictor.conv_layer.layer_norm_1.weight", "variance_adaptor.energy_predictor.conv_layer.layer_norm_1.bias", "variance_adaptor.energy_predictor.conv_layer.conv1d_2.conv.weight", "variance_adaptor.energy_predictor.conv_layer.conv1d_2.conv.bias", "variance_adaptor.energy_predictor.conv_layer.layer_norm_2.weight", "variance_adaptor.energy_predictor.conv_layer.layer_norm_2.bias", "variance_adaptor.energy_predictor.linear_layer.weight", "variance_adaptor.energy_predictor.linear_layer.bias", "variance_adaptor.pitch_embedding.weight", "variance_adaptor.energy_embedding.weight", "decoder.position_enc", "decoder.layer_stack.0.slf_attn.w_qs.weight", "decoder.layer_stack.0.slf_attn.w_qs.bias", "decoder.layer_stack.0.slf_attn.w_ks.weight", "decoder.layer_stack.0.slf_attn.w_ks.bias", "decoder.layer_stack.0.slf_attn.w_vs.weight", "decoder.layer_stack.0.slf_attn.w_vs.bias", "decoder.layer_stack.0.slf_attn.layer_norm.weight", "decoder.layer_stack.0.slf_attn.layer_norm.bias", "decoder.layer_stack.0.slf_attn.fc.weight", "decoder.layer_stack.0.slf_attn.fc.bias", "decoder.layer_stack.0.pos_ffn.w_1.weight", "decoder.layer_stack.0.pos_ffn.w_1.bias", "decoder.layer_stack.0.pos_ffn.w_2.weight", "decoder.layer_stack.0.pos_ffn.w_2.bias", "decoder.layer_stack.0.pos_ffn.layer_norm.weight", "decoder.layer_stack.0.pos_ffn.layer_norm.bias", "decoder.layer_stack.1.slf_attn.w_qs.weight", 
"decoder.layer_stack.1.slf_attn.w_qs.bias", "decoder.layer_stack.1.slf_attn.w_ks.weight", "decoder.layer_stack.1.slf_attn.w_ks.bias", "decoder.layer_stack.1.slf_attn.w_vs.weight", "decoder.layer_stack.1.slf_attn.w_vs.bias", "decoder.layer_stack.1.slf_attn.layer_norm.weight", "decoder.layer_stack.1.slf_attn.layer_norm.bias", "decoder.layer_stack.1.slf_attn.fc.weight", "decoder.layer_stack.1.slf_attn.fc.bias", "decoder.layer_stack.1.pos_ffn.w_1.weight", "decoder.layer_stack.1.pos_ffn.w_1.bias", "decoder.layer_stack.1.pos_ffn.w_2.weight", "decoder.layer_stack.1.pos_ffn.w_2.bias", "decoder.layer_stack.1.pos_ffn.layer_norm.weight", "decoder.layer_stack.1.pos_ffn.layer_norm.bias", "decoder.layer_stack.2.slf_attn.w_qs.weight", "decoder.layer_stack.2.slf_attn.w_qs.bias", "decoder.layer_stack.2.slf_attn.w_ks.weight", "decoder.layer_stack.2.slf_attn.w_ks.bias", "decoder.layer_stack.2.slf_attn.w_vs.weight", "decoder.layer_stack.2.slf_attn.w_vs.bias", "decoder.layer_stack.2.slf_attn.layer_norm.weight", "decoder.layer_stack.2.slf_attn.layer_norm.bias", "decoder.layer_stack.2.slf_attn.fc.weight", "decoder.layer_stack.2.slf_attn.fc.bias", "decoder.layer_stack.2.pos_ffn.w_1.weight", "decoder.layer_stack.2.pos_ffn.w_1.bias", "decoder.layer_stack.2.pos_ffn.w_2.weight", "decoder.layer_stack.2.pos_ffn.w_2.bias", "decoder.layer_stack.2.pos_ffn.layer_norm.weight", "decoder.layer_stack.2.pos_ffn.layer_norm.bias", "decoder.layer_stack.3.slf_attn.w_qs.weight", "decoder.layer_stack.3.slf_attn.w_qs.bias", "decoder.layer_stack.3.slf_attn.w_ks.weight", "decoder.layer_stack.3.slf_attn.w_ks.bias", "decoder.layer_stack.3.slf_attn.w_vs.weight", "decoder.layer_stack.3.slf_attn.w_vs.bias", "decoder.layer_stack.3.slf_attn.layer_norm.weight", "decoder.layer_stack.3.slf_attn.layer_norm.bias", "decoder.layer_stack.3.slf_attn.fc.weight", "decoder.layer_stack.3.slf_attn.fc.bias", "decoder.layer_stack.3.pos_ffn.w_1.weight", "decoder.layer_stack.3.pos_ffn.w_1.bias", 
"decoder.layer_stack.3.pos_ffn.w_2.weight", "decoder.layer_stack.3.pos_ffn.w_2.bias", "decoder.layer_stack.3.pos_ffn.layer_norm.weight", "decoder.layer_stack.3.pos_ffn.layer_norm.bias", "decoder.layer_stack.4.slf_attn.w_qs.weight", "decoder.layer_stack.4.slf_attn.w_qs.bias", "decoder.layer_stack.4.slf_attn.w_ks.weight", "decoder.layer_stack.4.slf_attn.w_ks.bias", "decoder.layer_stack.4.slf_attn.w_vs.weight", "decoder.layer_stack.4.slf_attn.w_vs.bias", "decoder.layer_stack.4.slf_attn.layer_norm.weight", "decoder.layer_stack.4.slf_attn.layer_norm.bias", "decoder.layer_stack.4.slf_attn.fc.weight", "decoder.layer_stack.4.slf_attn.fc.bias", "decoder.layer_stack.4.pos_ffn.w_1.weight", "decoder.layer_stack.4.pos_ffn.w_1.bias", "decoder.layer_stack.4.pos_ffn.w_2.weight", "decoder.layer_stack.4.pos_ffn.w_2.bias", "decoder.layer_stack.4.pos_ffn.layer_norm.weight", "decoder.layer_stack.4.pos_ffn.layer_norm.bias", "decoder.layer_stack.5.slf_attn.w_qs.weight", "decoder.layer_stack.5.slf_attn.w_qs.bias", "decoder.layer_stack.5.slf_attn.w_ks.weight", "decoder.layer_stack.5.slf_attn.w_ks.bias", "decoder.layer_stack.5.slf_attn.w_vs.weight", "decoder.layer_stack.5.slf_attn.w_vs.bias", "decoder.layer_stack.5.slf_attn.layer_norm.weight", "decoder.layer_stack.5.slf_attn.layer_norm.bias", "decoder.layer_stack.5.slf_attn.fc.weight", "decoder.layer_stack.5.slf_attn.fc.bias", "decoder.layer_stack.5.pos_ffn.w_1.weight", "decoder.layer_stack.5.pos_ffn.w_1.bias", "decoder.layer_stack.5.pos_ffn.w_2.weight", "decoder.layer_stack.5.pos_ffn.w_2.bias", "decoder.layer_stack.5.pos_ffn.layer_norm.weight", "decoder.layer_stack.5.pos_ffn.layer_norm.bias", "mel_linear.weight", "mel_linear.bias", "postnet.convolutions.0.0.conv.weight", "postnet.convolutions.0.0.conv.bias", "postnet.convolutions.0.1.weight", "postnet.convolutions.0.1.bias", "postnet.convolutions.0.1.running_mean", "postnet.convolutions.0.1.running_var", "postnet.convolutions.1.0.conv.weight", "postnet.convolutions.1.0.conv.bias", 
"postnet.convolutions.1.1.weight", "postnet.convolutions.1.1.bias", "postnet.convolutions.1.1.running_mean", "postnet.convolutions.1.1.running_var", "postnet.convolutions.2.0.conv.weight", "postnet.convolutions.2.0.conv.bias", "postnet.convolutions.2.1.weight", "postnet.convolutions.2.1.bias", "postnet.convolutions.2.1.running_mean", "postnet.convolutions.2.1.running_var", "postnet.convolutions.3.0.conv.weight", "postnet.convolutions.3.0.conv.bias", "postnet.convolutions.3.1.weight", "postnet.convolutions.3.1.bias", "postnet.convolutions.3.1.running_mean", "postnet.convolutions.3.1.running_var", "postnet.convolutions.4.0.conv.weight", "postnet.convolutions.4.0.conv.bias", "postnet.convolutions.4.1.weight", "postnet.convolutions.4.1.bias", "postnet.convolutions.4.1.running_mean", "postnet.convolutions.4.1.running_var".
Unexpected key(s) in state_dict: "model", "optimizer".

@melodyze-ai

You are seeing this error because the checkpoint contains two saved states: the model and the optimizer. To fine-tune, you need to load only the model component, and then it should work fine. As an experiment, you can load 900000.pth.tar in a Jupyter notebook and inspect its contents; that will give you more clarity.
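A minimal sketch of what that looks like, using a toy nn.Linear module as a hypothetical stand-in for FastSpeech2 (the checkpoint layout, a dict with "model" and "optimizer" entries, matches the error message above):

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for FastSpeech2, just to illustrate the loading pattern.
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters())

# The training script saves BOTH states under one dict -- this is why passing
# the raw file to load_state_dict fails with 'Unexpected key(s): "model", "optimizer"'.
path = os.path.join(tempfile.mkdtemp(), "900000.pth.tar")
torch.save({"model": model.state_dict(), "optimizer": optimizer.state_dict()}, path)

# Inspect the checkpoint: it is a plain dict with two entries.
ckpt = torch.load(path, map_location="cpu")
print(list(ckpt.keys()))

# Correct way: unwrap the "model" entry before calling load_state_dict.
model.load_state_dict(ckpt["model"])
```

Calling model.load_state_dict(ckpt) on the wrapper dict itself is what produces the RuntimeError in the original post.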
