
[BUG🐛] Size mismatch in converted xttsv2 models #43

Closed
scruffynerf opened this issue Dec 18, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@scruffynerf

Bug Description

```
[rank0]:   File "mypath/lib/python3.10/site-packages/auralis/core/tts.py", line 85, in _load_model
[rank0]:     return MODEL_REGISTRY[config['model_type']].from_pretrained(model_name_or_path, **kwargs)
[rank0]:   File "mypath/lib/python3.10/site-packages/auralis/models/xttsv2/XTTSv2.py", line 299, in from_pretrained
[rank0]:     model.load_state_dict(hifigan_state)
[rank0]:   File "mypath/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2584, in load_state_dict
[rank0]:     raise RuntimeError(
[rank0]: RuntimeError: Error(s) in loading state_dict for XTTSv2Engine:
[rank0]: 	size mismatch for text_embedding.weight: copying a param with shape torch.Size([6153, 1024]) from checkpoint, the shape in current model is torch.Size([6681, 1024]).
[rank0]: 	size mismatch for text_head.weight: copying a param with shape torch.Size([6153, 1024]) from checkpoint, the shape in current model is torch.Size([6681, 1024]).
[rank0]: 	size mismatch for text_head.bias: copying a param with shape torch.Size([6153]) from checkpoint, the shape in current model is torch.Size([6681]).
```

## Minimal Reproducible Example

Use the current converter script with either
HF's drewThomasson/Morgan_freeman_xtts_model
or
HF's scruffynerf/xtts-vincent

(both of these work, and were trained using https://github.com/daswer123/xtts-finetune-webui)

then try to load the resulting converted files.
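
For reference, you can see the version skew directly by reading the first dimension of text_embedding.weight out of the fine-tuned checkpoint. A minimal sketch, assuming a plain torch checkpoint (the file name and key nesting here are guesses, not anything Auralis exposes):

```python
import torch

# Hypothetical path: point this at the fine-tuned checkpoint you converted.
state = torch.load("model.pth", map_location="cpu")
# Some Coqui checkpoints nest the weights under a "model" key.
if isinstance(state, dict) and "model" in state:
    state = state["model"]

# The first dimension of text_embedding.weight is the text-token count the
# GPT section was trained with: 6153 for v2.0.2, 6681 for v2.0.3.
key = next(k for k in state if k.endswith("text_embedding.weight"))
print(key, tuple(state[key].shape))
```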

@scruffynerf scruffynerf added the bug Something isn't working label Dec 18, 2024
@scruffynerf
Author

scruffynerf commented Dec 18, 2024

Ah ha, figured it out.

Coqui XTTS-v2 v2.0.2 differs from v2.0.3 in the number of text tokens:

https://huggingface.co/coqui/XTTS-v2/commit/6b8036b35d787cf43d18d640587956b9db8fd1b8

The above models were trained on v2.0.2.

The converter script needs to be aware of this: any version difference breaks the converted model, since the stock config no longer matches the actual trained GPT section.

Correct me if I'm wrong, but basically this means either the GPT config must be adjusted in this case (since it no longer matches the stock one), or the converter should just fail and complain that only v2.0.3 models can be converted.
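
Something along these lines would cover both options. Just a sketch, not the converter's actual code, and the state-dict key and config field names are assumptions based on the traceback:

```python
EXPECTED_V203_TOKENS = 6681  # stock v2.0.3 text-token count

def reconcile_text_tokens(state_dict, gpt_config, strict=False):
    # Read the vocabulary size straight out of the trained weights.
    key = next(k for k in state_dict if k.endswith("text_embedding.weight"))
    actual = state_dict[key].shape[0]
    if actual != EXPECTED_V203_TOKENS:
        if strict:
            # Option B: refuse to convert anything that isn't v2.0.3.
            raise ValueError(
                f"Checkpoint has {actual} text tokens; only v2.0.3 models "
                f"({EXPECTED_V203_TOKENS} tokens) can be converted."
            )
        # Option A: adjust the config to match the trained weights.
        gpt_config["number_text_tokens"] = actual  # field name assumed
    return gpt_config
```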

@C00reNUT
Contributor

C00reNUT commented Dec 19, 2024

Same issue here with a model trained on version 2.0.0. This might also explain the difference in quality/output in #27 when I convert a Coqui 2.0.0 model using the provided script...

@elvinzade

Error

```
[rank0]: raise RuntimeError(
[rank0]: RuntimeError: Error(s) in loading state_dict for XTTSv2Engine:
[rank0]: size mismatch for text_embedding.weight: copying a param with shape torch.Size([6681, 1024]) from checkpoint, the shape in current model is torch.Size([8155, 1024]).
[rank0]: size mismatch for text_head.weight: copying a param with shape torch.Size([6681, 1024]) from checkpoint, the shape in current model is torch.Size([8155, 1024]).
[rank0]: size mismatch for text_head.bias: copying a param with shape torch.Size([6681]) from checkpoint, the shape in current model is torch.Size([8155]).
```

Explanation

Same issue here. I trained for a new language; when I run the checkpoint_converter.py script, it downloads the JSON files for the XTTS config and the GPT tokenizer config, and I then update the JSON file with the new language's vocabulary. But I still get the error above.

@mlinmg
Copy link
Contributor

mlinmg commented Dec 23, 2024

Cool, I didn't know about this; I'll look into it.
@C00reNUT do you still see a quality difference with the new model conversion script? There was a typo that caused it to overwrite the converted checkpoints with the default ones; we think that was the cause of the error.

@C00reNUT
Contributor

@mlinmg I have tried the new conversion script, but after conversion I had to manually replace the tokenizer from version 2.0.3 with the one from 2.0.0, which is the model version I used for fine-tuning, and adjust the settings to match the 2.0.0 repo. The quality is much better, but still worse than the original...

I couldn't figure out why; maybe you calculate the latents from the reference differently, or I am missing some settings that differ between the implementations...

@elvinzade you also need to change the tokenizer size in config.json and in the Python files where it is referenced; just search for the number in all files and change it to the value corresponding to your Coqui model repo version on Hugging Face (see the sketch below).
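
A rough sketch of that search-and-replace, assuming the converted model lives in a converted_model/ directory (the token counts come from the tracebacks in this thread, so substitute the ones for your versions):

```python
from pathlib import Path

OLD, NEW = "6681", "6153"  # v2.0.3 count -> v2.0.2 count; adjust for your repos

for path in Path("converted_model").rglob("*"):
    if path.is_file() and path.suffix in {".json", ".py"}:
        text = path.read_text()
        if OLD in text:
            path.write_text(text.replace(OLD, NEW))
            print(f"patched {path}")
```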

@scruffynerf
Author

I don't believe it's quite a drop-in 'downgrade' to go backwards.

@elvinzade

@C00reNUT thanks, I appreciate the suggestion regarding the Coqui model. However, I trained a new language that is not among the languages in the official Coqui model repository, and in the process I extended the tokenizer vocabulary to accommodate it. The model should work seamlessly with the extended tokenizer and support the newly added language; if I shrink the tokenizer back to the stock size, I lose the newly added language.
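
To illustrate the problem: the only purely mechanical way to reconcile the sizes is to pad or truncate the three token-dependent tensors, and the sketch below shows why that can't recover the new language, since any padded rows are untrained. Key names follow the tracebacks above; this is not part of Auralis or the converter.

```python
import torch

def resize_text_rows(state, new_vocab):
    """Pad or truncate the token-dependent tensors to a new vocabulary size."""
    for key in ("text_embedding.weight", "text_head.weight", "text_head.bias"):
        old = state[key]
        new = torch.zeros((new_vocab, *old.shape[1:]), dtype=old.dtype)
        rows = min(old.shape[0], new_vocab)
        # Copy over the trained rows; anything beyond them stays zero,
        # i.e. carries no information about the newly added tokens.
        new[:rows] = old[:rows]
        state[key] = new
    return state
```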
