Newly trained model since Lightning implementation sounds robotic early on #270
Comments
Even after around 5000 steps it still sounds very robotic. I applied the fix in the commit. Original: Generated:
same
Does the above PR fix this issue?
Yup! That indeed helped at 0 steps already (as in, the initial conversion / preparation). Original: Generated (0 steps): Generated (~1700 steps): Additionally, I stopped the training after that first checkpoint save and started it again now. Letting it run for a bit, it seemed to save at around 891 steps instead of 1770.
I left it until 100k steps to see if it would gradually fix itself up... Big mistake, it still sounds robotic! Hopefully this is fixed now.
😇
How many GPUs are connected to your PC?
A single one, an NVIDIA RTX 3090. With about 170 audio clips in the dataset I get around 1.5 iterations per second (which for my use cases is more than enough). Also, is your 😇 reaction positive because I provided more details in the issue? 😁
In Lightning, global_step seems to be incremented every time optimizer.step() is called...
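For context, here is a minimal sketch of how that plays out with a GAN-style setup like the one discussed in this thread. This is not the project's actual training code; the module, layers, and losses are placeholders. With manual optimization and two optimizers, each `optimizer.step()` call advances `trainer.global_step`, so one batch advances the counter twice, which can make step counts look roughly halved or doubled compared to the batch count.

```python
import pytorch_lightning as pl
import torch

class GanLikeModule(pl.LightningModule):
    """Hypothetical two-optimizer module to illustrate global_step counting."""

    def __init__(self):
        super().__init__()
        self.automatic_optimization = False  # manual optimization, as in the discussion
        self.net_g = torch.nn.Linear(8, 8)   # stand-in "generator"
        self.net_d = torch.nn.Linear(8, 1)   # stand-in "discriminator"

    def training_step(self, batch, batch_idx):
        opt_g, opt_d = self.optimizers()

        # Discriminator update: optimizer.step() -> trainer.global_step += 1
        opt_d.zero_grad()
        loss_d = self.net_d(self.net_g(batch).detach()).mean()
        self.manual_backward(loss_d)
        opt_d.step()

        # Generator update: another optimizer.step() -> trainer.global_step += 1
        opt_g.zero_grad()
        loss_g = -self.net_d(self.net_g(batch)).mean()
        self.manual_backward(loss_g)
        opt_g.step()
        # Net effect: global_step advances by 2 per batch.

    def configure_optimizers(self):
        return (
            torch.optim.Adam(self.net_g.parameters(), lr=2e-4),
            torch.optim.Adam(self.net_d.parameters(), lr=2e-4),
        )
```

Depending on which counter the checkpointing and logging code reads, saves and audio logs then land at unexpected step numbers (e.g. roughly half of what you expected).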
Could this be helpful in fixing it? Lightning-AI/pytorch-lightning#17281
It is more complicated than I thought; it is also related to TensorBoard.
I see, if you have any more things I should test out, please tell me! 😁
Updating everything, but mine sounds like this: I'm using pre-resample, pre-config, and svc pre-hubert -fm crepe (I've also tried just svc pre-hubert). My training set consists of 263 files ranging from 1 to 14 seconds each, adding up to 20 minutes and 36 seconds of data. I'm hesitant to train it further because I'm afraid it might result in the same robotic voice I experienced the last time I trained it for 100k steps. It's possible that I'm mistaken, but I'm convinced that everything was working perfectly just a week or two ago. I even tried a complete reinstall, but unfortunately it didn't fix the issue. Any idea what I'm doing wrong?
Might still need an additional fix; the progress bar display is incomplete.
As you can see, I just fixed this issue right now
Just updated to 3.1.6, but getting this error: Training: 0it [00:00, ?it/s]Traceback (most recent call last):
Meanwhile, my generated one after 1600 steps (I merged the separate commits and didn't include the final one to fix the naming) still seems to be very metallic... However, it does take the checkpoint into account, so that problem seems to be solved. Still, this does feel a bit strange.

Additionally, I found another part of the code where a similar adjustment is needed; otherwise it will show the generated audio as 1599, 2399, etc. The first run also doesn't seem to generate audio logs anymore now; I have to wait until an epoch / log interval is hit.

Yep, even after around 4000 steps it still sounds very bad. Pre-Lightning didn't sound this bad :(

Went back to 3.0.5 as a test, training the model from scratch. Already seeing a massive drop in speed (1.4 iterations per second vs. 6 seconds per iteration). However, the first generation at 800 sounds MUCH better:
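A small, purely illustrative script of the off-by-one described above (the interval and the logging condition are made up, not taken from the repository): if audio is logged against the counter before it is bumped for the current batch, samples get tagged 1599, 2399, ... instead of 1600, 2400, ...

```python
LOG_INTERVAL = 800  # hypothetical audio-logging interval

def audio_tags(total_batches: int, log_before_increment: bool) -> list[int]:
    """Return the step numbers that logged audio samples would be tagged with."""
    tags = []
    global_step = 0
    for _ in range(total_batches):
        if log_before_increment and (global_step + 1) % LOG_INTERVAL == 0:
            tags.append(global_step)   # tags 799, 1599, 2399, ...
        global_step += 1               # counter bumped for this batch
        if not log_before_increment and global_step % LOG_INTERVAL == 0:
            tags.append(global_step)   # tags 800, 1600, 2400, ...
    return tags

print(audio_tags(2400, log_before_increment=True))   # [799, 1599, 2399]
print(audio_tags(2400, log_before_increment=False))  # [800, 1600, 2400]
```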
Anyone can send a PR.
I updated my comment just now with a bit more info. 3.0.5 seems to yield better results than the current Lightning implementation. However, since it is such a big change, I wouldn't even know what changed under the hood or what might need to be tweaked to give better results in Lightning, too... As in, which parameters and so on |
Line 184 in train.py
This fixed it for me.
Good addition, doesn't hurt to have that. As for the weird metallic noise, I think I've managed to fix it by shifting some function calls around. I removed one of the calls. Additionally, the following is the original code for optimization:
According to the Lightning.AI documentation, it should be as follows:
(Zero grad, manual backward, then step.) This brings the speed down a bit more (1.33 it/s -> 1.27 it/s), but I believe it's the correct way to do it. Example of just 200 iterations on the LJ Speech dataset (~500 clips, even though it has WAY more): I'll prep a PR in a second.
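For reference, a minimal sketch of that call order (zero grad, manual backward, then step) as described in the Lightning manual-optimization docs. The module and loss below are placeholders, not the project's actual training_step:

```python
import pytorch_lightning as pl
import torch

class ManualOptOrder(pl.LightningModule):
    """Placeholder module showing the documented optimization order."""

    def __init__(self):
        super().__init__()
        self.automatic_optimization = False
        self.layer = torch.nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()

        opt.zero_grad()                         # 1. zero grad
        loss = self.layer(batch).pow(2).mean()  # placeholder loss
        self.manual_backward(loss)              # 2. manual backward
        opt.step()                              # 3. step

    def configure_optimizers(self):
        return torch.optim.Adam(self.layer.parameters(), lr=1e-3)
```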
@Lordmau5 So is the latest version now better than 3.0.5, or not?
I mean quality, not speed.
I don't have any models from before 3.1.0 anymore (I redid them all since they were only at like 4-5k iterations, so not far in, but they were already pretty good then). It all depends on the dataset you supply. If the dataset is clean audio, you shouldn't even need a lot (I trained the 2 voices of the Donkey Kong Rap, with 30s and 60s per voice respectively, and since they were pretty clear they give good results).
Describe the bug
Started training a new model in 3.1.4 and noticed it sounded super robotic around 1700 steps in.
(Converted to webm so I could upload them here)
Original:
original.webm
Generated:
generated_1700.webm
To Reproduce
Start generating a new model with 3.1.4
Additional context
The setup (pre-resample, pre-config, pre-hubert) has been done with their default settings.
I saw this commit that fixed the order of the _d optimizer. I've made these changes locally myself and I'm going to report back on the model once I've trained it from scratch again.
13d6346
Additionally, I have no idea if this is intentional and it just needs more iterations.
Does it perhaps not take the legacy checkpoint into account anymore?