
Newly trained model since Lightning implementation sounds robotic early on #270

Closed
Lordmau5 opened this issue Apr 9, 2023 · 26 comments · Fixed by #271
Labels
bug Something isn't working

Comments

@Lordmau5
Collaborator

Lordmau5 commented Apr 9, 2023

Describe the bug
Started training a new model in 3.1.4 and noticed it sounded super robotic around 1700 steps in.

(Converted to webm so I could upload them here)
Original:
original.webm

Generated:
generated_1700.webm

To Reproduce
Start training a new model with 3.1.4.

Additional context
The setup (pre-resample, pre-config, pre-hubert) has been done with their default settings.

I saw this commit that fixed the order of the _d optimizer. I've made these changes locally myself and will report back on the model once I've trained it from scratch again:
13d6346

Additionally, I have no idea if this is intentional and it just needs more iterations.
Does it perhaps not take the legacy checkpoint into account anymore?

@Lordmau5 Lordmau5 added the bug Something isn't working label Apr 9, 2023
@Lordmau5
Collaborator Author

Lordmau5 commented Apr 9, 2023

Even after around 5000 steps it still sounds very robotic. I applied the fix in the commit.

Original:
original_2.webm

Generated:
generated_5000.webm

@tonyco82

tonyco82 commented Apr 9, 2023

same

@34j
Collaborator

34j commented Apr 9, 2023

Does the above PR fix this issue?

@Lordmau5
Collaborator Author

Lordmau5 commented Apr 9, 2023

Yup! That indeed helped with 0 steps already (as in, the initial conversion / preparation)
It still seems like there's a bit of metallic shrill, similar to another issue, but I assume that would go away after more training

Original:
original_3.webm

Generated (0 steps):
generated_3_0.webm

Generated (~1700 steps):
generated_3_1700.webm

Additionally, I stopped the training after that first checkpoint save and started it again just now.
It said it would start at epoch 79; however, it saved the model and optimizer state at iteration 0 to logs\44k\G_0 and D_0 instead of 1770 (or whichever step it initially saved to).

Letting it run for a bit, it seemed to save at around 891 steps instead of 1770.
Also, after stopping that too and restarting it, it went back to 0.

@ne0escape

ne0escape commented Apr 9, 2023

I left it training until 100k steps to see if it would gradually fix itself up...

Big mistake, it still sounds robotic!

Hopefully this is fixed now.

@34j
Collaborator

34j commented Apr 9, 2023

Additionally

😇

@34j
Collaborator

34j commented Apr 9, 2023

Yup! That indeed helped with 0 steps already (as in, the initial conversion / preparation)
It still seems like there's a bit of metallic shrill, similar to another issue, but I assume that would go away after more training

Original:
original_3.webm

Generated (0 steps):
generated_3_0.webm

Generated (~1700 steps):
generated_3_1700.webm

Additionally, I stopped the training after that first checkpoint save and started it again just now.
It said it would start at epoch 79; however, it saved the model and optimizer state at iteration 0 to logs\44k\G_0 and D_0 instead of 1770 (or whichever step it initially saved to).

Letting it run for a bit, it seemed to save at around 891 steps instead of 1770.
Also, after stopping that too and restarting it, it went back to 0.

How many GPUs are connected to your PC?

@Lordmau5
Collaborator Author

Lordmau5 commented Apr 9, 2023

How many GPUs are connected to your PC?

A single one, an NVIDIA RTX 3090. With about 170 audio clips in the dataset I get around 1.5 iterations per second (which for my use case is more than enough).

Also, is your quote with the 😇 a positive reaction because I provided more details in the issue? 😁

@34j
Collaborator

34j commented Apr 9, 2023

In Lightning, global_step seems to be incremented every time optimizer.step is called...
Lightning-AI/pytorch-lightning#13752
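
To illustrate what that would mean here (a minimal hypothetical sketch, not the project's actual train.py): with manual optimization and two optimizers, each optimizer.step() call would bump the trainer's global_step, so the counter advances by 2 per batch and no longer matches the plain batch count that the old step numbering assumed.

import pytorch_lightning as pl
import torch
from torch import nn

class TwoOptimizerSketch(pl.LightningModule):
    """Hypothetical G/D module using manual optimization."""

    def __init__(self):
        super().__init__()
        self.automatic_optimization = False
        self.net_g = nn.Linear(4, 4)
        self.net_d = nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        optim_g, optim_d = self.optimizers()

        # Discriminator update: this optimizer.step() increments global_step by 1.
        optim_d.zero_grad()
        loss_d = self.net_d(self.net_g(batch).detach()).mean()
        self.manual_backward(loss_d)
        optim_d.step()

        # Generator update: a second optimizer.step(), so global_step advances
        # by 2 per batch in total and drifts away from batch_idx.
        optim_g.zero_grad()
        loss_g = -self.net_d(self.net_g(batch)).mean()
        self.manual_backward(loss_g)
        optim_g.step()

    def configure_optimizers(self):
        return (
            torch.optim.AdamW(self.net_g.parameters(), lr=1e-4),
            torch.optim.AdamW(self.net_d.parameters(), lr=1e-4),
        )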

@Lordmau5
Collaborator Author

Lordmau5 commented Apr 9, 2023

Could this be helpful in fixing it? Lightning-AI/pytorch-lightning#17281

@34j
Collaborator

34j commented Apr 9, 2023

It is more complicated than I thought; it is also related to TensorBoard.

@Lordmau5
Collaborator Author

Lordmau5 commented Apr 9, 2023

It is more complicated than I thought; it is also related to TensorBoard.

I see. If there are any more things I should test out, please tell me! 😁

@34j 34j closed this as completed in #271 Apr 9, 2023
@ne0escape

ne0escape commented Apr 9, 2023

I updated everything, but mine sounds like this:

Original
2200 Steps

I'm using pre-resample, pre-config, and svc pre-hubert -fm crepe (I've also tried just svc pre-hubert).

My training set consists of 263 files that range from 1-14 seconds each, adding up to 20 minutes and 36 seconds of data.

I'm hesitant to train it further because I'm afraid that it might result in the same robotic voice I experienced the last time I trained it for 100k steps. It's possible that I'm mistaken, but I'm convinced that everything was working perfectly just a week or two ago. I even tried a complete reinstall, but unfortunately, it didn't fix the issue.

Any idea what I'm doing wrong?

@34j
Collaborator

34j commented Apr 9, 2023

It is more complicated than I thought; it is also related to TensorBoard.

Might still need an additional fix; the progress bar display is incomplete.

@34j
Collaborator

34j commented Apr 9, 2023

I updated everything, but mine sounds like this:

Original 2200 Steps

I'm using pre-resample, pre-config, and svc pre-hubert -fm crepe (I've also tried just svc pre-hubert).

My training set consists of 263 files that range from 1-14 seconds each, adding up to 20 minutes and 36 seconds of data.

I'm hesitant to train it further because I'm afraid that it might result in the same robotic voice I experienced the last time I trained it for 100k steps. It's possible that I'm mistaken, but I'm convinced that everything was working perfectly just a week or two ago. I even tried a complete reinstall, but unfortunately, it didn't fix the issue.

Any idea what I'm doing wrong?

As you can see, I just fixed this issue right now

@ne0escape

Just updated to 3.1.6, but getting this error:

Training: 0it [00:00, ?it/s]Traceback (most recent call last):
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'VitsLightning' object has no attribute '_temp_epoch'
Training: 0it [00:00, ?it/s]

@Lordmau5
Collaborator Author

Lordmau5 commented Apr 9, 2023

Meanwhile, my generated audio after 1600 steps (I merged the separate commits but didn't include the final one that fixes the naming) still seems very metallic...

However, it does take the checkpoint into account, so that problem seems to be solved.
generated_1599_v3_1_6.webm

However, this does feel a bit strange...
Lightning claims to be ~2x faster for training, but if the results aren't on par with the previous method at "1x", then there's no real benefit to it, correct?


Additionally, I found another part of the code where self.total_batch_idx should be used
https://github.com/34j/so-vits-svc-fork/blob/main/src/so_vits_svc_fork/train.py#L392
^ This change ends up resulting in "-1" instead of the correct step / id. Not sure where this is being

Otherwise it will show the generated audio as 1599, 2399, etc.
image

The first run also doesn't seem to generate audio logs anymore; I have to wait until an epoch / log interval is hit.


Yep, even after around 4000 steps it still sounds very bad. Pre-Lightning didn't sound this bad :(


Went back to 3.0.5 as a test, training the model from scratch. Already seeing a massive drop in speed (from 1.4 iterations per second down to 6 seconds per iteration).

However, the first generation at 800 steps sounds MUCH better:
generated_800_v305.webm
original_800_v305.webm

@34j
Collaborator

34j commented Apr 9, 2023

Anyone can send a PR.

@Lordmau5
Collaborator Author

Lordmau5 commented Apr 9, 2023

I updated my comment just now with a bit more info. 3.0.5 seems to yield better results than the current Lightning implementation.

However, since it is such a big change, I wouldn't even know what changed under the hood or what might need to be tweaked to give better results in Lightning, too... As in, which parameters and so on

@ne0escape

ne0escape commented Apr 9, 2023

Just updated to 3.1.6, but getting this error:

Training: 0it [00:00, ?it/s]Traceback (most recent call last): raise AttributeError("'{}' object has no attribute '{}'".format( AttributeError: 'VitsLightning' object has no attribute '_temp_epoch' Training: 0it [00:00, ?it/s]

Line 184 in train.py

def __init__(self, reset_optimizer: bool = False, **hparams: Any):
    super().__init__()
    self._temp_epoch = 0  # Add this line to initialize the _temp_epoch attribute

This fixed it for me.

@Lordmau5
Collaborator Author

Lordmau5 commented Apr 9, 2023

Just updated to 3.1.6, but getting this error:
Training: 0it [00:00, ?it/s]Traceback (most recent call last): raise AttributeError("'{}' object has no attribute '{}'".format( AttributeError: 'VitsLightning' object has no attribute '_temp_epoch' Training: 0it [00:00, ?it/s]

Line 184 in train.py

def __init__(self, reset_optimizer: bool = False, **hparams: Any):
    super().__init__()
    self._temp_epoch = 0  # Add this line to initialize the _temp_epoch attribute

This fixed it for me.

Good addition; it doesn't hurt to have it.


As for the weird metallic noise, I think I've managed to fix it by shifting some function calls around.

I removed one with torch.no_grad(): call around the loss calculation in train.py.
It slightly impacts the speed (1.41 it/s -> 1.33 it/s), but I am getting a much better result.
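
For anyone curious why that no_grad matters, here is a tiny standalone illustration (plain PyTorch, nothing project-specific): anything computed under torch.no_grad() is detached from the autograd graph, so any loss term built from it contributes no gradient to the generator, and training silently optimizes only whatever terms are still attached.

import torch
from torch import nn

gen = nn.Linear(4, 4)
x = torch.randn(2, 4)

# Attached: the output tracks gradients, so this loss actually trains `gen`.
loss_ok = (gen(x) ** 2).mean()
loss_ok.backward()
print(gen.weight.grad.abs().sum())  # non-zero

gen.zero_grad()

# Detached: computed under no_grad(), this term carries no gradient at all.
with torch.no_grad():
    detached_term = (gen(x) ** 2).mean()
attached_term = gen(x).abs().mean()
(detached_term + attached_term).backward()
# Only attached_term produced gradients; detached_term silently contributed
# nothing (and if *every* term were detached, backward() would raise an error).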

Additionally, the following is the original code for optimization:

        # optimizer
        self.manual_backward(loss_gen_all)
        optim_g.step()
        optim_g.zero_grad()
        self.untoggle_optimizer(optim_g)

According to the Lightning.AI documentation, it should be as follows:

        # optimizer
        optim_g.zero_grad()
        self.manual_backward(loss_gen_all)
        optim_g.step()
        self.untoggle_optimizer(optim_g)

(Zero grad, manual backward, then step)
https://lightning.ai/docs/pytorch/stable/common/optimization.html#id2
https://lightning.ai/docs/pytorch/stable/common/lightning_module.html#manual-backward

This brings the speed down a bit more (1.33 it/s -> 1.27 it/s), but I believe it's the correct way to do it.
Unless the original order (manual backward, step, then zero grad) actually makes more sense for training?
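
A quick standalone check (plain PyTorch, nothing project-specific) suggests the two orderings give identical results as long as gradients are zeroed exactly once per step, so the documented order is likely just the conventional one:

import torch
from torch import nn

def run(order: str) -> torch.Tensor:
    torch.manual_seed(0)  # identical init and data for both variants
    model = nn.Linear(3, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(8, 3), torch.randn(8, 1)
    for _ in range(5):
        if order == "zero_first":  # zero_grad -> backward -> step
            opt.zero_grad()
            ((model(x) - y) ** 2).mean().backward()
            opt.step()
        else:  # backward -> step -> zero_grad
            ((model(x) - y) ** 2).mean().backward()
            opt.step()
            opt.zero_grad()
    return model.weight.detach().clone()

print(torch.allclose(run("zero_first"), run("zero_last")))  # True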

Example after just 200 iterations on the LJ Speech dataset (~500 clips, even though it has WAY more):
original_lj_speech.webm
generated_lj_speech_200.webm

I'll prep a PR in a second

@Lordmau5
Collaborator Author

Lordmau5 commented Apr 9, 2023

That only leaves the issue of checkpoints saving twice, and I haven't figured that out yet...
image

I just had a thought... could it be that, somehow, training is being run twice?
Or at the very least, the validation?

I can't seem to find a second call to log_audio_dict besides the one in validation_step, and yet we always get 2 audio files instead of just 1...

I added a LOG.info call to the validation_step method, and when booting up the model it does run twice:
once with batch_idx being 0, and once with it being 1.


Yeah, finally got to a checkpoint saving point and it happened twice again...
image

Maybe adding a check for batch_idx == 0 and only then doing the actual checkpoint saving might be an idea.

Yup, that fixed it! Gonna add it into the PR!
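
For reference, a minimal sketch of that guard (hypothetical class, not the fork's actual VitsLightning): validation_step fires once per validation batch, so side effects like audio logging and checkpoint saving need to be restricted to the first batch.

import pytorch_lightning as pl

class GuardedValidationSketch(pl.LightningModule):
    """Hypothetical module showing the batch_idx == 0 guard."""

    def validation_step(self, batch, batch_idx):
        # With two validation batches this hook runs twice per validation
        # pass, so unguarded side effects (audio logs, G_*/D_* checkpoint
        # saves) happen twice. Guard them on the first batch only.
        if batch_idx != 0:
            return
        print(f"log audio / save checkpoint at global step {self.global_step}")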

@Meldoner
Contributor

Meldoner commented Apr 11, 2023

@Lordmau5 so is the latest version better than 3.0.5 now, or not?

@Lordmau5
Collaborator Author

@Lordmau5 so is the latest version better than 3.0.5 now, or not?

From what I can tell, it trains faster thanks to Lightning.AI (the new implementation) and some fixes, as well as the setting talked about here: #288

At least in my testing, that is.

@Meldoner
Contributor

I mean quality, not speed

@Lordmau5
Collaborator Author

I mean quality, not speed

I don't have any models from before 3.1.0 anymore (I redid them all since they were only around 4-5k iterations in, so not very far, but they were already pretty good then).

It all depends on the dataset you supply. If the dataset is clean audio, you shouldn't even need a lot of it (I trained the 2 voices of the Donkey Kong Rap, with 30s and 60s per voice respectively, and since they were pretty clean they gave good results).
