Newly trained model since Lightning implementation sounds robotic early on #270
Comments
Even after around 5000 steps it still sounds very robotic. I applied the fix in the commit. Original: Generated:
same
Does the above PR fix this issue?
Yup! That indeed helped at 0 steps already (as in, the initial conversion / preparation). Original: Generated (0 steps): Generated (~1700 steps): Additionally, I stopped the training after that first checkpoint save and started it again now. Letting it run for a bit, it seemed to save at around 891 steps instead of 1770.
I left it until 100k steps to see if it would gradually fix itself up... Big mistake, it still sounds robotic! Hopefully this is fixed now.
😇
How many GPUs are connected to your PC?
A single one, an NVIDIA RTX 3090. With about 170 audio clips in the dataset I get around 1.5 iterations per second (which for my use cases is more than enough). Also, is your 😇 reaction positive because I provided more details in the issue? 😁
In Lightning, global_step seems to be incremented every time optimizer.step() is called...
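For context, here is a minimal sketch of how that plays out with a GAN-style setup like the one discussed in this thread. This is not the project's actual training code; the module, layers, and losses are placeholders. With manual optimization and two optimizers, each `optimizer.step()` call advances `trainer.global_step`, so one batch advances the counter twice, which can make step counts look roughly halved or doubled compared to the batch count.

```python
import pytorch_lightning as pl
import torch

class GanLikeModule(pl.LightningModule):
    """Hypothetical two-optimizer module to illustrate global_step counting."""

    def __init__(self):
        super().__init__()
        self.automatic_optimization = False  # manual optimization, as in the discussion
        self.net_g = torch.nn.Linear(8, 8)   # stand-in "generator"
        self.net_d = torch.nn.Linear(8, 1)   # stand-in "discriminator"

    def training_step(self, batch, batch_idx):
        opt_g, opt_d = self.optimizers()

        # Discriminator update: optimizer.step() -> trainer.global_step += 1
        opt_d.zero_grad()
        loss_d = self.net_d(self.net_g(batch).detach()).mean()
        self.manual_backward(loss_d)
        opt_d.step()

        # Generator update: another optimizer.step() -> trainer.global_step += 1
        opt_g.zero_grad()
        loss_g = -self.net_d(self.net_g(batch)).mean()
        self.manual_backward(loss_g)
        opt_g.step()
        # Net effect: global_step advances by 2 per batch.

    def configure_optimizers(self):
        return (
            torch.optim.Adam(self.net_g.parameters(), lr=2e-4),
            torch.optim.Adam(self.net_d.parameters(), lr=2e-4),
        )
```

Depending on which counter the checkpointing and logging code reads, saves and audio logs then land at unexpected step numbers (e.g. roughly half of what you expected).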
Could this be helpful in fixing it? Lightning-AI/pytorch-lightning#17281
It is more complicated than I thought; it is also related to TensorBoard.
I see, if you have any more things I should test out, please tell me! 😁
Updating everything, but mine sounds like this: I'm using pre-resample, pre-config, and svc pre-hubert -fm crepe (I've also tried just svc pre-hubert). My training set consists of 263 files ranging from 1 to 14 seconds each, adding up to 20 minutes and 36 seconds of data. I'm hesitant to train it further because I'm afraid it might result in the same robotic voice I experienced the last time I trained it for 100k steps. It's possible that I'm mistaken, but I'm convinced that everything was working perfectly just a week or two ago. I even tried a complete reinstall, but unfortunately it didn't fix the issue. Any idea what I'm doing wrong?
Might still need an additional fix; the progress bar display is incomplete.
As you can see, I just fixed this issue right now
Just updated to 3.1.6, but getting this error: Training: 0it [00:00, ?it/s]Traceback (most recent call last):
Meanwhile, my generated one after 1600 steps (I merged the separate commits and didn't include the final one to fix the naming) still seems to be very metallic... However, it does take the checkpoint into account, so that problem seems to be solved. Still, this does feel a bit strange.

Additionally, I found another part of the code where a similar adjustment is needed; otherwise it will show the generated audio as 1599, 2399, etc. The first run also doesn't seem to generate audio logs anymore now; I have to wait until an epoch / log interval is hit.

Yep, even after around 4000 steps it still sounds very bad. Pre-Lightning didn't sound this bad :(

Went back to 3.0.5 as a test, training the model from scratch. Already seeing a massive drop in speed (1.4 iterations per second vs. 6 seconds per iteration). However, the first generation at 800 sounds MUCH better:
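A small, purely illustrative script of the off-by-one described above (the interval and the logging condition are made up, not taken from the repository): if audio is logged against the counter before it is bumped for the current batch, samples get tagged 1599, 2399, ... instead of 1600, 2400, ...

```python
LOG_INTERVAL = 800  # hypothetical audio-logging interval

def audio_tags(total_batches: int, log_before_increment: bool) -> list[int]:
    """Return the step numbers that logged audio samples would be tagged with."""
    tags = []
    global_step = 0
    for _ in range(total_batches):
        if log_before_increment and (global_step + 1) % LOG_INTERVAL == 0:
            tags.append(global_step)   # tags 799, 1599, 2399, ...
        global_step += 1               # counter bumped for this batch
        if not log_before_increment and global_step % LOG_INTERVAL == 0:
            tags.append(global_step)   # tags 800, 1600, 2400, ...
    return tags

print(audio_tags(2400, log_before_increment=True))   # [799, 1599, 2399]
print(audio_tags(2400, log_before_increment=False))  # [800, 1600, 2400]
```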
Anyone can send a PR.
I updated my comment just now with a bit more info. 3.0.5 seems to yield better results than the current Lightning implementation. However, since it is such a big change, I wouldn't even know what changed under the hood or what might need to be tweaked to give better results in Lightning, too... As in, which parameters and so on |
Line 184 in train.py
This fixed it for me.
Good addition, doesn't hurt to have that. As for the weird metallic noise, I think I've managed to fix it by shifting some function calls around. I removed one of the calls. Additionally, the following is the original code for optimization:
According to the Lightning.AI documentation, it should be as follows:
(Zero grad, manual backward, then step.) This brings the speed down a bit more (1.33 it/s -> 1.27 it/s), but I believe it's the correct way to do it. Example of just 200 iterations on the LJ Speech dataset (~500 clips, even though it has WAY more): I'll prep a PR in a second.
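For reference, a minimal sketch of that call order (zero grad, manual backward, then step) as described in the Lightning manual-optimization docs. The module and loss below are placeholders, not the project's actual training_step:

```python
import pytorch_lightning as pl
import torch

class ManualOptOrder(pl.LightningModule):
    """Placeholder module showing the documented optimization order."""

    def __init__(self):
        super().__init__()
        self.automatic_optimization = False
        self.layer = torch.nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()

        opt.zero_grad()                         # 1. zero grad
        loss = self.layer(batch).pow(2).mean()  # placeholder loss
        self.manual_backward(loss)              # 2. manual backward
        opt.step()                              # 3. step

    def configure_optimizers(self):
        return torch.optim.Adam(self.layer.parameters(), lr=1e-3)
```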
@Lordmau5 So is the latest version now better than 3.0.5, or not?
I mean quality, not speed.
I don't have any models from before 3.1.0 anymore (I redid them all since they were only at like 4-5k iterations, so not far in, but they were already pretty good then). It all depends on the dataset you supply. If the dataset is clean audio, you shouldn't even need a lot (I trained the 2 voices of the Donkey Kong Rap, with 30s and 60s per voice respectively, and since they were pretty clear they give good results).
Describe the bug
Started training a new model in 3.1.4 and noticed it sounded super robotic around 1700 steps in.
(Converted to webm so I could upload them here)
Original:
original.webm
Generated:
generated_1700.webm
To Reproduce
Start generating a new model with 3.1.4
Additional context
The setup (pre-resample, pre-config, pre-hubert) has been done with their default settings.
I saw this commit that fixed the order of the _d optimizer. I've made these changes locally myself and I'm going to report back on the model once I've trained it from scratch again.
13d6346
Additionally, I have no idea if this is intentional and it just needs more iterations.
Does it perhaps not take the legacy checkpoint into account anymore?