
restore epoch and step information when resuming training #5

Closed
thewh1teagle opened this issue Sep 11, 2024 · 3 comments

Comments

@thewh1teagle

When resuming training from a checkpoint, it starts from epoch 0 and step 0, although it should be much higher, and I can hear that it's already trained.
Can you fix it so the epoch and step are restored, so I can keep track of the step counter when resuming?

Lightning-AI/pytorch-lightning#12274

@rmcpantoja

rmcpantoja commented Sep 11, 2024

Hi,
Is this error occurring with the --restore-from-checkpoint argument of the trainer? That didn't happen when I used it, although I think it is intended more for finetuning purposes. If not, can you try that argument?
Cheers.

@thewh1teagle
Author

> occurring in --restore-from-checkpoint argument from the trainer?

Hey,
I don't see this argument in the trainer; maybe you're talking about an old version?

https://github.com/mush42/optispeech/blob/main/configs/train.yaml

@mush42
Owner

mush42 commented Sep 11, 2024

@thewh1teagle
are you using the forced_resume argument?
I added this to be able to (re)load the model when I make changes to its architecture while still benefiting from already-trained layers.
It is a very niche use case, but I added it to save myself some time during development.
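For context, the pattern described here (reloading weights into a changed architecture while keeping the trained layers) is usually done in PyTorch with non-strict state-dict loading. A minimal sketch, assuming plain PyTorch; the model classes below are illustrative, not taken from optispeech:

```python
import torch
from torch import nn

class OldModel(nn.Module):
    """Architecture at the time the checkpoint was saved."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 4)

class NewModel(nn.Module):
    """Revised architecture: keeps the encoder, adds a new head."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 4)
        self.extra_head = nn.Linear(4, 2)  # layer added after the checkpoint

# Simulate a saved checkpoint from the old architecture.
old_state = OldModel().state_dict()

# strict=False loads matching keys (the encoder) and reports the rest,
# instead of raising on the missing extra_head weights.
model = NewModel()
result = model.load_state_dict(old_state, strict=False)
print(result.missing_keys)     # the new head has no saved weights
print(result.unexpected_keys)  # nothing in the checkpoint is unused
```

Note that this restores only weights, not the trainer state (epoch, step, optimizer), which is why it shows counters starting from zero.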

If you want to resume training normally, then the ckpt_path is all you need.
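Assuming the Hydra-style configs/train.yaml linked above, a normal resume would then look roughly like this (the checkpoint path is a placeholder):

```yaml
# configs/train.yaml (sketch): point ckpt_path at the last checkpoint
# to restore the full trainer state, including epoch and step counters.
ckpt_path: /path/to/checkpoints/last.ckpt
```

With ckpt_path set, Lightning restores the optimizer, epoch, and global step, so the counters continue from where training stopped rather than from zero.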

@mush42 mush42 closed this as completed Sep 11, 2024