
Re-initializing main() because the training of light MLP diverged and all the values are zero. #4

Closed
YONGHUICAI opened this issue Nov 29, 2023 · 14 comments

@YONGHUICAI

Thanks for open-sourcing such a great project!
When I trained on the yufeng and marcel datasets, this error quickly occurred: Re-initializing main() because the training of light MLP diverged and all the values are zero.
Code tested on an RTX 3090 Ti.
How can I solve this problem?

@sbharadwajj
Owner

Hi,

Did the code restart or crash? Typically it should restart.

@YONGHUICAI
Author

Hi,

Did the code restart or crash? Typically it should restart.

It keeps restarting and then runs into the same error again each time.
I would like to know what causes this error and how I can fix it.
Thanks.

@sbharadwajj
Owner

sbharadwajj commented Nov 30, 2023

Hi,

Unfortunately, the only solution is to restart the code. I have not had time to check whether this is a GPU-specific problem: on one of the GPUs I used it never occurred, and I only ran into it on other GPUs.

So please try restarting the code. If it still does not work, let me know!

The error happens because we do not use an activation function on the output layer of the light MLP; its values are left unconstrained since we tonemap them afterwards.
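To illustrate what the message refers to, here is a minimal sketch of that kind of setup and the guard that triggers the restart; the module and function names below are placeholders for illustration, not this repo's actual code:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the light MLP: the final Linear layer has no
# activation, so its outputs stay unconstrained (they are tonemapped later).
light_mlp = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 3),  # no output activation
)

def light_mlp_diverged(pred: torch.Tensor) -> bool:
    """Heuristic divergence check: the run is treated as dead when the
    predictions contain NaN/Inf or have collapsed to all zeros."""
    return (not torch.isfinite(pred).all()) or bool((pred == 0).all())

# Inside the training loop, a guard along these lines would trigger the restart:
#   if light_mlp_diverged(pred):
#       print("Re-initializing main() because the training of light MLP "
#             "diverged and all the values are zero.")
#       # ...re-initialize the model / re-enter main() (restart hook not shown)
```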

@yydxlv

yydxlv commented Dec 1, 2023

The same problem occurs on an A800.

@sbharadwajj
Owner

Hi,

Does the code never work, or does this happen only a few times?

@YONGHUICAI
Author

Hi,

Does the code never work, or does this happen only a few times?

Unfortunately, the code never works.

@sbharadwajj
Owner

sbharadwajj commented Dec 4, 2023

Hi,
Can you please run the code once again and copy-paste the exact log that is printed on the terminal?

@Orange-Ctrl

Orange-Ctrl commented Dec 6, 2023

Hi,
I ran the code on an RTX 3090 and got the same problem.
(screenshot of the terminal log: Snipaste_2023-12-06_16-56-40)

@Orange-Ctrl

I found the problem: the module robust_loss_pytorch was missing. It runs successfully now!

@YONGHUICAI
Author

Hi, I ran the code on an RTX 3090 and got the same problem.

Yeah, pip install git+https://github.com/jonbarron/robust_loss_pytorch
It works!
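If you want to double-check that the dependency is visible before starting a long run, a quick check like this helps (plain Python, nothing project-specific):

```python
# Quick sanity check that the dependency is importable before a long run.
try:
    import robust_loss_pytorch
    print("robust_loss_pytorch found at:", robust_loss_pytorch.__file__)
except ImportError:
    print("robust_loss_pytorch is still missing; re-run the pip install above.")
```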

@sbharadwajj
Owner

Glad to hear it works now :)

I think I forgot to include it in the requirements.txt.

@Orange-Ctrl I can also see a tinycudann installation warning in your log screenshot. You have compiled tinycudann on one GPU device and are running it on another. To get the best performance, please make sure it is installed properly on the machine you run on.
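A generic way to see which compute capability tinycudann should target on the machine you actually run on (this is plain PyTorch, not something from this repo):

```python
import torch

# Print the compute capability of each visible GPU; tinycudann should be
# (re)compiled for this architecture to avoid the warning above.
for idx in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(idx)
    print(f"GPU {idx}: {torch.cuda.get_device_name(idx)} "
          f"(compute capability {major}.{minor})")
```

An RTX 3090 reports compute capability 8.6; rebuilding tinycudann on the machine it will run on is usually enough to clear the warning.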

@zydmu123

Sorry to bother you @sbharadwajj, but it really doesn't work even after installing robust_loss_pytorch as suggested above. My GPU is an RTX 3090, and it has never run successfully even once. Could you give me some help? Thanks!

@Yingyan-Xu

Hi @zydmu123, did you manage to run the code in the end? I'm having the same issue on an RTX 3090; I also tried installing robust_loss_pytorch and it didn't help.

@sbharadwajj
Owner

@Yingyan-Xu did you verify that the mask is correct? Can you quickly save the mask and check it?
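For example, a minimal helper to dump a mask tensor as a PNG for a visual check (the function, variable, and path names here are hypothetical, not names from this codebase):

```python
from torchvision.utils import save_image

# `mask` is assumed to be a float tensor in [0, 1] with shape (H, W) or (1, H, W).
def save_mask_for_inspection(mask, path="debug_mask.png"):
    if mask.dim() == 2:          # (H, W) -> add a channel dimension
        mask = mask.unsqueeze(0)
    save_image(mask.float(), path)
```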
