DDI training compared to not DDI training #73

cantabile-kwok · 2022-11-12T12:31:47Z

Hi! I am curious why you use DDI (data-dependent initialization) here, since skipping DDI does not cause any bug in the program. How does the model perform without DDI at the start of training? Does DDI serve a specific purpose?

cantabile-kwok · 2022-11-12T12:38:29Z

Also, what is the source of this method? I found a paper (https://arxiv.org/pdf/1511.06856.pdf), but it does not seem to match the implementation used in this repo. I would appreciate any discussion!

MordehayM · 2024-12-04T21:12:25Z

Hi @cantabile-kwok
Do you have some insights about it?
Do you know if the DDI is necessary?

cantabile-kwok · 2024-12-05T03:05:51Z

> Hi @cantabile-kwok Do you have some insights about it? Do you know if the DDI is necessary?

Hi, I did not investigate this issue in much detail afterwards. I have simply trained with DDI every time since then, to guard against potential performance degradation. I think using DDI is never a bad thing, because it initializes your model with more reasonable parameters.
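To make "more reasonable parameters" concrete, here is a minimal sketch of one common form of data-dependent initialization. The thread does not say which DDI scheme this repo implements, so this assumes the activation-normalization style used in Glow-type flow models: the first batch's per-channel statistics set a bias and log-scale so that the layer's initial output has roughly zero mean and unit variance. The class `ActNorm1D`, the method `ddi_init`, and the toy data are all hypothetical and are not taken from the repo.

```python
import math
import random
import statistics


class ActNorm1D:
    """Hypothetical per-channel affine layer: y = (x + bias) * exp(logs)."""

    def __init__(self, channels):
        # Without DDI the layer simply starts as the identity map.
        self.bias = [0.0] * channels
        self.logs = [0.0] * channels
        self.initialized = False

    def ddi_init(self, batch, eps=1e-6):
        # batch: list of samples, each a list of per-channel values.
        for c in range(len(self.bias)):
            column = [sample[c] for sample in batch]
            mean = statistics.fmean(column)
            std = statistics.pstdev(column)
            # Shift by the batch mean and rescale by the batch std.
            self.bias[c] = -mean
            self.logs[c] = -math.log(std + eps)
        self.initialized = True

    def forward(self, batch):
        scales = [math.exp(s) for s in self.logs]
        return [
            [(x + b) * k for x, b, k in zip(sample, self.bias, scales)]
            for sample in batch
        ]


random.seed(0)
# Toy batch whose two channels have very different offsets and scales.
batch = [[random.gauss(5.0, 3.0), random.gauss(-2.0, 0.1)] for _ in range(256)]

layer = ActNorm1D(channels=2)
layer.ddi_init(batch)  # one-time initialization from the first batch
out = layer.forward(batch)

# Per-channel mean and std of the output after DDI.
for c in range(2):
    column = [sample[c] for sample in out]
    print(c, statistics.fmean(column), statistics.pstdev(column))
```

Under this assumption, skipping `ddi_init` leaves the layer as an identity map, which still runs without error but passes badly scaled activations downstream. That would be consistent with the observation above that training without DDI does not cause a bug, while saying nothing about how large the effect on final performance is.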
