Hi! I am curious why you use DDI (data-dependent initialization) here, since skipping it does not cause any errors in the program. How does the model perform when DDI is not run at the start of training? Does it serve a specific purpose?
Also, what is the source of this method? I found a paper (https://arxiv.org/pdf/1511.06856.pdf), but it does not seem to match the implementation used in this repo. I would appreciate any discussion!
Hi @cantabile-kwok, do you have any insights on this? Do you know whether DDI is necessary?
Hi, I did not investigate this issue in much detail afterwards. I have trained with DDI every time since then, to guard against a potential drop in performance. I think using DDI is never a bad thing, because it initializes your model with more reasonable parameters.
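To make "more reasonable parameters" concrete, here is a minimal sketch of one common form of data-dependent initialization: a per-channel affine layer whose bias and log-scale are set from the first batch so that its output starts with zero mean and unit variance. This thread does not confirm that the repo's DDI works this way, so treat it as an assumption; the class `ActNormSketch` and its methods are hypothetical names for illustration, not code from this repo.

```python
import math
from statistics import fmean, pstdev


class ActNormSketch:
    """Hypothetical per-channel affine layer: y = (x + bias) * exp(logs)."""

    def __init__(self, channels):
        # Without DDI the layer starts as the identity map.
        self.bias = [0.0] * channels
        self.logs = [0.0] * channels
        self.initialized = False

    def initialize(self, batch, eps=1e-6):
        # batch: list of rows, each row holding one float per channel.
        # Set bias and log-scale from the batch statistics so the
        # output of this layer is roughly zero-mean and unit-variance.
        for c in range(len(self.bias)):
            column = [row[c] for row in batch]
            self.bias[c] = -fmean(column)
            self.logs[c] = -math.log(pstdev(column) + eps)
        self.initialized = True

    def forward(self, batch):
        # The first call triggers the data-dependent initialization.
        if not self.initialized:
            self.initialize(batch)
        return [
            [(x + b) * math.exp(s) for x, b, s in zip(row, self.bias, self.logs)]
            for row in batch
        ]


# Two channels with very different offsets and scales.
first_batch = [[10.0, -200.0], [12.0, -100.0], [14.0, 0.0], [16.0, 100.0]]
layer = ActNormSketch(channels=2)
normalized = layer.forward(first_batch)
```

Skipping this step would still run without errors, since the layer simply starts as the identity, which matches the observation in the question. The difference would be that early activations keep whatever offset and scale the data happens to have.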