
DDI training compared to not DDI training #73

Open
cantabile-kwok opened this issue Nov 12, 2022 · 3 comments

Comments

@cantabile-kwok

Hi! I am curious why you use DDI (data-dependent initialization) here, since skipping DDI doesn't cause any bug in the program. How does the model perform if DDI is not used at the start of training? Does it serve a specific purpose?

@cantabile-kwok
Author

cantabile-kwok commented Nov 12, 2022

Also, where does this method come from? I found a paper (https://arxiv.org/pdf/1511.06856.pdf), but it does not seem to match the implementation used in this repo. I'd appreciate any discussion!

@MordehayM

MordehayM commented Dec 4, 2024

Hi @cantabile-kwok
Do you have some insights about it?
Do you know if the DDI is necessary?

@cantabile-kwok
Author

> Hi @cantabile-kwok Do you have some insights about it? Do you know if the DDI is necessary?

Hi, I did not investigate this issue in much detail afterwards. Since then I have trained with DDI every time, to avoid any potential performance degradation. I think using DDI is never a bad thing, because it initializes the model with more reasonable parameters.
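The thread doesn't say exactly which DDI scheme this repo implements. As a hedged illustration only, here is a minimal NumPy sketch of one common form: the data-dependent initialization of ActNorm layers introduced in Glow (Kingma & Dhariwal, 2018). It assumes the repo's DDI works along those lines. The class name `ActNorm`, the `ddi` flag, and the 2-D `(batch, channels)` input shape are illustrative assumptions, not this project's code.

```python
import numpy as np


class ActNorm:
    """Hypothetical Glow-style per-channel affine layer: y = x * exp(logs) + bias.

    Without DDI the layer starts as the identity (logs = 0, bias = 0).
    With DDI, the first batch seen sets logs and bias so that this batch's
    outputs have zero mean and unit variance in every channel.
    """

    def __init__(self, channels, ddi=True, eps=1e-6):
        self.logs = np.zeros(channels)
        self.bias = np.zeros(channels)
        self.eps = eps  # guards against zero-variance channels
        self.initialized = not ddi  # ddi=False: never run the data-dependent step

    def forward(self, x):
        # x: array of shape (batch, channels)
        if not self.initialized:
            mean = x.mean(axis=0)
            std = x.std(axis=0)
            self.logs = -np.log(std + self.eps)    # scale = 1 / std
            self.bias = -mean * np.exp(self.logs)  # shift = -mean / std
            self.initialized = True  # later batches reuse these (now trainable) values
        y = x * np.exp(self.logs) + self.bias
        # log|det J| per example, as used in a flow's log-likelihood
        logdet = np.sum(self.logs)
        return y, logdet


# Usage: badly scaled inputs, with and without DDI.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=5.0, size=(256, 4))
y_ddi, _ = ActNorm(4, ddi=True).forward(x)
y_plain, _ = ActNorm(4, ddi=False).forward(x)
print(y_ddi.mean(axis=0), y_ddi.std(axis=0))      # approximately 0 and 1 per channel
print(y_plain.mean(axis=0), y_plain.std(axis=0))  # unchanged input statistics
```

Skipping DDI leaves the layer as an exact identity at step 0. The flow stays well defined, which fits the observation that it causes no bug. DDI only changes the starting point of optimization, so activations begin at unit scale. Whether skipping it measurably hurts final quality is an empirical question this thread does not settle.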
