Should reference model initialize weights uniformly? #11

Closed
ouyangliqi opened this issue Sep 5, 2023 · 3 comments

Comments

@ouyangliqi

Thanks for your awesome work! I noticed that the domain weights of the baseline model for RedPajama are not initialized uniformly. Does this mean that the reference model can be initialized with any domain weights?

https://github.com/sangmichaelxie/doremi/blob/main/configs/rp_baseline.json

@sangmichaelxie
Owner

Yes, the reference model can be trained with any set of domain weights (it's a hyperparameter). A reasonable choice could be to use domain weights that are computed according to the size of the domains (which is done here for RedPajama). If you start from uniform weights, we find that we often need 2 rounds of iterated DoReMi because the reference model trained on uniform weights isn't very good.
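For illustration, here is a minimal sketch of one way to read "weights computed according to the size of the domains": normalize per-domain sizes so they sum to 1. The function name and the token counts are hypothetical and are not taken from the DoReMi repo or from `rp_baseline.json`. Treating "according to size" as strictly proportional is also an assumption.

```python
def size_proportional_weights(domain_sizes):
    """Normalize per-domain sizes (e.g. token counts) into weights that sum to 1."""
    total = sum(domain_sizes.values())
    if total <= 0:
        raise ValueError("domain sizes must sum to a positive number")
    return {name: size / total for name, size in domain_sizes.items()}


def uniform_weights(domain_names):
    """Give every domain the same weight, for comparison."""
    names = list(domain_names)
    return {name: 1.0 / len(names) for name in names}


# Hypothetical sizes for illustration only; not RedPajama's real token counts.
example_sizes = {"web": 600, "code": 300, "books": 100}

reference_weights = size_proportional_weights(example_sizes)
baseline_uniform = uniform_weights(example_sizes)
```

Either dictionary could serve as the reference model's domain weights, since they are a hyperparameter. The size-based version simply keeps small domains from being upsampled as heavily as they would be under uniform weights.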

@ouyangliqi
Author

Thank you so much for your reply.

@sangmichaelxie
Owner

sangmichaelxie commented Sep 18, 2023 via email
