Should the reference model's domain weights be initialized uniformly? #11
Thanks for your awesome work! I noticed that the domain weights of the baseline model for RedPajama are not initialized uniformly. Does this mean that the reference model can be initialized with any settings?
https://github.com/sangmichaelxie/doremi/blob/main/configs/rp_baseline.json
Comments
Yes, the reference model can be trained with any set of domain weights (it's a hyperparameter). A reasonable choice is to use domain weights computed according to the size of the domains (which is what is done here for RedPajama). If you start from uniform weights, we find that we often need 2 rounds of iterated DoReMi because the reference model trained on uniform weights isn't very good.
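For concreteness, here is a minimal Python sketch of the size-proportional choice described above; the token counts are made-up placeholders (not the repo's actual RedPajama statistics), and the snippet is an illustration rather than code from this repository.

token_counts = {
    # Made-up per-domain token counts, for illustration only.
    "common_crawl": 870e9,
    "c4": 175e9,
    "github": 60e9,
    "wikipedia": 24e9,
    "book": 26e9,
    "arxiv": 28e9,
    "stackexchange": 20e9,
}
total = sum(token_counts.values())
# Reference domain weights proportional to domain size, summing to 1.
reference_weights = {domain: count / total for domain, count in token_counts.items()}
print({domain: round(weight, 3) for domain, weight in reference_weights.items()})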
Thank you so much for your reply.
Thank you for your prompt reply! I have one more question. I'm using my own dataset, which is also RedPajama but organized differently. Even after the third round of iterated DoReMi, I've noticed that the domain weights still haven't converged. What might affect the number of rounds I need?
P.S. I'm using the same training settings as in run_rp_doremi280M.sh, except that I had to lower per_device_train_batch_size to 20 due to hardware limitations. Below are my results for each round:

Domain           Round 1   Round 2   Round 3
common_crawl     0.661     0.580     0.592
c4               0.187     0.162     0.186
github           0.024     0.052     0.040
wikipedia        0.062     0.088     0.086
book             0.037     0.036     0.042
arxiv            0.013     0.040     0.025
stackexchange    0.017     0.043     0.028
In my experience running RedPajama (albeit with an older version of the code), the weight for common crawl should typically go down while the weight for c4 goes up (even if common crawl is very high in the beginning). If you do iterated DoReMi from uniform weights (for 2 rounds), I've seen that c4 becomes the dominant domain. You might find that you need more rounds, particularly if the initial reference domain weights are far from the converged values.
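As a side note, one rough way to check convergence across rounds is to compare successive weight vectors; the sketch below uses the numbers from the table above, and the 0.05 tolerance is an arbitrary illustrative threshold rather than anything prescribed by DoReMi.

# Domain weights reported for rounds 1-3 in the table above.
rounds = [
    {"common_crawl": 0.661, "c4": 0.187, "github": 0.024, "wikipedia": 0.062,
     "book": 0.037, "arxiv": 0.013, "stackexchange": 0.017},
    {"common_crawl": 0.580, "c4": 0.162, "github": 0.052, "wikipedia": 0.088,
     "book": 0.036, "arxiv": 0.040, "stackexchange": 0.043},
    {"common_crawl": 0.592, "c4": 0.186, "github": 0.040, "wikipedia": 0.086,
     "book": 0.042, "arxiv": 0.025, "stackexchange": 0.028},
]

def l1_gap(a, b):
    # Sum of absolute differences between two domain-weight vectors.
    return sum(abs(a[d] - b[d]) for d in a)

TOLERANCE = 0.05  # Arbitrary threshold for calling the weights "converged".
for i in range(1, len(rounds)):
    gap = l1_gap(rounds[i - 1], rounds[i])
    print(f"round {i} -> round {i + 1}: L1 gap = {gap:.3f}, converged = {gap < TOLERANCE}")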