How many rounds do we need to converge domain weights on The Pile? #15

ouyangliqi · 2023-09-27T11:15:11Z

Thanks for your awesome work! I noticed that the README lists a set of optimized domain weights, configs/pile_doremi_r1_120M_ref:pile_baseline_50kvocab_nopack_120M.json. Can we consider these domain weights to be the result of the first round of DoReMi?

Comparing with the results in the paper, these optimized weights are far from the reported ones. For example, the domain weight of Pile-CC is 0.13788709 here, but 0.6057 in the paper. If 0.13788709 is the result of the first round, then the Pile-CC domain weight increased by about 0.028861896 in that round. At that rate, we can estimate that it would take approximately 21 rounds to converge to 0.6057.
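For reference, the arithmetic behind this estimate can be sketched as a linear extrapolation. This is only a rough illustration: it assumes the Pile-CC weight grows by the same amount every round, which DoReMi does not guarantee, and `rounds_needed` is a made-up helper name, not something from the repository. The baseline weight is inferred by subtracting the reported increase from the round-1 weight.

```python
def rounds_needed(start: float, target: float, per_round_gain: float) -> float:
    """Rounds needed to move from `start` to `target` at a constant gain per round.

    Hypothetical helper for illustration; assumes strictly linear growth.
    """
    if per_round_gain <= 0:
        raise ValueError("per_round_gain must be positive")
    return max(0.0, (target - start) / per_round_gain)


# Numbers quoted in this issue (Pile-CC domain weight).
round1_weight = 0.13788709   # weight in the released round-1 config
paper_weight = 0.6057        # weight reported in the paper
gain = 0.028861896           # increase observed in the first round

# Assumption: the starting (baseline) weight is the round-1 weight minus the gain.
baseline_weight = round1_weight - gain

# Dividing the target by the per-round gain, i.e. counting from a weight of zero.
from_zero = rounds_needed(0.0, paper_weight, gain)
# Counting from the inferred baseline weight (includes the round already run).
from_baseline = rounds_needed(baseline_weight, paper_weight, gain)
# Additional rounds still needed after round 1.
after_round1 = rounds_needed(round1_weight, paper_weight, gain)

print(f"from zero:      {from_zero:.2f} rounds")
print(f"from baseline:  {from_baseline:.2f} rounds")
print(f"after round 1:  {after_round1:.2f} more rounds")
```

The figure of about 21 corresponds to dividing the target weight by the per-round increase; counting from the round-1 weight instead gives roughly 16 further rounds under the same linear assumption.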

P.S. Thanks for your reply in issue #11. I would also like to ask how many rounds are needed for the domain weights to converge on RedPajama.

Thanks.

sangmichaelxie · 2023-10-16T16:06:28Z

Yes, you can consider it to be the result of the first round, for a 50k vocab size (GPT-NeoX tokenizer) and a 120M proxy model. The script for running it is in scripts/run_pile.sh. The results in the paper are for a 256k vocab size (a Google internal tokenizer) and a 280M proxy model, and the dynamics turn out to be different, but those are also the result of 1 round of DoReMi. The 50k/120M results are more similar to the 1B proxy model results in the paper.

We only tried 2 rounds starting from uniform domain weights on RedPajama. This paper (https://arxiv.org/abs/2310.06694) uses a variant of DoReMi on RedPajama as well, with a similar resulting data balance (where C4 becomes highest).
