How do you get the model to be good at code if it downsamples code? #13

teknium1 · 2023-09-14T21:15:43Z

Question is in the title.

sangmichaelxie · 2023-09-25T22:08:32Z

The algorithm tends to downsample code because code has lower log-perplexity overall (many tokens are predictable from syntax, etc.), so its excess losses tend to be smaller as well. GitHub data also varies a lot in quality. Code ability can be learned from many sources, including the web and Stack Exchange. However, if you have a prior that some code domain is highly important, you can set a higher reference weight for code. This should decrease the reference model's perplexity on code examples, which would increase the excess loss on code.
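To make the mechanism concrete, here is a minimal sketch, not the repository's actual code. It assumes excess loss is the proxy model's loss minus the reference model's loss (clipped at zero) and that domain weights are updated multiplicatively in proportion to it. All function names, domain names, and loss values are hypothetical and chosen only for illustration.

```python
import math


def boost_reference_weight(ref_weights, domain, factor):
    """Multiply one domain's reference weight by `factor`, then renormalize."""
    boosted = dict(ref_weights)
    boosted[domain] *= factor
    total = sum(boosted.values())
    return {name: w / total for name, w in boosted.items()}


def excess_loss(proxy_loss, reference_loss):
    """Excess loss for one domain: proxy minus reference, clipped at zero."""
    return max(proxy_loss - reference_loss, 0.0)


def update_domain_weights(weights, excess, step_size=1.0):
    """Multiplicative update: domains with larger excess loss gain weight."""
    unnorm = {d: weights[d] * math.exp(step_size * excess[d]) for d in weights}
    total = sum(unnorm.values())
    return {d: v / total for d, v in unnorm.items()}


# Hypothetical reference weights; tripling the code weight before renormalizing.
ref_weights = {"web": 0.6, "stackexchange": 0.2, "code": 0.2}
boosted = boost_reference_weight(ref_weights, "code", factor=3.0)

# Made-up per-domain losses. The only assumed effect of the boost is that the
# reference model trained on `boosted` reaches a lower loss on code.
proxy_losses = {"web": 3.2, "stackexchange": 2.6, "code": 1.5}
ref_losses_before = {"web": 3.0, "stackexchange": 2.4, "code": 1.45}
ref_losses_after = {"web": 3.0, "stackexchange": 2.4, "code": 1.2}

uniform = {d: 1.0 / len(ref_weights) for d in ref_weights}
excess_before = {d: excess_loss(proxy_losses[d], ref_losses_before[d]) for d in uniform}
excess_after = {d: excess_loss(proxy_losses[d], ref_losses_after[d]) for d in uniform}

w_before = update_domain_weights(uniform, excess_before)
w_after = update_domain_weights(uniform, excess_after)
print("code weight with original reference:", round(w_before["code"], 3))
print("code weight with boosted reference: ", round(w_after["code"], 3))
```

With these illustrative numbers, the lower reference loss on code gives code a larger excess loss, so one update step assigns it more weight than it would get with the original reference model.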
