The algorithm tends to downsample code: code generally has lower log perplexity overall (many tokens are predictable from syntax, etc.), so its excess losses tend to be smaller, and GitHub data also varies widely in quality. Code ability can be learned from many sources, including the web and StackExchange. However, if you have a prior that some code domain is highly important, you can set the reference weight for that domain higher. This should decrease the reference model's perplexity on code examples, which in turn increases the excess loss on code.
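To make the mechanism concrete, here is a minimal sketch of a DoReMi-style domain-weight update driven by excess loss (proxy loss minus reference loss). The function name, learning rate, and smoothing constant are illustrative assumptions, not the repo's actual implementation; the point is only that domains with larger positive excess loss get upweighted, so a lower reference loss on code (from a higher reference weight) raises code's excess loss and thus its final weight.

```python
import numpy as np

def update_domain_weights(weights, excess_losses, lr=1.0, smoothing=1e-3):
    """Illustrative exponentiated-gradient step on domain weights.

    Domains with larger positive excess loss (proxy loss minus reference
    loss) receive proportionally more weight after renormalization.
    """
    clipped = np.maximum(excess_losses, 0.0)   # only positive excess counts
    new = weights * np.exp(lr * clipped)       # multiplicative update
    new = new / new.sum()                      # renormalize to a distribution
    k = len(weights)
    # Mix with the uniform distribution for stability (DoReMi-style smoothing)
    return (1 - smoothing) * new + smoothing / k

# Hypothetical three-domain example: web, StackExchange, code.
weights = np.array([0.5, 0.3, 0.2])
excess = np.array([0.10, 0.05, 0.0])          # code shows the lowest excess loss
print(update_domain_weights(weights, excess))
```

With these numbers, the code domain's weight drops below its initial 0.2 because its excess loss is smallest; raising the reference weight on code would shrink the reference loss there, push code's excess loss up, and reverse that effect.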