demo on reward models #118

ifiaposto · 2024-11-29T12:48:41Z

This is more of a suggestion than an issue. I think adding an example that applies posteriors to improve the robustness of reward models (Bradley-Terry) would considerably broaden the scope of the repo, for example using Llama-3-8B-Instruct with the hh-rlhf dataset.
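As a rough illustration of what such an example would need, here is a minimal sketch of an unnormalised Bradley-Terry log posterior. Under Bradley-Terry, the probability that the chosen response beats the rejected one is `sigmoid(r_chosen - r_rejected)`. The linear reward `r(x) = w . x` is a hypothetical stand-in for an LLM reward head, and the isotropic Gaussian prior (`prior_std`) is an assumption. All names here (`neg_log_sigmoid`, `linear_reward`, `bt_log_posterior`) are illustrative and are not part of the posteriors API.

```python
import math
from typing import List, Sequence, Tuple


def neg_log_sigmoid(m: float) -> float:
    # -log(sigmoid(m)) = log(1 + exp(-m)), computed stably for either sign of m
    if m >= 0:
        return math.log1p(math.exp(-m))
    return -m + math.log1p(math.exp(m))


def linear_reward(w: Sequence[float], x: Sequence[float]) -> float:
    # Hypothetical toy reward model r(x) = w . x, standing in for an LLM reward head
    return sum(wi * xi for wi, xi in zip(w, x))


def bt_log_posterior(
    w: Sequence[float],
    pairs: List[Tuple[Sequence[float], Sequence[float]]],
    prior_std: float = 1.0,
) -> float:
    # Bradley-Terry log-likelihood summed over (chosen, rejected) feature pairs
    log_lik = -sum(
        neg_log_sigmoid(linear_reward(w, c) - linear_reward(w, r)) for c, r in pairs
    )
    # Assumed isotropic Gaussian log prior on w (up to an additive constant)
    log_prior = -0.5 * sum(wi * wi for wi in w) / prior_std**2
    return log_lik + log_prior


if __name__ == "__main__":
    pairs = [([1.0, 0.0], [0.0, 1.0])]
    print(bt_log_posterior([1.0, -1.0], pairs))
```

A function of this shape (parameters and batch in, scalar log density out) is the kind of target a Bayesian method such as a Laplace approximation or variational inference would work with. In a real demo the linear reward would be replaced by the reward model's scalar head.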
