Commit

Update ArmoRM blog
Haoxiang-Wang committed May 29, 2024
1 parent 4849e42 commit 28aa20c
Showing 1 changed file with 1 addition and 1 deletion.
@@ -130,7 +130,7 @@ The gating layer is trained on top of the ArmoRM obtained from stage-1. Here we

- **Gating Layer Architecture**: A ReLU MLP with 3 hidden layers of 1024 hidden units
- **Training:** Train the gating layer only, with the rest of the parameters (backbone & regression layer) frozen.
-- **Reward Adjustment (for verbosity bias mitigation):** We use the Spearman correlation coefficient as the correlation metric, $\mathrm{Corr}$, and adopt a [binarized UltraFeedback dataset](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned) of 61k examples as the reference data distribution, $\mathcal D$. The penalty coefficients, $\{\lambda_i\}$, are chosen such that $\mathbb{E}_{\mathcal D}[\mathrm{Corr}(r_i', r_{\mathrm{verbose}})] \approx 0$.
+- **Reward Adjustment (for verbosity bias mitigation):** We use the Spearman correlation coefficient as the correlation metric, $\mathrm{Corr}$, and adopt a [binarized UltraFeedback dataset](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned) of 61k examples as the reference data distribution, $\mathcal D$. The penalty coefficients, $\{\lambda_i\}$, are chosen such that $\mathbb{E}_ {\mathcal D}[\mathrm{Corr}(r_i', r_{\mathrm{verbose}})] \approx 0$.
- **Datasets**: [HelpSteer](https://huggingface.co/datasets/nvidia/HelpSteer), [UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback), [SHP](https://huggingface.co/datasets/stanfordnlp/SHP?row=0), [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf), [PKU-SafeRLHF-30K](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF-30K), [Argilla-Capybara](https://huggingface.co/datasets/argilla/Capybara-Preferences-Filtered), [Argilla-Math-Preferences](https://huggingface.co/datasets/argilla/distilabel-math-preference-dpo), [CodeUltraFeedback](https://huggingface.co/datasets/coseal/CodeUltraFeedback), [PRM-Phase-2](https://github.com/openai/prm800k), [Prometheus2-Preference-Collection](https://huggingface.co/datasets/prometheus-eval/Preference-Collection)
- For datasets that are not binarized into response pairs (e.g., HelpSteer, UltraFeedback, SHP), we take the binarized versions pre-processed in [RLHF Workflow](https://arxiv.org/abs/2405.07863).
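The verbosity-debiasing step described in the diff above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the bisection search in `fit_lambda`, the pure-Python Spearman implementation, and the synthetic data are all assumptions; only the debiasing criterion itself (pick $\lambda_i$ so that the adjusted reward $r_i' = r_i - \lambda_i\, r_{\mathrm{verbose}}$ has roughly zero Spearman correlation with $r_{\mathrm{verbose}}$ on the reference data $\mathcal D$) comes from the source.

```python
def rankdata(xs):
    """Ranks (1-based), with ties assigned the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    rx, ry = rankdata(x), rankdata(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def fit_lambda(r, r_verbose, lo=0.0, hi=10.0, iters=60):
    """Hypothetical solver: bisect for a penalty coefficient lambda such that
    Corr(r - lambda * r_verbose, r_verbose) crosses zero on the reference data.
    The correlation is (roughly) decreasing in lambda, so bisection applies."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        adjusted = [a - mid * b for a, b in zip(r, r_verbose)]
        if spearman(adjusted, r_verbose) > 0:
            lo = mid  # still verbosity-biased: penalize more
        else:
            hi = mid  # over-penalized: penalize less
    return (lo + hi) / 2.0
```

In practice one would run this per reward dimension $r_i$ over the 61k-example reference set, then freeze the resulting $\{\lambda_i\}$; the sketch only shows the one-dimensional search on synthetic scores.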

