Question regarding ARMO stage2-train code #37

Open
RayWang-iat opened this issue Oct 15, 2024 · 0 comments

Thank you very much for open-sourcing ARMO; it is excellent work. I am currently trying to reproduce stage2-train. Relative to the data you provided, I made only two changes: I replaced the preference data with Skywork/Skywork-Reward-Preference-80K-v0.2, and I replaced the reference data with Skywork/Skywork-Reward-Preference-80K-v0.2 as well. The final training results are shown below. They stay the same even when I adjust the number of training steps or the learning rate, and they fall significantly short of the model you provided. Do you know what might be causing this?

Also, training with your code gave me a .pt file. Could you please provide the code for merging it into the full model, so that the model I train has the same structure as the RLHFlow/ArmoRM-Llama3-8B-v0.1 checkpoint you released? Thank you very much!

Evaluating model...
Validation accuracy: 0.8965
Saved gating network to xxx/gating_network_FsfairX-LLaMA3-RM-v0.1_6k1.pt 

Evaluating on RewardBench...

  df_acc = pd.concat([df_acc, pd.DataFrame(row)], ignore_index=True)
RewardBench Scores:
        Chat  Chat Hard     Safety  Reasoning  Score
0  99.162012  64.692981  89.099712  88.235938   85.3
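
For reference, the Score column above appears to be the unweighted mean of the four category scores. The short check below is only a sketch under that assumption (the evaluation script may aggregate differently); the variable names are illustrative and the numbers are copied from the output above.

```python
# Category scores copied from the RewardBench output above.
scores = {
    "Chat": 99.162012,
    "Chat Hard": 64.692981,
    "Safety": 89.099712,
    "Reasoning": 88.235938,
}

# Assumption: the overall score is the unweighted mean of the four categories.
mean_score = sum(scores.values()) / len(scores)

# The category with the lowest score.
weakest = min(scores, key=scores.get)

print(f"mean of categories: {mean_score:.1f}")
print(f"weakest category: {weakest}")
```

Under this assumption the mean rounds to the reported 85.3, and Chat Hard is the lowest of the four categories.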
