Thank you very much for open-sourcing such excellent work as ArmoRM. I am currently reproducing the stage2-train code. Based on the data you provided, I made only two modifications: first, I replaced the preference data with Skywork/Skywork-Reward-Preference-80K-v0.2, and second, I also replaced the reference data with Skywork/Skywork-Reward-Preference-80K-v0.2. The final training results are shown below; they stay the same even if I adjust the training steps or learning rate, and there is a significant performance gap compared to the model you provided. Do you know what might be causing this?
Also, by training with your code I obtained a .pt file. Could you please provide the merging code, so that the model I train ends up with the same structure as the RLHFlow/ArmoRM-Llama3-8B-v0.1 you released? Thank you very much!
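In case it helps clarify what I am after, this is the naive merge I tried. It is only a minimal sketch: the file name `gating_network.pt` and the attribute name `model.gating` are my own guesses rather than anything from your code, so the key names and attribute may need to be adjusted after inspecting the actual checkpoint and model class.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

base = "RLHFlow/ArmoRM-Llama3-8B-v0.1"  # released model with the target structure

# Load the released model so we start from the exact architecture we want to match.
model = AutoModelForSequenceClassification.from_pretrained(
    base, trust_remote_code=True, torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(base)

# Load the stage-2 checkpoint produced by the training script.
# ASSUMPTION: the .pt file holds only the gating-network state dict.
gating_state = torch.load("gating_network.pt", map_location="cpu")

# ASSUMPTION: the custom model class exposes the gating network as `model.gating`;
# print(model) first to confirm the real attribute name and key prefixes.
model.gating.load_state_dict(gating_state)

# Save everything as a single Hugging Face checkpoint with the same layout
# as RLHFlow/ArmoRM-Llama3-8B-v0.1.
model.save_pretrained("ArmoRM-merged")
tokenizer.save_pretrained("ArmoRM-merged")
```

Is this roughly the intended procedure, or does the merge require more than replacing the gating weights?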