Hi, congratulations on the great work, and thanks for open-sourcing it!
I am running step 3.2 with pair-preference-model-LLaMA3-8B. However, I encountered the warning "Some weights of LlamaForSequenceClassification were not initialized from the model checkpoint at RLHFlow/pair-preference-model-LLaMA3-8B and are newly initialized: ['score.weight']". Could you please help me with the issue? Thanks a lot!
The current code is for the Bradley-Terry reward model, which is loaded as an `AutoModelForSequenceClassification`.
In contrast, the pair-preference model is an `AutoModelForCausalLM`, and the two models are used differently. I will write a separate script for the pair-RM in the next few days.
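To make the difference concrete, here is a minimal sketch of how each model type could turn its outputs into a preference probability. It is not the repository's script: the function names are hypothetical, and the assumption that the pair-preference model answers with one of two tokens (say `A` or `B`) whose logits are compared is an illustration only. A Bradley-Terry model assigns each response a scalar reward and compares two rewards through a sigmoid, while a causal-LM pair model has no scalar head at all, which would be consistent with `score.weight` being reported as newly initialized.

```python
import math


def bt_preference(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry: P(a beats b) = sigmoid(r_a - r_b)."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def pair_preference(logit_a: float, logit_b: float) -> float:
    """Two-way softmax over the logits of two answer tokens.

    Assumption: the pair model expresses its choice through the
    next-token logits of two label tokens (hypothetically 'A' and 'B').
    """
    m = max(logit_a, logit_b)  # subtract the max for numerical stability
    exp_a = math.exp(logit_a - m)
    exp_b = math.exp(logit_b - m)
    return exp_a / (exp_a + exp_b)


# Scalar rewards for two responses (made-up numbers).
print(bt_preference(1.3, 0.4))
# Next-token logits for the two label tokens (made-up numbers).
print(pair_preference(2.1, -0.5))
```

The two formulas are the same function of a difference, but the inputs differ: the first needs one forward pass per response, the second needs one forward pass per pair.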
@WeiXiongUST Hello, is there any recent progress on this? I'm curious whether the pair-RM needs $C_k^2$ inferences for k candidates. How can we get an absolute reward score for each candidate?
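For reference, one possible way to turn pairwise comparisons into a per-candidate score is an average win rate over all $C_k^2 = k(k-1)/2$ unordered pairs. The sketch below is an assumption, not something confirmed by the maintainers: `prefer` is a hypothetical callable standing in for one pair-RM inference that returns the probability that its first argument beats its second, and `win_rate_scores` is a made-up helper name. The resulting score is only relative to the other candidates in the same batch, not an absolute reward.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def win_rate_scores(
    candidates: Sequence[str],
    prefer: Callable[[str, str], float],
) -> List[float]:
    """Average win probability of each candidate against all others.

    Calls `prefer` once per unordered pair, i.e. k * (k - 1) / 2 times.
    """
    k = len(candidates)
    if k < 2:
        # Nothing to compare against; return a neutral score.
        return [0.5] * k
    totals = [0.0] * k
    for i, j in combinations(range(k), 2):
        p = prefer(candidates[i], candidates[j])  # P(candidate i beats j)
        totals[i] += p
        totals[j] += 1.0 - p
    return [t / (k - 1) for t in totals]


# Toy stand-in for the pair-RM: the longer response always wins.
def toy_prefer(a: str, b: str) -> float:
    return 1.0 if len(a) > len(b) else 0.0


print(win_rate_scores(["a", "bbb", "cc"], toy_prefer))
```

If the pair model is sensitive to the order in which the two responses are presented, each pair could be queried in both orders and averaged, which doubles the number of inferences to k(k-1).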