
[Feature]: Support for Seq classification/Reward models #8700

Closed · ariaattar opened this issue Sep 21, 2024 · 5 comments · Fixed by #9704

ariaattar commented Sep 21, 2024

🚀 The feature, motivation and pitch

Verifier/reward models are going to be very important moving forward for building:

  • High quality synthetic data pipelines
  • Verifying model reasoning
  • Multi agent systems

Could we add support for sequence classification models like Skywork/Skywork-Reward-Llama-3.1-8B?
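For context, a rough sketch of how a *ForSequenceClassification reward model like this is typically scored with Hugging Face transformers today (the conversation and dtype settings below are just illustrative; the model card is the authority on the exact usage). The request is to serve this same flow through vLLM:

```python
# Hedged sketch: scoring a conversation with a sequence-classification reward
# model via Hugging Face transformers. The feature request is to make vLLM
# serve this kind of model directly.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "Skywork/Skywork-Reward-Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Illustrative conversation; real usage follows the model card's template.
conversation = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 = 4."},
]
input_ids = tokenizer.apply_chat_template(
    conversation, tokenize=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    # Single-logit classification head; the scalar logit is the reward score.
    score = model(input_ids).logits[0][0].item()
print(score)
```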

Alternatives

No response

Additional context

No response

@youkaichao (Member)

Contributions are welcome!

vLLM already supports embedding models, and I think they are quite similar to reward models. I don't know what the obstacle would be to using the vLLM code to run reward models; we can pretend they are embedding models.
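For reference, a rough sketch of the existing embedding path being suggested here (the model name is just a known-supported embedding model; whether a reward score could be surfaced through this same pooling path is exactly what the issue asks):

```python
# Sketch of vLLM's existing embedding/pooling flow, which this comment
# suggests a reward model could piggyback on: the pooled hidden state would
# feed the score head instead of being returned as an embedding.
from vllm import LLM

# A known-supported embedding model stands in here; a reward checkpoint would
# be swapped in once its *ForSequenceClassification architecture is registered.
llm = LLM(model="intfloat/e5-mistral-7b-instruct", enforce_eager=True)

outputs = llm.encode(["The quick brown fox jumps over the lazy dog."])
for output in outputs:
    print(len(output.outputs.embedding))  # one pooled vector per prompt
```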

@ariaattar (Author)

Seems like a lot of the reward models use a different architecture than embedding models.

ValueError: Model architectures ['Gemma2ForSequenceClassification'] are not supported for now.
ValueError: Model architectures ['LlamaForSequenceClassification'] are not supported for now. 

 Supported architectures: ['AquilaModel', 'AquilaForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'CohereForCausalLM', 'DbrxForCausalLM', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'DeepseekV2ForCausalLM', 'ExaoneForCausalLM', 'FalconForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'JAISLMHeadModel', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'MiniCPMForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'OPTForCausalLM', 'OrionForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'PhiMoEForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'Qwen2VLForConditionalGeneration', 'RWForCausalLM', 'StableLMEpochForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'ArcticForCausalLM', 'XverseForCausalLM', 'Phi3SmallForCausalLM', 'MedusaModel', 'EAGLEModel', 'MLPSpeculatorPreTrainedModel', 'JambaForCausalLM', 'GraniteForCausalLM', 'MistralModel', 'Blip2ForConditionalGeneration', 'ChameleonForConditionalGeneration', 'FuyuForCausalLM', 'InternVLChatModel', 'LlavaForConditionalGeneration', 'LlavaNextForConditionalGeneration', 'LlavaNextVideoForConditionalGeneration', 'MiniCPMV', 'PaliGemmaForConditionalGeneration', 'Phi3VForCausalLM', 'PixtralForConditionalGeneration', 'QWenLMHeadModel', 'UltravoxModel', 'BartModel', 'BartForConditionalGeneration']

Here are two examples:
Skywork/Skywork-Reward-Gemma-2-27B
Ray2333/GRM-Llama3-8B-rewardmodel-ft
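
A minimal repro sketch using one of the model names above (at the time of this issue, loading a *ForSequenceClassification checkpoint directly fails as shown):

```python
# Attempting to load a sequence-classification reward model with vLLM as of
# this issue; the ValueError quoted above is the expected outcome.
from vllm import LLM

try:
    llm = LLM(model="Ray2333/GRM-Llama3-8B-rewardmodel-ft")
except ValueError as err:
    # Model architectures ['LlamaForSequenceClassification'] are not supported for now.
    print(err)
```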


ariaattar commented Sep 23, 2024

@youkaichao I added #8740 and tried to convert it to the vLLM format, but I'm running into some tensor shape issues in compute_logits. Let me know if the conversion generally looks right.

@youkaichao (Member)

Looks like #8896 already implements it.

@natolambert

Hey - I've been working with reward models substantially in the open ecosystem, building RewardBench; in reality, most of the open models have subtly different architectures.

The easiest case is *ForSequenceClassification, but it gets much messier from there. Happy to take questions -- I'm going to look at the reward model RFC now too. Curious to follow these implementations.
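
To make the architecture point concrete, a minimal sketch (not any particular model's code) of what a *ForSequenceClassification reward head usually amounts to; the subtle differences across open reward models tend to be in the head name, the pooling position, and the number of labels:

```python
# Generic sketch of a sequence-classification reward head: a linear layer from
# hidden_size to num_labels (often 1), applied at the last non-padding token
# of the base LM's hidden states.
import torch
import torch.nn as nn

class SequenceClassificationHead(nn.Module):
    def __init__(self, hidden_size: int, num_labels: int = 1):
        super().__init__()
        self.score = nn.Linear(hidden_size, num_labels, bias=False)

    def forward(self, hidden_states: torch.Tensor, last_token_idx: torch.Tensor):
        # hidden_states: [batch, seq_len, hidden_size] from the base model
        # last_token_idx: [batch] index of the final non-padding token
        pooled = hidden_states[torch.arange(hidden_states.size(0)), last_token_idx]
        return self.score(pooled)  # [batch, num_labels] -> reward score(s)
```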
