
[Feature]: Support for Seq classification/Reward models #8700

Closed · ariaattar opened this issue Sep 21, 2024 · 5 comments · Fixed by #9704

ariaattar commented Sep 21, 2024

🚀 The feature, motivation and pitch

Verifier/reward models are going to be very important moving forward for building:

  • High quality synthetic data pipelines
  • Verifying model reasoning
  • Multi agent systems

Could we add support for sequence classification models like Skywork/Skywork-Reward-Llama-3.1-8B?
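For context, a rough sketch of how a *ForSequenceClassification reward model like this is typically scored with Hugging Face transformers today (the conversation and dtype settings below are just illustrative; the model card is the authority on the exact usage). The request is to serve this same flow through vLLM:

```python
# Hedged sketch: scoring a conversation with a sequence-classification reward
# model via Hugging Face transformers. The feature request is to make vLLM
# serve this kind of model directly.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "Skywork/Skywork-Reward-Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Illustrative conversation; real usage follows the model card's template.
conversation = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 = 4."},
]
input_ids = tokenizer.apply_chat_template(
    conversation, tokenize=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    # Single-logit classification head; the scalar logit is the reward score.
    score = model(input_ids).logits[0][0].item()
print(score)
```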

Alternatives

No response

Additional context

No response

@youkaichao (Member)

Contributions are welcome!

vLLM already supports embedding models, and I think they are quite similar to reward models. I don't know what the obstacle would be to using the vLLM code to run reward models; we can pretend they are embedding models.
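For reference, a rough sketch of the existing embedding path being suggested here (the model name is just a known-supported embedding model; whether a reward score could be surfaced through this same pooling path is exactly what the issue asks):

```python
# Sketch of vLLM's existing embedding/pooling flow, which this comment
# suggests a reward model could piggyback on: the pooled hidden state would
# feed the score head instead of being returned as an embedding.
from vllm import LLM

# A known-supported embedding model stands in here; a reward checkpoint would
# be swapped in once its *ForSequenceClassification architecture is registered.
llm = LLM(model="intfloat/e5-mistral-7b-instruct", enforce_eager=True)

outputs = llm.encode(["The quick brown fox jumps over the lazy dog."])
for output in outputs:
    print(len(output.outputs.embedding))  # one pooled vector per prompt
```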

@ariaattar (Author)

Seems like a lot of the reward models use a different architecture than embedding models.

ValueError: Model architectures ['Gemma2ForSequenceClassification'] are not supported for now.
ValueError: Model architectures ['LlamaForSequenceClassification'] are not supported for now. 

 Supported architectures: ['AquilaModel', 'AquilaForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'CohereForCausalLM', 'DbrxForCausalLM', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'DeepseekV2ForCausalLM', 'ExaoneForCausalLM', 'FalconForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'JAISLMHeadModel', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'MiniCPMForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'OPTForCausalLM', 'OrionForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'PhiMoEForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'Qwen2VLForConditionalGeneration', 'RWForCausalLM', 'StableLMEpochForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'ArcticForCausalLM', 'XverseForCausalLM', 'Phi3SmallForCausalLM', 'MedusaModel', 'EAGLEModel', 'MLPSpeculatorPreTrainedModel', 'JambaForCausalLM', 'GraniteForCausalLM', 'MistralModel', 'Blip2ForConditionalGeneration', 'ChameleonForConditionalGeneration', 'FuyuForCausalLM', 'InternVLChatModel', 'LlavaForConditionalGeneration', 'LlavaNextForConditionalGeneration', 'LlavaNextVideoForConditionalGeneration', 'MiniCPMV', 'PaliGemmaForConditionalGeneration', 'Phi3VForCausalLM', 'PixtralForConditionalGeneration', 'QWenLMHeadModel', 'UltravoxModel', 'BartModel', 'BartForConditionalGeneration']

Here are two examples:
Skywork/Skywork-Reward-Gemma-2-27B
Ray2333/GRM-Llama3-8B-rewardmodel-ft
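
A minimal repro sketch using one of the model names above (at the time of this issue, loading a *ForSequenceClassification checkpoint directly fails as shown):

```python
# Attempting to load a sequence-classification reward model with vLLM as of
# this issue; the ValueError quoted above is the expected outcome.
from vllm import LLM

try:
    llm = LLM(model="Ray2333/GRM-Llama3-8B-rewardmodel-ft")
except ValueError as err:
    # Model architectures ['LlamaForSequenceClassification'] are not supported for now.
    print(err)
```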


ariaattar commented Sep 23, 2024

@youkaichao I added #8740 and tried to convert it to the vLLM format, but I'm running into some tensor shape issues in compute_logits. Let me know if the conversion generally looks right.

@youkaichao (Member)

Looks like #8896 already implements it.

@natolambert

Hey - I've been working with reward models substantially in the open ecosystem, building RewardBench; in reality, most of the open models have subtly different architectures.

The easiest case is *ForSequenceClassification, but it gets much messier from there. Happy to take questions -- I'm going to look at the reward model RFC now too. Curious to follow these implementations.
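
To make the architecture point concrete, a minimal sketch (not any particular model's code) of what a *ForSequenceClassification reward head usually amounts to; the subtle differences across open reward models tend to be in the head name, the pooling position, and the number of labels:

```python
# Generic sketch of a sequence-classification reward head: a linear layer from
# hidden_size to num_labels (often 1), applied at the last non-padding token
# of the base LM's hidden states.
import torch
import torch.nn as nn

class SequenceClassificationHead(nn.Module):
    def __init__(self, hidden_size: int, num_labels: int = 1):
        super().__init__()
        self.score = nn.Linear(hidden_size, num_labels, bias=False)

    def forward(self, hidden_states: torch.Tensor, last_token_idx: torch.Tensor):
        # hidden_states: [batch, seq_len, hidden_size] from the base model
        # last_token_idx: [batch] index of the final non-padding token
        pooled = hidden_states[torch.arange(hidden_states.size(0)), last_token_idx]
        return self.score(pooled)  # [batch, num_labels] -> reward score(s)
```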
