Add speculative decoding params to lm_bench #1221
@sbalandi could you please add a test for the speculative decoding case to GHA?
@sbalandi it looks like the models selected for the test are too large. Could we use something less compute-intensive, e.g. TinyLlama in fp16 with an int4/int8 variant as the draft? You can also use the pre-converted models from https://huggingface.co/collections/OpenVINO/llm-6687aaa2abca3bbcec71a9bd, replacing the optimum-cli command with a huggingface-cli download command.
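A rough sketch of what the suggested download step could look like, written as a small Python helper that only builds the `huggingface-cli download` command lines (it does not run them). The two repository IDs and the `models` directory are assumptions for illustration and should be checked against the collection linked above; which precision serves as the main model and which as the draft is also just one possible reading of the suggestion.

```python
import shlex

# Assumed repo IDs (hypothetical picks): verify the exact names in the
# OpenVINO collection before using them in a workflow.
MAIN_REPO = "OpenVINO/TinyLlama-1.1B-Chat-v1.0-fp16-ov"
DRAFT_REPO = "OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov"


def download_command(repo_id: str, models_dir: str = "models") -> list[str]:
    """Build a `huggingface-cli download` command for a pre-converted model.

    The model is placed in <models_dir>/<repo name>, so no optimum-cli
    conversion step is needed afterwards.
    """
    local_dir = f"{models_dir}/{repo_id.split('/')[-1]}"
    return ["huggingface-cli", "download", repo_id, "--local-dir", local_dir]


if __name__ == "__main__":
    # Print the commands so they can be pasted into a GHA `run:` step.
    for repo in (MAIN_REPO, DRAFT_REPO):
        print(shlex.join(download_command(repo)))
```

Printing the commands rather than executing them keeps the helper free of network access; in the workflow itself the two printed lines would simply replace the conversion commands.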
Task: [CVS-155520](https://jira.devtools.intel.com/browse/CVS-155520)
Co-authored-by: Ekaterina Aidova <[email protected]>
Task: CVS-155520