I took a first pass. Admittedly, there's a lot here I'm not familiar with, but I would really like this feature, so I'll invest some time in it and see if I can make some progress. If anyone else is interested, I'm happy to collaborate.
🚀 The feature, motivation and pitch
The speculative decoding framework allows the target model to have LoRAs; however, the work to set up batch expansion has not yet been done. We can implement batch expansion for LoRA and thereby allow speculative decoding with LoRA.
The work required is basically to implement batch expansion while passing the LoRA arguments through. See "Let’s talk about code" in the following notes: https://docs.google.com/document/d/1z4Tgb1FcDr3YXvFPelyn-T-DEnLqqrlrxRi3TvIyAmg/edit
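To make "pass through the LoRA arguments" concrete, here is a rough sketch of what batch expansion with LoRA could look like. It is illustrative only and does not use vLLM's actual classes: `LoRARef`, `ScoringRequest`, and `expand_batch` are hypothetical names, and the sketch assumes batch expansion means creating one scoring sequence per proposal prefix so the target model can score all speculative tokens in a single forward pass.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LoRARef:
    """Hypothetical stand-in for a LoRA adapter handle (not vLLM's class)."""
    name: str


@dataclass
class ScoringRequest:
    """Hypothetical per-sequence input to the target model."""
    seq_id: int
    token_ids: List[int]
    lora: Optional[LoRARef] = None


def expand_batch(
    requests: List[ScoringRequest],
    proposals: List[List[int]],
) -> List[ScoringRequest]:
    """Expand each request into len(proposal) + 1 scoring sequences.

    Sequence i holds the original tokens plus the first i proposal tokens.
    The point of this sketch: every expanded sequence carries the same
    LoRA reference as the request it was derived from.
    """
    expanded: List[ScoringRequest] = []
    for req, proposal in zip(requests, proposals):
        for i in range(len(proposal) + 1):
            expanded.append(
                ScoringRequest(
                    seq_id=req.seq_id,
                    token_ids=req.token_ids + proposal[:i],
                    lora=req.lora,  # pass the LoRA argument through unchanged
                )
            )
    return expanded


if __name__ == "__main__":
    batch = [
        ScoringRequest(seq_id=0, token_ids=[1, 2], lora=LoRARef("adapter-a")),
        ScoringRequest(seq_id=1, token_ids=[3]),  # no LoRA on this request
    ]
    for seq in expand_batch(batch, proposals=[[10, 11], [20]]):
        print(seq)
```

In this sketch a batch may mix requests with different adapters, or none, and each expanded sequence simply inherits its parent's adapter. The real change would need to do the equivalent inside vLLM's own sequence metadata.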
I expect this to work well for larger models (e.g. 70B) but to be more difficult with smaller models due to latency constraints and vLLM overheads. Perhaps with a speculator like ngram / eagle / mlpspeculator it can work for 7B models as well.
Note that this work does not include applying LoRA to the speculator; that can be future work.
Alternatives
No response
Additional context
No response