When I try to use speculative sampling in vLLM, each sequence has more than one input token in the generation step, and the error "an illegal memory access was encountered" is reported.
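To illustrate what a multi-token generation step asks of the attention op, here is a minimal, non-paged reference sketch in plain Python. It is not vLLM code: `cached_attention_multi_query` is a hypothetical name, and the sketch assumes the keys and values of the k new tokens have already been appended to the sequence's cache. Each of the k query tokens attends to every cached position plus the new tokens up to and including itself (a causal mask). An op designed for exactly one query token per sequence has no notion of this per-query cutoff, which is one plausible reason it fails when given several tokens.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cached_attention_multi_query(
    queries: List[Vector],  # k new query vectors for ONE sequence
    keys: List[Vector],     # cached keys followed by the k new keys
    values: List[Vector],   # values aligned with keys
) -> List[Vector]:
    """Hypothetical reference: causal attention of k queries over a KV cache."""
    k = len(queries)
    total_len = len(keys)
    if k == 0 or k > total_len:
        raise ValueError("keys must already include the k new tokens")
    scale = 1.0 / math.sqrt(len(queries[0]))
    context_len = total_len - k  # tokens cached before this step
    dim = len(values[0])
    outputs = []
    for i, q in enumerate(queries):
        # Query i sits at absolute position context_len + i, so it may only
        # see positions 0 .. context_len + i (inclusive).
        visible = context_len + i + 1
        scores = [scale * sum(a * b for a, b in zip(q, keys[j])) for j in range(visible)]
        weights = softmax(scores)
        out = [sum(weights[j] * values[j][d] for j in range(visible)) for d in range(dim)]
        outputs.append(out)
    return outputs
```

With k = 1 this reduces to ordinary single-query decoding over the whole cache; with k > 1 each query row sees a different prefix length.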
Can you suggest a way to support speculative sampling in vLLM?
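For reference, the verification step of speculative sampling (the accept/reject rule described by Leviathan et al. and Chen et al.) can be sketched in plain Python as below. This is not vLLM code: `speculative_verify` and its arguments are hypothetical, and the sketch assumes the draft and target distributions are already available as probability lists over the vocabulary. It shows why the target model must score k + 1 positions per sequence in a single pass: one target distribution is needed per draft token, plus one more for the bonus token when every draft token is accepted.

```python
import random
from typing import List, Sequence

Dist = Sequence[float]  # probabilities over vocabulary indices


def sample(dist: Dist, rng: random.Random) -> int:
    # random.choices normalizes the weights, so dist need not sum to 1.
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


def speculative_verify(
    draft_tokens: Sequence[int],   # k tokens proposed by the draft model
    draft_probs: Sequence[Dist],   # k distributions q_i the drafts came from
    target_probs: Sequence[Dist],  # k + 1 distributions p_i from the target model
    rng: random.Random,
) -> List[int]:
    """Hypothetical sketch: return the accepted tokens plus one corrected/bonus token."""
    if len(draft_probs) != len(draft_tokens) or len(target_probs) != len(draft_tokens) + 1:
        raise ValueError("need k draft distributions and k + 1 target distributions")
    accepted: List[int] = []
    for i, tok in enumerate(draft_tokens):
        p, q = target_probs[i][tok], draft_probs[i][tok]
        if q <= 0.0:
            raise ValueError("draft token must have nonzero draft probability")
        # Accept the draft token with probability min(1, p / q).
        if rng.random() < min(1.0, p / q):
            accepted.append(tok)
            continue
        # On rejection, resample from the residual max(0, p - q) and stop.
        residual = [max(0.0, pt - qt) for pt, qt in zip(target_probs[i], draft_probs[i])]
        accepted.append(sample(residual, rng))
        return accepted
    # Every draft token was accepted: sample one bonus token from p_{k+1}.
    accepted.append(sample(target_probs[-1], rng))
    return accepted
```

Each call yields between 1 and k + 1 tokens, and the accept/resample rule keeps the output distributed as if sampled from the target model alone.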
Thanks a lot
lsy643 changed the title from "Can the Op single_query_cached_kv_attention in PageAttention Support Multiple token in one sequence?" to "How to do speculative sampling with vllm?" on Sep 14, 2023.