[Continuous Batching] Speculative decoding based on paged attention #2005
causal_lm_cpp.yml
on: pull_request
Matrix: cpp-beam_search_causal_lm-ubuntu
cpp-multinomial-greedy_causal_lm-ubuntu     9m 51s
cpp-greedy_causal_lm-windows                0s
cpp-beam_search_causal_lm-Qwen-7B-Chat      10m 37s
cpp-beam_search_causal_lm-Qwen1_5-7B-Chat   9m 41s
cpp-beam_search_causal_lm-Phi-2             8m 3s
cpp-beam_search_causal_lm-notus-7b-v1       7m 34s
cpp-speculative_decoding_lm-ubuntu          9m 28s
cpp-prompt_lookup_decoding_lm-ubuntu        11m 32s
cpp-Phi-1_5                                 6m 21s
cpp-greedy_causal_lm-redpajama-3b-chat      9m 11s
cpp-chat_sample-ubuntu                      11m 9s
cpp-continuous-batching-ubuntu              7m 14s
cpp-continuous-batching-windows             0s
cpp-continuous-batching-macos               2m 16s
ci/gha_overall_status_causal_lm             0s
Annotations
9 errors and 14 warnings