
[perf] improve next token latency when (#threads >= 2 * #heads) by sharding the head into multiple splits #111


Triggered via pull request November 22, 2023 07:03
Status Success
Total duration 20m 36s

xft_PR.yml

on: pull_request
build_and_simple_test
20m 25s