Hello, I'd like to recommend an Android LLM deployment project based on ONNX Runtime. Running Yuan2.0-2B (q8f32, with a 1024-token sliding window context), it reaches 5.7 tokens/s on a Huawei P40 and 10 tokens/s on an 8Gen2 device. Once a future ONNX Runtime update adds q4f16 support, the speed may increase by another 50%.
https://github.com/DakeQQ/Native-LLM-for-Android
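For readers unfamiliar with the two figures quoted above, here is a minimal Python sketch of what they could mean in practice. It assumes "sliding window context" means only the most recent 1024 tokens are kept; the post does not say how the linked project implements this, so `make_context` and `projected_speed` are hypothetical helpers for illustration, not part of that project or of ONNX Runtime.

```python
from collections import deque

WINDOW = 1024  # context length quoted in the post


def make_context(window: int = WINDOW) -> deque:
    """Hypothetical helper: a token buffer that keeps only the newest `window` ids."""
    return deque(maxlen=window)


def projected_speed(tokens_per_s: float, gain: float = 0.5) -> float:
    """Apply an anticipated relative speed-up (50% in the post) to a measured rate."""
    return tokens_per_s * (1.0 + gain)


context = make_context()
for token_id in range(1500):  # stand-in for generated token ids
    context.append(token_id)  # once the buffer is full, the oldest id is dropped

print(len(context))            # number of tokens still in the window
print(projected_speed(5.7))    # Huawei P40 figure with the anticipated gain
print(projected_speed(10.0))   # 8Gen2 figure with the anticipated gain
```

Under these assumptions, the anticipated 50% gain would put the quoted figures at roughly 8.6 tokens/s on the Huawei P40 and 15 tokens/s on the 8Gen2. These are projections from the post's own numbers, not measurements.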