[Feature Request] Android Deployment #139

DakeQQ · 2024-04-10T05:02:02Z

https://github.com/DakeQQ/Native-LLM-for-Android

Hello, I would like to recommend an Android LLM deployment project based on ONNX Runtime. It reaches 5.7 tokens/s on a Huawei P40 and 10 tokens/s on a Snapdragon 8 Gen 2 (Yuan2.0-2B, q8f32, with a 1024 sliding-window context). Once ONNX Runtime adds q4f16 support, the speed may improve by a further 50%.

DakeQQ · 2024-04-10T05:16:03Z

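Hello, does Yuan2.0-2B have a chat template?
For example: 'role: user, content: {user_query} \n role: assistant, content: {assistant_response}'
For the Android deployment, I don't know which prompt template to feed the model so that it remembers the context, so for now it only supports single-turn question and answer.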
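
To make the question concrete, here is a minimal Python sketch of how a multi-turn prompt could be assembled from a template like the one quoted above and then trimmed to a sliding window of 1024 (assuming that figure counts tokens). The template strings are only the illustrative ones from this comment, not a confirmed Yuan2.0-2B chat format. `build_prompt`, `trim_to_window`, and the whitespace-based token split are hypothetical stand-ins for whatever the real tokenizer and app code would do.

```python
from typing import List, Tuple

# Illustrative template from the comment above; NOT a confirmed Yuan2.0-2B format.
USER_TMPL = "role: user, content: {user_query}"
ASSISTANT_TMPL = "role: assistant, content: {assistant_response}"


def build_prompt(history: List[Tuple[str, str]], new_query: str) -> str:
    """Join earlier (query, response) turns and the new query into one prompt."""
    lines = []
    for user_query, assistant_response in history:
        lines.append(USER_TMPL.format(user_query=user_query))
        lines.append(ASSISTANT_TMPL.format(assistant_response=assistant_response))
    lines.append(USER_TMPL.format(user_query=new_query))
    # Leave the assistant slot empty so the model continues from here.
    lines.append(ASSISTANT_TMPL.format(assistant_response=""))
    return "\n".join(lines)


def trim_to_window(tokens: List[str], window: int = 1024) -> List[str]:
    """Keep only the most recent `window` tokens (sliding-window context)."""
    return tokens[-window:] if len(tokens) > window else tokens


history = [("Hi", "Hello! How can I help?")]
prompt = build_prompt(history, "What is ONNX Runtime?")
# Whitespace split is a placeholder for the model's real tokenizer (assumption).
tokens = trim_to_window(prompt.split(), window=1024)
```

With a scheme like this, earlier turns stay in the prompt until the window fills, after which the oldest tokens are dropped first.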

Shawn-IEITSystems · 2024-04-23T10:06:10Z

@lilianlhl
