[Feature Request] Android Deployment Solution #139

Open
DakeQQ opened this issue Apr 10, 2024 · 2 comments

Comments

DakeQQ commented Apr 10, 2024

https://github.com/DakeQQ/Native-LLM-for-Android

Hello, I'd like to recommend an Android LLM deployment project based on ONNX Runtime. It achieves 5.7 tokens/s on a Huawei P40 and 10 tokens/s on a Snapdragon 8 Gen 2 (running Yuan2.0-2B with q8f32 quantization and a 1024-token sliding-window context). Once a future ONNX Runtime update adds q4f16 support, speed may improve by roughly another 50%.
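A "1024 sliding window context" generally means only the most recent 1024 tokens are kept as model input, with the oldest tokens dropped as new ones arrive. The sketch below illustrates that idea on a plain list of token ids. It is an assumption about the general technique, not code from Native-LLM-for-Android, and the class name `SlidingWindowContext` is hypothetical. A real implementation with a KV cache would also need to evict the matching cache entries, which this sketch does not model.

```python
from collections import deque


class SlidingWindowContext:
    """Hypothetical helper: keeps only the most recent `window_size` token ids.

    The default of 1024 matches the context size quoted above; dropping the
    oldest tokens first is an assumed eviction policy, not taken from the project.
    """

    def __init__(self, window_size: int = 1024) -> None:
        # A deque with maxlen discards items from the left once it is full.
        self._tokens = deque(maxlen=window_size)

    def extend(self, token_ids: list) -> None:
        """Append newly prompted or generated token ids."""
        self._tokens.extend(token_ids)

    def tokens(self) -> list:
        """Return the current window, oldest token first."""
        return list(self._tokens)


# Usage with a tiny window so the eviction is easy to see.
ctx = SlidingWindowContext(window_size=4)
ctx.extend([1, 2, 3])
ctx.extend([4, 5, 6])
print(ctx.tokens())  # [3, 4, 5, 6]
```

Using `deque(maxlen=...)` keeps each append O(1) and makes the "forget the oldest tokens" behavior automatic, at the cost of the model losing anything that scrolls out of the window.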

DakeQQ commented Apr 10, 2024

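Hello, does Yuan2.0-2B have a chat template?
For example: 'role: user, content: {user_query} \n role: assistant, content: {assistant_response}'
When deploying on Android, I don't know which prompt template to use so that the model remembers the conversation context, so for now it only supports single-turn question and answer.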
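Multi-turn chat with a plain text model is usually done by flattening the earlier turns into a single prompt on every request. The sketch below does that using the example format from the question. That format is hypothetical here and is NOT a documented Yuan2.0-2B chat template; the helper name `build_prompt` is also illustrative only. The correct template should come from the model's maintainers.

```python
# Hypothetical turn format, based on the example in the question above.
# This is NOT a documented Yuan2.0-2B chat template.
TURN_TEMPLATE = "role: {role}, content: {content}"


def build_prompt(history: list, user_query: str) -> str:
    """Flatten prior (user, assistant) turns plus the new query into one prompt."""
    lines = []
    for user_msg, assistant_msg in history:
        lines.append(TURN_TEMPLATE.format(role="user", content=user_msg))
        lines.append(TURN_TEMPLATE.format(role="assistant", content=assistant_msg))
    lines.append(TURN_TEMPLATE.format(role="user", content=user_query))
    # Leave the assistant turn open so the model continues with its reply.
    lines.append(TURN_TEMPLATE.format(role="assistant", content=""))
    return "\n".join(lines)


# Usage: one earlier exchange plus a follow-up question.
prompt = build_prompt([("Hi", "Hello! How can I help?")], "What is ONNX Runtime?")
print(prompt)
```

Because the whole history is re-sent each turn, long conversations will eventually exceed the context window, so in practice this is combined with trimming old turns.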

@Shawn-IEITSystems
Collaborator

@lilianlhl
