Commit: release v1.0.1

will.yang committed May 9, 2024
1 parent b81deb2 commit d59d017
Showing 27 changed files with 1,141 additions and 77 deletions.
17 changes: 17 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,17 @@
# CHANGELOG
## v1.0.1
- Reduce memory usage during model conversion
- Reduce memory usage during inference
- Increase prefill speed
- Reduce initialization time
- Improve quantization accuracy
- Add support for Gemma, ChatGLM3, MiniCPM, InternLM2, and Phi-3
- Add server invocation
- Add an inference interruption interface
- Add logprob and token_id to the return value

## v1.0.0
- Supports the conversion and deployment of LLM models on RK3588/RK3576 platforms
- Compatible with Hugging Face model architectures
- Currently supports the models Llama, Qwen, Qwen2, and Phi-2
- Supports quantization with w8a8 and w4a16 precision
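
The w8a8/w4a16 choice above is made at model-conversion time. Below is a minimal conversion sketch using the RKLLM-Toolkit Python API, based on the SDK's published examples; the checkpoint path, output name, and exact keyword arguments are assumptions to verify against your SDK version:

```python
# Sketch of converting a Hugging Face checkpoint to .rkllm format.
# Assumes the rkllm-toolkit package from the RKLLM_SDK download; paths
# and keyword names are illustrative, not authoritative.
from rkllm.api import RKLLM

llm = RKLLM()

# Load a local Hugging Face checkpoint.
ret = llm.load_huggingface(model='./Qwen-1_8B-Chat')
assert ret == 0, 'load failed'

# Quantize with w8a8 precision (w4a16 is the other supported option)
# and target the RK3588 NPU.
ret = llm.build(do_quantization=True, quantized_dtype='w8a8',
                target_platform='rk3588')
assert ret == 0, 'build failed'

# Export the converted model for deployment on the board.
ret = llm.export_rkllm('./qwen.rkllm')
assert ret == 0, 'export failed'
```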
64 changes: 64 additions & 0 deletions LICENSE
@@ -0,0 +1,64 @@
Copyright (c) Rockchip Electronics Co., Ltd.
All rights reserved.

// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are met:
//
// 1. Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright notice,
// this list of conditions and the following disclaimer in the documentation
// and/or other materials provided with the distribution.
//
// 3. Neither the name of the copyright holder nor the names of its contributors
// may be used to endorse or promote products derived from this software without
// specific prior written permission.
//
// 4. This Software may contain some Open Source Software. You may not redistribute
// and/or modify such Open Source Software except in compliance with the applicable
// Open Source License.

// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
// POSSIBILITY OF SUCH DAMAGE.

The following Open Source Software has been modified by Rockchip Electronics Co., Ltd.
----------------------------------------------------------------------------------------
1. ggml master
Copyright (c) 2023-2024 The ggml authors
All rights reserved.
Licensed under the terms of the MIT License

2. llama.cpp master
Copyright (c) 2023-2024 The ggml authors
All rights reserved.
Licensed under the terms of the MIT License

The terms of the MIT License:
--------------------------------------------------------------------
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
36 changes: 24 additions & 12 deletions README.md
@@ -17,23 +17,35 @@
- RK3588 Series
- RK3576 Series

# Supported Models
- [X] [TinyLLAMA 1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0/tree/fe8a4ea1ffedaf415f4da2f062534de366a451e6)
- [X] [Qwen 1.8B](https://huggingface.co/Qwen/Qwen-1_8B-Chat/tree/1d0f68de57b88cfde81f3c3e537f24464d889081)
- [X] [Qwen2 0.5B](https://huggingface.co/Qwen/Qwen1.5-0.5B/tree/8f445e3628f3500ee69f24e1303c9f10f5342a39)
- [X] [Phi-2 2.7B](https://hf-mirror.com/microsoft/phi-2/tree/834565c23f9b28b96ccbeabe614dd906b6db551a)
- [X] [Phi-3 3.8B](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/tree/291e9e30e38030c23497afa30f3af1f104837aa6)
- [X] [ChatGLM3 6B](https://huggingface.co/THUDM/chatglm3-6b/tree/103caa40027ebfd8450289ca2f278eac4ff26405)
- [X] [Gemma 2B](https://huggingface.co/google/gemma-2b-it/tree/de144fb2268dee1066f515465df532c05e699d48)
- [X] [InternLM2 1.8B](https://huggingface.co/internlm/internlm2-chat-1_8b/tree/ecccbb5c87079ad84e5788baa55dd6e21a9c614d)
- [X] [MiniCPM 2B](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16/tree/79fbb1db171e6d8bf77cdb0a94076a43003abd9e)

# Download
- You can download all packages, the Docker image, examples, docs, and platform tools from [RKLLM_SDK](https://console.zbox.filez.com/l/RJJDmB) (fetch code: rkllm)

# RKNN Toolkit2
- If you want to deploy additional AI models, we have introduced a new SDK called RKNN-Toolkit2. For details, please refer to:
+ If you want to deploy additional AI models, we have introduced an SDK called RKNN-Toolkit2. For details, please refer to:

https://github.com/airockchip/rknn-toolkit2

# Notes

Due to recent updates to the Phi-2 model on Hugging Face, the current version of the RKLLM SDK does not yet support the latest revision.
Please make sure to download a supported revision of the [Phi2](https://hf-mirror.com/microsoft/phi-2/tree/834565c23f9b28b96ccbeabe614dd906b6db551a) model, such as the pinned one linked here.
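
One way to get a supported snapshot is to pin the revision hash from the link above when downloading; a short sketch with `huggingface_hub` (the local directory name is illustrative):

```python
# Download the pinned Phi-2 revision referenced above; the revision hash is
# taken from the model link, and local_dir is illustrative.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id='microsoft/phi-2',
    revision='834565c23f9b28b96ccbeabe614dd906b6db551a',
    local_dir='./phi-2',
)
```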

# CHANGELOG

- ## v1.0.0-beta
- - Supports the conversion and deployment of LLM models on RK3588/RK3576 platforms
- - Compatible with Hugging Face model architectures
- - Currently supports the models LLaMA, Qwen, Qwen2, and Phi-2
- - Supports quantization with w8a8 and w4a16 precision
+ ## v1.0.1
+ - Reduce memory usage during model conversion
+ - Reduce memory usage during inference
+ - Increase prefill speed
+ - Reduce initialization time
+ - Improve quantization accuracy
+ - Add support for Gemma, ChatGLM3, MiniCPM, InternLM2, and Phi-3
+ - Add server invocation
+ - Add an inference interruption interface
+ - Add logprob and token_id to the return value

For older versions, please refer to the [CHANGELOG](CHANGELOG.md).
Binary file modified doc/Rockchip_RKLLM_SDK_CN.pdf
Binary file added doc/Rockchip_RKLLM_SDK_EN.pdf
@@ -8,13 +8,14 @@ set(SOURCE_FILES src/main.cpp)

add_executable(${PROJECT_NAME} ${SOURCE_FILES})

- set(RKLLM_API_PATH "${CMAKE_SOURCE_DIR}/../runtime/${CMAKE_SYSTEM_NAME}/librkllm_api")
+ set(RKLLM_API_PATH "${CMAKE_SOURCE_DIR}/../../runtime/${CMAKE_SYSTEM_NAME}/librkllm_api")
include_directories(${RKLLM_API_PATH}/include)
if(CMAKE_SYSTEM_NAME STREQUAL "Android")
set(RKLLM_RT_LIB ${RKLLM_API_PATH}/${CMAKE_ANDROID_ARCH_ABI}/librkllmrt.so)
target_link_libraries(${PROJECT_NAME} ${RKLLM_RT_LIB} log)
elseif(CMAKE_SYSTEM_NAME STREQUAL "Linux")
set(RKLLM_RT_LIB ${RKLLM_API_PATH}/aarch64/librkllmrt.so)
- target_link_libraries(${PROJECT_NAME} ${RKLLM_RT_LIB})
endif()


+ target_link_libraries(${PROJECT_NAME} ${RKLLM_RT_LIB})
@@ -13,7 +13,7 @@ bash build-linux.sh
Push the compiled `llm_demo` file and `librkllmrt.so` file to the device:
```bash
adb push build/build_linux_aarch64_Release/llm_demo /userdata/llm
- adb push ../runtime/Linux/librkllm_api/aarch64/librkllmrt.so /userdata/llm/lib
+ adb push ../../runtime/Linux/librkllm_api/aarch64/librkllmrt.so /userdata/llm/lib
```

## Run
@@ -39,7 +39,7 @@ bash build-android.sh
Push the compiled `llm_demo` file and `librkllmrt.so` file to the device:
```bash
adb push build/build_android_arm64-v8a_Release/llm_demo /userdata/llm
- adb push ../runtime/Android/librkllm_api/arm64-v8a/librkllmrt.so /userdata/llm/lib
+ adb push ../../runtime/Android/librkllm_api/arm64-v8a/librkllmrt.so /userdata/llm/lib
```

## Run
@@ -4,7 +4,7 @@ if [[ -z ${BUILD_TYPE} ]];then
BUILD_TYPE=Release
fi

- ANDROID_NDK_PATH=~/android-ndk-r18b
+ ANDROID_NDK_PATH=~/android-ndk-r21e
TARGET_ARCH=arm64-v8a

TARGET_PLATFORM=android
File renamed without changes.
@@ -41,9 +41,8 @@ void exit_handler(int signal)
}
}

- void callback(const char *text, void *userdata, LLMCallState state)
+ void callback(RKLLMResult *result, void *userdata, LLMCallState state)
{
if (state == LLM_RUN_FINISH)
{
printf("\n");
@@ -52,8 +51,9 @@ void callback(const char *text, void *userdata, LLMCallState state)
{
printf("\\run error\n");
}
- else{
-     printf("%s", text);
+ else
+ {
+     printf("%s", result->text);
}
}

@@ -70,12 +70,14 @@

// Set parameters and initialize
RKLLMParam param = rkllm_createDefaultParam();
- param.modelPath = rkllm_model.c_str();
- param.target_platform = "rk3588";
+ param.model_path = rkllm_model.c_str();
param.num_npu_core = 2;
param.top_k = 1;
param.max_new_tokens = 256;
param.max_context_len = 512;
+ param.logprobs = false;
+ param.top_logprobs = 5;
+ param.use_gpu = false;
rkllm_init(&llmHandle, param, callback);
printf("rkllm init success\n");

@@ -113,7 +115,9 @@
cout << input_str << endl;
}
}
- string text = PROMPT_TEXT_PREFIX + input_str + PROMPT_TEXT_POSTFIX;
+ // string text = PROMPT_TEXT_PREFIX + input_str + PROMPT_TEXT_POSTFIX;
+ string text = input_str;

printf("robot: ");
rkllm_run(llmHandle, text.c_str(), NULL);
}
30 changes: 30 additions & 0 deletions rkllm-runtime/examples/rkllm_server_demo/README.md
@@ -0,0 +1,30 @@
# RKLLM-Server Demo
## Before Running
Before running the demo, you need to prepare the following:
- The converted RKLLM model file, already on the board.
- The IP address of the board (check it with the `ifconfig` command).

## RKLLM-Server-Flask Demo
### Build
You can run the demo with a single command:
```bash
# ./build_rkllm_server_flask.sh [target_platform: rk3588/rk3576] [RKLLM-Server working directory] [absolute path of the converted .rkllm model on the board]
./build_rkllm_server_flask.sh rk3588 /user/data/rkllm_server /user/data/rkllm_server/model.rkllm
```
### Access with API
After starting RKLLM-Server-Flask, you can use `chat_api_flask.py` to query the server and get answers from the RKLLM model; a client sketch follows below.

Attention: check the board's IP address with the `ifconfig` command and update the IP address in `chat_api_flask.py` accordingly.
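
A minimal client sketch is shown below. It assumes the Flask demo exposes a `/rkllm_chat` endpoint on port 8080 that accepts an OpenAI-style message list; the endpoint path, port, and payload fields are assumptions to check against `chat_api_flask.py`:

```python
# Hypothetical client for the RKLLM-Server-Flask demo. The endpoint path,
# port, and payload schema are assumptions; see chat_api_flask.py.
import requests

BOARD_IP = '192.168.1.100'  # replace with the board IP from `ifconfig`
url = f'http://{BOARD_IP}:8080/rkllm_chat'

payload = {
    'model': 'rkllm',
    'messages': [{'role': 'user', 'content': 'What can you do?'}],
    'stream': False,
}

resp = requests.post(url, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```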

## RKLLM-Server-Gradio Demo
### Build
You can run the demo with a single command:
```bash
# ./build_rkllm_server_gradio.sh [target_platform: rk3588/rk3576] [RKLLM-Server working directory] [absolute path of the converted .rkllm model on the board]
./build_rkllm_server_gradio.sh rk3588 /user/data/rkllm_server /user/data/rkllm_server/model.rkllm
```
### Access the Server
After running the demo, you can access RKLLM-Server-Gradio in two ways:
1. Open a browser at `http://[board_ip]:8080/` and chat with the RKLLM model in the visual interface.
2. Use `chat_api_gradio.py` (update the IP address in the code first) to get answers from the RKLLM model; see the sketch after this list.
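
A minimal sketch of programmatic access with `gradio_client` is shown below; the `api_name` and the single-string call signature are assumptions to check against `chat_api_gradio.py`:

```python
# Hypothetical client for the RKLLM-Server-Gradio demo. The api_name and
# argument signature are assumptions; see chat_api_gradio.py.
from gradio_client import Client

BOARD_IP = '192.168.1.100'  # replace with the board IP from `ifconfig`
client = Client(f'http://{BOARD_IP}:8080/')

# Send one chat turn and print the reply.
reply = client.predict('What can you do?', api_name='/chat')
print(reply)
```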

@@ -0,0 +1,61 @@
#!/bin/bash

#*****************************************************************************************#
# One-click setup script for the RKLLM-Server-Flask service.
# Running it deploys the RKLLM-Server-Flask service to a Linux board automatically.
# Usage: ./build_rkllm_server_flask.sh [target platform: rk3588/rk3576] [RKLLM-Server working directory] [absolute path of the converted rkllm model on the board]
# Example: ./build_rkllm_server_flask.sh rk3588 /user/data/rkllm_server /user/data/rkllm_server/model.rkllm
#*****************************************************************************************#

#################### Check whether pip3/flask are already installed on the board ####################
# 1. Prepare the Flask environment on the board
adb shell << EOF
# Check whether pip3 is installed
if ! command -v pip3 &> /dev/null; then
    echo "-------- pip3 is not installed; installing... --------"
    # Install pip3
    sudo apt update
    sudo apt install python3-pip -y
else
    echo "-------- pip3 is already installed --------"
fi
# Check whether flask is installed
if ! python3 -c "import flask" &> /dev/null; then
    echo "-------- flask is not installed; installing... --------"
    # Install flask
    pip3 install flask==2.2.2 Werkzeug==2.2.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
else
    echo "-------- flask is already installed --------"
fi
exit
EOF

#################### Push the files the server needs to the board ####################
# 2. Check whether the target path exists on the board
adb shell ls $2 > /dev/null 2>&1
if [ $? -ne 0 ]; then
    # Create the path if it does not exist
    adb shell mkdir -p $2
    echo "-------- rkllm_server working directory did not exist; created it --------"
else
    echo "-------- rkllm_server working directory already exists --------"
fi

# 3. Update the librkllmrt.so file in ./rkllm_server/lib
cp ../../runtime/Linux/librkllm_api/aarch64/librkllmrt.so ./rkllm_server/lib/

# 4. Push the files to the Linux board
adb push ./rkllm_server $2

#################### Enter the board and start the server ####################
# 5. Start the server on the board
adb shell << EOF
cd $2/rkllm_server/
python3 flask_server.py --target_platform $1 --rkllm_model_path $3
EOF
@@ -0,0 +1,61 @@
#!/bin/bash

#*****************************************************************************************#
# One-click setup script for the RKLLM-Server-Gradio service.
# Running it deploys the RKLLM-Server-Gradio service to a Linux board automatically.
# Usage: ./build_rkllm_server_gradio.sh [target platform: rk3588/rk3576] [RKLLM-Server working directory] [absolute path of the converted rkllm model on the board]
# Example: ./build_rkllm_server_gradio.sh rk3588 /user/data/rkllm_server /user/data/rkllm_server/model.rkllm
#*****************************************************************************************#

#################### Check whether pip3/gradio are already installed on the board ####################
# 1. Prepare the Gradio environment on the board
adb shell << EOF
# Check whether pip3 is installed
if ! command -v pip3 &> /dev/null; then
    echo "-------- pip3 is not installed; installing... --------"
    # Install pip3
    sudo apt update
    sudo apt install python3-pip -y
else
    echo "-------- pip3 is already installed --------"
fi
# Check whether gradio is installed
if ! python3 -c "import gradio" &> /dev/null; then
    echo "-------- Gradio is not installed; installing... --------"
    # Install Gradio (quoted so the shell does not treat >= as a redirection)
    pip3 install "gradio>=4.24.0" -i https://pypi.tuna.tsinghua.edu.cn/simple/
else
    echo "-------- Gradio is already installed --------"
fi
exit
EOF

#################### Push the files the server needs to the board ####################
# 2. Check whether the target path exists on the board
adb shell ls $2 > /dev/null 2>&1
if [ $? -ne 0 ]; then
    # Create the path if it does not exist
    adb shell mkdir -p $2
    echo "-------- rkllm_server working directory did not exist; created it --------"
else
    echo "-------- rkllm_server working directory already exists --------"
fi

# 3. Update the librkllmrt.so file in ./rkllm_server/lib
cp ../../runtime/Linux/librkllm_api/aarch64/librkllmrt.so ./rkllm_server/lib/

# 4. Push the files to the Linux board
adb push ./rkllm_server $2

#################### Enter the board and start the server ####################
# 5. Start the server on the board
adb shell << EOF
cd $2/rkllm_server/
python3 gradio_server.py --target_platform $1 --rkllm_model_path $3
EOF