Commit: release v1.0.1

will.yang committed May 9, 2024
1 parent b81deb2 commit d59d017
Showing 27 changed files with 1,141 additions and 77 deletions.
17 changes: 17 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,17 @@
# CHANGELOG
## v1.0.1
- Reduce memory usage during model conversion
- Reduce memory usage during inference
- Increase prefill speed
- Reduce initialization time
- Improve quantization accuracy
- Add support for Gemma, ChatGLM3, MiniCPM, InternLM2, and Phi-3
- Add server invocation
- Add an inference interruption interface
- Add logprob and token_id to the return value

## v1.0.0
- Supports the conversion and deployment of LLM models on RK3588/RK3576 platforms
- Compatible with Hugging Face model architectures
- Currently supports the models Llama, Qwen, Qwen2, and Phi-2
- Supports quantization with w8a8 and w4a16 precision
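
The w8a8/w4a16 choice above is made at model-conversion time. Below is a minimal conversion sketch using the RKLLM-Toolkit Python API, based on the SDK's published examples; the checkpoint path, output name, and exact keyword arguments are assumptions to verify against your SDK version:

```python
# Sketch of converting a Hugging Face checkpoint to .rkllm format.
# Assumes the rkllm-toolkit package from the RKLLM_SDK download; paths
# and keyword names are illustrative, not authoritative.
from rkllm.api import RKLLM

llm = RKLLM()

# Load a local Hugging Face checkpoint.
ret = llm.load_huggingface(model='./Qwen-1_8B-Chat')
assert ret == 0, 'load failed'

# Quantize with w8a8 precision (w4a16 is the other supported option)
# and target the RK3588 NPU.
ret = llm.build(do_quantization=True, quantized_dtype='w8a8',
                target_platform='rk3588')
assert ret == 0, 'build failed'

# Export the converted model for deployment on the board.
ret = llm.export_rkllm('./qwen.rkllm')
assert ret == 0, 'export failed'
```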
64 changes: 64 additions & 0 deletions LICENSE
@@ -0,0 +1,64 @@
Copyright (c) Rockchip Electronics Co., Ltd.
All rights reserved.

// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are met:
//
// 1. Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright notice,
// this list of conditions and the following disclaimer in the documentation
// and/or other materials provided with the distribution.
//
// 3. Neither the name of the copyright holder nor the names of its contributors
// may be used to endorse or promote products derived from this software without
// specific prior written permission.
//
// 4. This Software may contain some Open Source Software. You may not redistribute
// and/or modify such Open Source Software except in compliance with the applicable
// Open Source License.

// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
// POSSIBILITY OF SUCH DAMAGE.

The following Open Source Software has been modified by Rockchip Electronics Co., Ltd.
----------------------------------------------------------------------------------------
1. ggml master
Copyright (c) 2023-2024 The ggml authors
All rights reserved.
Licensed under the terms of the MIT License

2. llama.cpp master
Copyright (c) 2023-2024 The ggml authors
All rights reserved.
Licensed under the terms of the MIT License

The terms of the MIT License:
--------------------------------------------------------------------
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
36 changes: 24 additions & 12 deletions README.md
@@ -17,23 +17,35 @@
- RK3588 Series
- RK3576 Series

# Supported Models
- [X] [TinyLLAMA 1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0/tree/fe8a4ea1ffedaf415f4da2f062534de366a451e6)
- [X] [Qwen 1.8B](https://huggingface.co/Qwen/Qwen-1_8B-Chat/tree/1d0f68de57b88cfde81f3c3e537f24464d889081)
- [X] [Qwen2 0.5B](https://huggingface.co/Qwen/Qwen1.5-0.5B/tree/8f445e3628f3500ee69f24e1303c9f10f5342a39)
- [X] [Phi-2 2.7B](https://hf-mirror.com/microsoft/phi-2/tree/834565c23f9b28b96ccbeabe614dd906b6db551a)
- [X] [Phi-3 3.8B](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/tree/291e9e30e38030c23497afa30f3af1f104837aa6)
- [X] [ChatGLM3 6B](https://huggingface.co/THUDM/chatglm3-6b/tree/103caa40027ebfd8450289ca2f278eac4ff26405)
- [X] [Gemma 2B](https://huggingface.co/google/gemma-2b-it/tree/de144fb2268dee1066f515465df532c05e699d48)
- [X] [InternLM2 1.8B](https://huggingface.co/internlm/internlm2-chat-1_8b/tree/ecccbb5c87079ad84e5788baa55dd6e21a9c614d)
- [X] [MiniCPM 2B](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16/tree/79fbb1db171e6d8bf77cdb0a94076a43003abd9e)

# Download
- You can download all packages, the Docker image, examples, docs, and platform tools from [RKLLM_SDK](https://console.zbox.filez.com/l/RJJDmB) (fetch code: rkllm)

# RKNN Toolkit2
- If you want to deploy additional AI models, we have introduced a new SDK called RKNN-Toolkit2. For details, please refer to:
+ If you want to deploy additional AI models, we have introduced an SDK called RKNN-Toolkit2. For details, please refer to:

https://github.com/airockchip/rknn-toolkit2

# Notes

Due to recent updates to the Phi-2 model on Hugging Face, the current version of the RKLLM SDK does not yet support the latest revision.
Please make sure to download a supported revision of the [Phi2](https://hf-mirror.com/microsoft/phi-2/tree/834565c23f9b28b96ccbeabe614dd906b6db551a) model, such as the pinned one linked here.
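
One way to get a supported snapshot is to pin the revision hash from the link above when downloading; a short sketch with `huggingface_hub` (the local directory name is illustrative):

```python
# Download the pinned Phi-2 revision referenced above; the revision hash is
# taken from the model link, and local_dir is illustrative.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id='microsoft/phi-2',
    revision='834565c23f9b28b96ccbeabe614dd906b6db551a',
    local_dir='./phi-2',
)
```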

# CHANGELOG

- ## v1.0.0-beta
- - Supports the conversion and deployment of LLM models on RK3588/RK3576 platforms
- - Compatible with Hugging Face model architectures
- - Currently supports the models LLaMA, Qwen, Qwen2, and Phi-2
- - Supports quantization with w8a8 and w4a16 precision
+ ## v1.0.1
+ - Reduce memory usage during model conversion
+ - Reduce memory usage during inference
+ - Increase prefill speed
+ - Reduce initialization time
+ - Improve quantization accuracy
+ - Add support for Gemma, ChatGLM3, MiniCPM, InternLM2, and Phi-3
+ - Add server invocation
+ - Add an inference interruption interface
+ - Add logprob and token_id to the return value

For older versions, please refer to the [CHANGELOG](CHANGELOG.md).
Binary file modified doc/Rockchip_RKLLM_SDK_CN.pdf
Binary file added doc/Rockchip_RKLLM_SDK_EN.pdf
@@ -8,13 +8,14 @@ set(SOURCE_FILES src/main.cpp)

add_executable(${PROJECT_NAME} ${SOURCE_FILES})

- set(RKLLM_API_PATH "${CMAKE_SOURCE_DIR}/../runtime/${CMAKE_SYSTEM_NAME}/librkllm_api")
+ set(RKLLM_API_PATH "${CMAKE_SOURCE_DIR}/../../runtime/${CMAKE_SYSTEM_NAME}/librkllm_api")
include_directories(${RKLLM_API_PATH}/include)
if(CMAKE_SYSTEM_NAME STREQUAL "Android")
set(RKLLM_RT_LIB ${RKLLM_API_PATH}/${CMAKE_ANDROID_ARCH_ABI}/librkllmrt.so)
target_link_libraries(${PROJECT_NAME} ${RKLLM_RT_LIB} log)
elseif(CMAKE_SYSTEM_NAME STREQUAL "Linux")
set(RKLLM_RT_LIB ${RKLLM_API_PATH}/aarch64/librkllmrt.so)
- target_link_libraries(${PROJECT_NAME} ${RKLLM_RT_LIB})
endif()


+ target_link_libraries(${PROJECT_NAME} ${RKLLM_RT_LIB})
@@ -13,7 +13,7 @@ bash build-linux.sh
Push the compiled `llm_demo` file and `librkllmrt.so` file to the device:
```bash
adb push build/build_linux_aarch64_Release/llm_demo /userdata/llm
- adb push ../runtime/Linux/librkllm_api/aarch64/librkllmrt.so /userdata/llm/lib
+ adb push ../../runtime/Linux/librkllm_api/aarch64/librkllmrt.so /userdata/llm/lib
```

## Run
@@ -39,7 +39,7 @@ bash build-android.sh
Push the compiled `llm_demo` file and `librkllmrt.so` file to the device:
```bash
adb push build/build_android_arm64-v8a_Release/llm_demo /userdata/llm
- adb push ../runtime/Android/librkllm_api/arm64-v8a/librkllmrt.so /userdata/llm/lib
+ adb push ../../runtime/Android/librkllm_api/arm64-v8a/librkllmrt.so /userdata/llm/lib
```

## Run
@@ -4,7 +4,7 @@ if [[ -z ${BUILD_TYPE} ]];then
BUILD_TYPE=Release
fi

- ANDROID_NDK_PATH=~/android-ndk-r18b
+ ANDROID_NDK_PATH=~/android-ndk-r21e
TARGET_ARCH=arm64-v8a

TARGET_PLATFORM=android
File renamed without changes.
@@ -41,9 +41,8 @@ void exit_handler(int signal)
}
}

- void callback(const char *text, void *userdata, LLMCallState state)
+ void callback(RKLLMResult *result, void *userdata, LLMCallState state)
{
if (state == LLM_RUN_FINISH)
{
printf("\n");
@@ -52,8 +51,9 @@ void callback(const char *text, void *userdata, LLMCallState state)
{
printf("\\run error\n");
}
- else{
-     printf("%s", text);
+ else
+ {
+     printf("%s", result->text);
}
}

@@ -70,12 +70,14 @@

// Set parameters and initialize
RKLLMParam param = rkllm_createDefaultParam();
- param.modelPath = rkllm_model.c_str();
- param.target_platform = "rk3588";
+ param.model_path = rkllm_model.c_str();
param.num_npu_core = 2;
param.top_k = 1;
param.max_new_tokens = 256;
param.max_context_len = 512;
+ param.logprobs = false;
+ param.top_logprobs = 5;
+ param.use_gpu = false;
rkllm_init(&llmHandle, param, callback);
printf("rkllm init success\n");

@@ -113,7 +115,9 @@
cout << input_str << endl;
}
}
- string text = PROMPT_TEXT_PREFIX + input_str + PROMPT_TEXT_POSTFIX;
+ // string text = PROMPT_TEXT_PREFIX + input_str + PROMPT_TEXT_POSTFIX;
+ string text = input_str;

printf("robot: ");
rkllm_run(llmHandle, text.c_str(), NULL);
}
30 changes: 30 additions & 0 deletions rkllm-runtime/examples/rkllm_server_demo/README.md
@@ -0,0 +1,30 @@
# RKLLM-Server Demo
## Before Running
Before running the demo, you need to prepare the following:
- The converted RKLLM model file, already on the board.
- The IP address of the board (check it with the `ifconfig` command).

## RKLLM-Server-Flask Demo
### Build
You can run the demo with a single command:
```bash
# ./build_rkllm_server_flask.sh [target_platform: rk3588/rk3576] [RKLLM-Server working directory] [absolute path of the converted .rkllm model on the board]
./build_rkllm_server_flask.sh rk3588 /user/data/rkllm_server /user/data/rkllm_server/model.rkllm
```
### Access with API
After starting RKLLM-Server-Flask, you can use `chat_api_flask.py` to query the server and get answers from the RKLLM model; a client sketch follows below.

Attention: check the board's IP address with the `ifconfig` command and update the IP address in `chat_api_flask.py` accordingly.
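
A minimal client sketch is shown below. It assumes the Flask demo exposes a `/rkllm_chat` endpoint on port 8080 that accepts an OpenAI-style message list; the endpoint path, port, and payload fields are assumptions to check against `chat_api_flask.py`:

```python
# Hypothetical client for the RKLLM-Server-Flask demo. The endpoint path,
# port, and payload schema are assumptions; see chat_api_flask.py.
import requests

BOARD_IP = '192.168.1.100'  # replace with the board IP from `ifconfig`
url = f'http://{BOARD_IP}:8080/rkllm_chat'

payload = {
    'model': 'rkllm',
    'messages': [{'role': 'user', 'content': 'What can you do?'}],
    'stream': False,
}

resp = requests.post(url, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```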

## RKLLM-Server-Gradio Demo
### Build
You can run the demo with a single command:
```bash
# ./build_rkllm_server_gradio.sh [target_platform: rk3588/rk3576] [RKLLM-Server working directory] [absolute path of the converted .rkllm model on the board]
./build_rkllm_server_gradio.sh rk3588 /user/data/rkllm_server /user/data/rkllm_server/model.rkllm
```
### Access the Server
After running the demo, you can access RKLLM-Server-Gradio in two ways:
1. Open a browser at `http://[board_ip]:8080/` and chat with the RKLLM model in the visual interface.
2. Use `chat_api_gradio.py` (update the IP address in the code first) to get answers from the RKLLM model; see the sketch after this list.
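
A minimal sketch of programmatic access with `gradio_client` is shown below; the `api_name` and the single-string call signature are assumptions to check against `chat_api_gradio.py`:

```python
# Hypothetical client for the RKLLM-Server-Gradio demo. The api_name and
# argument signature are assumptions; see chat_api_gradio.py.
from gradio_client import Client

BOARD_IP = '192.168.1.100'  # replace with the board IP from `ifconfig`
client = Client(f'http://{BOARD_IP}:8080/')

# Send one chat turn and print the reply.
reply = client.predict('What can you do?', api_name='/chat')
print(reply)
```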

@@ -0,0 +1,61 @@
#!/bin/bash

#*****************************************************************************************#
# One-click setup script for the RKLLM-Server-Flask service.
# Running it deploys the RKLLM-Server-Flask service to a Linux board automatically.
# Usage: ./build_rkllm_server_flask.sh [target platform: rk3588/rk3576] [RKLLM-Server working directory] [absolute path of the converted rkllm model on the board]
# Example: ./build_rkllm_server_flask.sh rk3588 /user/data/rkllm_server /user/data/rkllm_server/model.rkllm
#*****************************************************************************************#

#################### Check whether pip3/flask are already installed on the board ####################
# 1. Prepare the Flask environment on the board
adb shell << EOF
# Check whether pip3 is installed
if ! command -v pip3 &> /dev/null; then
    echo "-------- pip3 is not installed; installing... --------"
    # Install pip3
    sudo apt update
    sudo apt install python3-pip -y
else
    echo "-------- pip3 is already installed --------"
fi
# Check whether flask is installed
if ! python3 -c "import flask" &> /dev/null; then
    echo "-------- flask is not installed; installing... --------"
    # Install flask
    pip3 install flask==2.2.2 Werkzeug==2.2.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
else
    echo "-------- flask is already installed --------"
fi
exit
EOF

#################### Push the files the server needs to the board ####################
# 2. Check whether the target path exists on the board
adb shell ls $2 > /dev/null 2>&1
if [ $? -ne 0 ]; then
    # Create the path if it does not exist
    adb shell mkdir -p $2
    echo "-------- rkllm_server working directory did not exist; created it --------"
else
    echo "-------- rkllm_server working directory already exists --------"
fi

# 3. Update the librkllmrt.so file in ./rkllm_server/lib
cp ../../runtime/Linux/librkllm_api/aarch64/librkllmrt.so ./rkllm_server/lib/

# 4. Push the files to the Linux board
adb push ./rkllm_server $2

#################### Enter the board and start the server ####################
# 5. Start the server on the board
adb shell << EOF
cd $2/rkllm_server/
python3 flask_server.py --target_platform $1 --rkllm_model_path $3
EOF
@@ -0,0 +1,61 @@
#!/bin/bash

#*****************************************************************************************#
# One-click setup script for the RKLLM-Server-Gradio service.
# Running it deploys the RKLLM-Server-Gradio service to a Linux board automatically.
# Usage: ./build_rkllm_server_gradio.sh [target platform: rk3588/rk3576] [RKLLM-Server working directory] [absolute path of the converted rkllm model on the board]
# Example: ./build_rkllm_server_gradio.sh rk3588 /user/data/rkllm_server /user/data/rkllm_server/model.rkllm
#*****************************************************************************************#

#################### Check whether pip3/gradio are already installed on the board ####################
# 1. Prepare the Gradio environment on the board
adb shell << EOF
# Check whether pip3 is installed
if ! command -v pip3 &> /dev/null; then
    echo "-------- pip3 is not installed; installing... --------"
    # Install pip3
    sudo apt update
    sudo apt install python3-pip -y
else
    echo "-------- pip3 is already installed --------"
fi
# Check whether gradio is installed
if ! python3 -c "import gradio" &> /dev/null; then
    echo "-------- Gradio is not installed; installing... --------"
    # Install Gradio (quoted so the shell does not treat >= as a redirection)
    pip3 install "gradio>=4.24.0" -i https://pypi.tuna.tsinghua.edu.cn/simple/
else
    echo "-------- Gradio is already installed --------"
fi
exit
EOF

#################### Push the files the server needs to the board ####################
# 2. Check whether the target path exists on the board
adb shell ls $2 > /dev/null 2>&1
if [ $? -ne 0 ]; then
    # Create the path if it does not exist
    adb shell mkdir -p $2
    echo "-------- rkllm_server working directory did not exist; created it --------"
else
    echo "-------- rkllm_server working directory already exists --------"
fi

# 3. Update the librkllmrt.so file in ./rkllm_server/lib
cp ../../runtime/Linux/librkllm_api/aarch64/librkllmrt.so ./rkllm_server/lib/

# 4. Push the files to the Linux board
adb push ./rkllm_server $2

#################### Enter the board and start the server ####################
# 5. Start the server on the board
adb shell << EOF
cd $2/rkllm_server/
python3 gradio_server.py --target_platform $1 --rkllm_model_path $3
EOF