Last version of TVM on which this was evaluated and worked (09/26/2022): 46ea2ed42ee41225141c5ed522900d340b08944d

Questions or issues using the scripts? Submit a ticket via the OctoML helpdesk.
In the table below, you can see the performance numbers (inference time in milliseconds) achieved on the Realme GT 5G (Snapdragon 888):
| | mace_mobilenetv1_nchw | mace_resnet50_v2 | mace_inceptionv3 | mxnet_vgg16 | mace_deeplabv3 | mace_yolov3 |
|---|---|---|---|---|---|---|
| TVM textures FP16 | 4.82 | 35.89 | 39.07 | 56.37 | 58.05 | 171.93 |
| TVM textures FP16a32 | 5.15 | 38.12 | 40.18 | 65.14 | 61.85 | 192.15 |
| TVM textures FP32 | 7.62 | 58.61 | 59.74 | 96.9 | 85.5 | 276.61 |
The tuning log files for the Snapdragon 888 architecture are located in logs/, and the corresponding log files for the Snapdragon 8 Gen 1 are located in logs_gen1/. You can use the evaluate.py script to reproduce these numbers: copy the name of the model from the table and use the relevant log file with the tuning statistics. Below are examples of mobilenetv1 evaluation with the SD 888 statistics:
```bash
# float16 compute, float16 accumulate
python ./evaluate.py -m mace_mobilenetv1_nchw -t float16 -k android --target="opencl --device=adreno" -l ./logs/mace_mobilenetv1_nchw.texture.float16.acc16.autotvm.log

# float16 compute, float32 accumulate
python ./evaluate.py -m mace_mobilenetv1_nchw -t float16_acc32 -k android --target="opencl --device=adreno" -l ./logs/mace_mobilenetv1_nchw.texture.float16.acc32.autotvm.log

# float32 inference
python ./evaluate.py -m mace_mobilenetv1_nchw -t float32 -k android --target="opencl --device=adreno" -l ./logs/mace_mobilenetv1_nchw.texture.float32.autotvm.log
```
Refer to the instructions below on running the evaluate.py script for more information.

In the table below, you can see the performance numbers for a set of models from the ONNX model zoo, achieved on the Qualcomm Snapdragon 8 Gen 1:
| | onnx_ssd_resnet34 | onnx_yolo_v3 | onnx_faster_rcnn | mace_mobilenetv1_nchw | mace_resnet50_v2 |
|---|---|---|---|---|---|
| TVM FP16 (GE) | 249.03 | 49.22 | 203.98 | 3.04 | 24.53 |
| TVM FP16 (VM) | 500.03 | 63.63 | 208.31 | 3.11 | 24.63 |
| TVM FP16a32 (GE) | 403.74 | 80.46 | 260.57 | 4.07 | 29.6857 |
| TVM FP16a32 (VM) | 662.15 | 94.41 | 265.45 | 4.11 | - |
| TVM FP32 (GE) | 496.67 | 80.44 | 317.15 | 4.22 | 35.46 |
| TVM FP32 (VM) | 754.33 | 95.14 | 316.05 | 4.28 | 42.43 |
The Virtual Machine shows performance degradation only on Non-Maximum Suppression layers; this appears to be a bug (issue #15405) that nevertheless affects the final performance.
The evaluate.py script was extended with three dynamic-shape models from the ONNX model zoo: onnx_ssd_resnet34, onnx_yolo_v3 and onnx_faster_rcnn; pass these names to the script to run them. These models cannot be executed by the Graph Executor (GE), so it is necessary to pass the additional --VM argument to evaluate.py, which tells the script to infer the model using the Virtual Machine (VM); see the example below.
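A minimal example combining the flags already shown in this document (the log file is the Snapdragon 8 Gen 1 log used further below):

```bash
# evaluate a dynamic-shape model with the Virtual Machine executor
python ./evaluate.py -m onnx_ssd_resnet34 -t float16 -k android --target="opencl --device=adreno" -l ./logs_gen1/onnx_ssd_resnet34.float16.acc16.autotvm.log --VM
```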
Additionally, the evaluate_dyn_models.py script can be used to tune and evaluate the static layers of the dynamic models (onnx_ssd_resnet34, onnx_yolo_v3, onnx_faster_rcnn) with either the Graph Executor or the Virtual Machine; it was added to compare VM performance against GE. Below, you can see examples of onnx_ssd_resnet34 evaluation with SD 8 Gen 1 statistics:
```bash
# float16 compute, float16 accumulate
python ./evaluate_dyn_models.py -m onnx_ssd_resnet34 -t float16 -k android --target="opencl --device=adreno" -l ./logs_gen1/onnx_ssd_resnet34.float16.acc16.autotvm.log

# float16 compute, float32 accumulate
python ./evaluate_dyn_models.py -m onnx_ssd_resnet34 -t float16_acc32 -k android --target="opencl --device=adreno" -l ./logs_gen1/onnx_ssd_resnet34.float16.acc32.autotvm.log

# float32 inference
python ./evaluate_dyn_models.py -m onnx_ssd_resnet34 -t float32 -k android --target="opencl --device=adreno" -l ./logs_gen1/onnx_ssd_resnet34.float32.autotvm.log
```
On the host machine (typically your development box) you'll need to build TVM.
```bash
git clone https://github.com/apache/tvm.git --recursive
cd tvm
mkdir build
cp cmake/config.cmake build/.
echo 'set(USE_LLVM llvm-config)' >> build/config.cmake
echo 'set(USE_GRAPH_RUNTIME_DEBUG ON)' >> build/config.cmake
cd build
cmake ..
make -j8
cd ..
export TVM_HOME=$PWD
export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
export LD_LIBRARY_PATH=${TVM_HOME}/build:$LD_LIBRARY_PATH
```
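As an optional sanity check, you can verify that the Python bindings are picked up (assuming the exports above are in your current shell):

```bash
# should print the TVM version without raising an import error
python3 -c "import tvm; print(tvm.__version__)"
```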
Refer to the TVM documentation to cross compile the C++ RPC binary (tvm_rpc) and the tvm_runtime library for Android; a rough sketch of this step follows.
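A minimal sketch of the cross-compilation, assuming the Android NDK is installed and ANDROID_NDK points to its root (the build directory name, flags, and Android platform level here are assumptions; consult the TVM docs for the authoritative steps):

```bash
# configure a separate build directory for the Android target
mkdir build-android && cd build-android
cp ../cmake/config.cmake .
echo 'set(USE_CPP_RPC ON)' >> config.cmake   # build the C++ RPC binary
echo 'set(USE_OPENCL ON)' >> config.cmake    # enable the OpenCL runtime for Adreno
cmake .. \
  -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK}/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-28
make -j8 tvm_rpc tvm_runtime                 # produces tvm_rpc and libtvm_runtime.so
cd ..
```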
To run, use adb to push the cross compiled tvm_rpc binary and the libtvm_runtime.so shared library to /data/local/tmp on the Android device.
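For example, assuming the artifacts ended up in build-android/ (the directory name from the sketch above):

```bash
adb push build-android/tvm_rpc /data/local/tmp
adb push build-android/libtvm_runtime.so /data/local/tmp
```

Then run the RPC server with: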
```bash
adb shell
cd /data/local/tmp
LD_LIBRARY_PATH=. ./tvm_rpc server --tracker=<tracker IP>:<tracker port> --key=android
```
You can also use the launch_rpc.sh script to run tvm_rpc on your device. For more information on how to use this script, run the following command:

```bash
./launch_rpc.sh --help
```

Typical usage is:

```bash
./launch_rpc.sh -d <device_hash> -k android -t <tracker_ip>:<tracker_port>
```
Once TVM is built on the host, you'll need to launch the RPC tracker service with the following command:

```bash
python -m tvm.exec.rpc_tracker --host=<tracker IP> --port=<tracker port> --port-end=9192
```

where tracker IP is the host IP and tracker port can be 9191.
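For example, with placeholder values (the host IP 192.168.1.10 is purely illustrative):

```bash
python -m tvm.exec.rpc_tracker --host=192.168.1.10 --port=9191 --port-end=9192
```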
When done, you can register the Android device on the tracker with the same key used to run the on-device RPC server:

```bash
python -m tvm.exec.rpc_server --tracker <tracker host>:<tracker port> --key android
```
Finally, make sure that the hardware is properly registered to the tracker. On the host, or any machine connected to the local network, check the devices registered on the tracker with the following command:
```bash
python -m tvm.exec.query_rpc_tracker --host <tracker IP> --port <tracker port>
```
The Python script evaluate.py can evaluate or tune a set of models. You can get usage information for the script with:

```bash
$ python3 ./evaluate.py -h
```

Example of tuning:

```bash
$ python3 ./evaluate.py -m mace_mobilenetv1_nchw -t float16 -k android --target="opencl --device=adreno" -l ./logs/mace_mobilenetv1_nchw.texture.float16.acc16.autotvm.log --tune
```

Example of evaluation with the VM executor:

```bash
$ python3 ./evaluate.py -m mace_mobilenetv1_nchw -t float16 -k android --target="opencl --device=adreno" -l ./logs/mace_mobilenetv1_nchw.texture.float16.acc16.autotvm.log --VM
```
The Python script evaluate_dyn_models.py can evaluate or tune the layers from a set of dynamic models. You can get usage information for the script with:

```bash
$ python3 ./evaluate_dyn_models.py -h
```

Example of tuning:

```bash
$ python3 ./evaluate_dyn_models.py -m ssd -t float16 -k android --target="opencl --device=adreno" -l ./logs_gen1/onnx_ssd_resnet34.float16.acc16.autotvm.log --tune
```

Example of evaluation with the VM executor:

```bash
$ python3 ./evaluate_dyn_models.py -m ssd -t float16 -k android --target="opencl --device=adreno" -l ./logs_gen1/onnx_ssd_resnet34.float16.acc16.autotvm.log --VM
```
In the directory apps/ there are two applications which can be used for profiling separate OpenCL kernels:

- OpenCLRunner is an Android GUI application which can be used to run OpenCL kernels on an Android device and collect performance metrics.
- OpenCLCPPRunner is a native application which can be used to run OpenCL kernels on an Android device or on a host and collect performance metrics.