1. Environment
2. GitHub version
3. Compile method
4. Kirin chip (HUAWEI P9, HiSilicon Kirin 955)
./test/libTNNBenchmarkTest.so: 1 file pushed, 0 skipped. 1326.2 MB/s (230616 bytes in 0.000s)
./libTNN.so: 1 file pushed, 0 skipped. 31.6 MB/s (7525920 bytes in 0.227s)
test/TNNTest: 1 file pushed, 0 skipped. 360.1 MB/s (192504 bytes in 0.001s)
/home/liyang/GitHub/TNN/benchmark/benchmark_android/../benchmark-model/: 18 files pushed, 0 skipped. 3.6 MB/s (283496 bytes in 0.074s)
/data/local/tmp/tnn-benchmark/benchmark_models_result.txt: 1 file pulled, 0 skipped. 10.3 MB/s (52083 bytes in 0.005s)
EVA-AL10
benchmark device: ARM
Summary
--------------------------------------------------------
| Op Type        | Total Kernel Time(ms) | Percent (%) |
--------------------------------------------------------
| Convolution    | 7.473                 | 86.476      |
| StridedSlice   | 0.451                 | 5.215       |
| Pooling        | 0.262                 | 3.034       |
| BatchNormCxx   | 0.194                 | 2.244       |
| ShuffleChannel | 0.134                 | 1.546       |
| Concat         | 0.128                 | 1.486       |
--------------------------------------------------------
kernel runtime total: 8.64113 ms
I/tnn: void tnn::test::Timer::Print() [File /home/liyang/GitHub/TNN/test/timer.cc][Line 60] shufflenet_v2.tnnproto - ARM TNN Benchmark time cost: min = 9.404 ms | max = 9.596 ms | avg = 9.474 ms
08-11 15:55:58.555 18785 18785 I tnn : void tnn::test::Timer::Print() [File /home/liyang/GitHub/TNN/test/timer.cc][Line 60] shufflenet_v2.tnnproto - ARM TNN Benchmark time cost: min = 9.404 ms | max = 9.596 ms | avg = 9.474 ms
benchmark device: OPENCL
I/tnn: tnn::Status tnn::OpenCLRuntime::Init() [File /home/liyang/GitHub/TNN/source/tnn/device/opencl/opencl_runtime.cc][Line 120] OpenCL version: CL_TARGET_OPENCL_VERSION 200 CL_HPP_TARGET_OPENCL_VERSION 110 CL_HPP_MINIMUM_OPENCL_VERSION 110
I/tnn: tnn::Status tnn::OpenCLRuntime::Init() [File /home/liyang/GitHub/TNN/source/tnn/device/opencl/opencl_runtime.cc][Line 155] Create common opencl context
Summary
--------------------------------------------------------
| Op Type        | Total Kernel Time(ms) | Percent (%) |
--------------------------------------------------------
| Conv_1x1       | 5.047                 | 49.808      |
| Conv_3x3       | 1.331                 | 13.131      |
| Conv_Depthwise | 1.110                 | 10.951      |
| ShuffleChannel | 0.956                 | 9.437       |
| Concat         | 0.680                 | 6.705       |
| StrideSlice    | 0.556                 | 5.491       |
| Pooling        | 0.320                 | 3.157       |
| BatchNorm      | 0.134                 | 1.318       |
--------------------------------------------------------
kernel runtime total: 10.1338 ms
I/tnn: void tnn::test::Timer::Print() [File /home/liyang/GitHub/TNN/test/timer.cc][Line 60] shufflenet_v2.tnnproto - OPENCL TNN Benchmark time cost: min = 32.275 ms | max = 47.919 ms | avg = 38.150 ms
08-11 15:56:06.504 18802 18802 I tnn : void tnn::test::Timer::Print() [File /home/liyang/GitHub/TNN/test/timer.cc][Line 60] shufflenet_v2.tnnproto - OPENCL TNN Benchmark time cost: min = 32.275 ms | max = 47.919 ms | avg = 38.150 ms
./test/libTNNBenchmarkTest.so: 1 file pushed, 0 skipped. 776.0 MB/s (230616 bytes in 0.000s)
./libTNN.so: 1 file pushed, 0 skipped. 22.5 MB/s (7525920 bytes in 0.320s)
test/TNNTest: 1 file pushed, 0 skipped. 357.9 MB/s (192504 bytes in 0.001s)
/home/liyang/GitHub/TNN/benchmark/benchmark_android/../benchmark-model/: 18 files pushed, 0 skipped. 4.4 MB/s (283496 bytes in 0.062s)
/data/local/tmp/tnn-benchmark/benchmark_models_result.txt: 1 file pulled, 0 skipped. 10.8 MB/s (123124 bytes in 0.011s)
EVA-AL10
benchmark device: ARM
Summary
------------------------------------------------------
| Op Type      | Total Kernel Time(ms) | Percent (%) |
------------------------------------------------------
| Convolution  | 171.766               | 75.569      |
| PReLU        | 9.712                 | 4.273       |
| Upsample     | 8.760                 | 3.854       |
| Concat       | 8.670                 | 3.814       |
| Add          | 8.004                 | 3.522       |
| Pooling      | 6.693                 | 2.945       |
| Pad          | 4.063                 | 1.788       |
| BatchNormCxx | 3.819                 | 1.680       |
| SoftmaxCaffe | 3.749                 | 1.649       |
| SplitV       | 2.059                 | 0.906       |
------------------------------------------------------
kernel runtime total: 227.296 ms
I/tnn: void tnn::test::Timer::Print() [File /home/liyang/GitHub/TNN/test/timer.cc][Line 60] portrait.tnnproto - ARM TNN Benchmark time cost: min = 227.797 ms | max = 232.323 ms | avg = 230.944 ms
08-11 16:07:09.257 18913 18913 I tnn : void tnn::test::Timer::Print() [File /home/liyang/GitHub/TNN/test/timer.cc][Line 60] portrait.tnnproto - ARM TNN Benchmark time cost: min = 227.797 ms | max = 232.323 ms | avg = 230.944 ms
benchmark device: OPENCL
I/tnn: tnn::Status tnn::OpenCLRuntime::Init() [File /home/liyang/GitHub/TNN/source/tnn/device/opencl/opencl_runtime.cc][Line 120] OpenCL version: CL_TARGET_OPENCL_VERSION 200 CL_HPP_TARGET_OPENCL_VERSION 110 CL_HPP_MINIMUM_OPENCL_VERSION 110
I/tnn: tnn::Status tnn::OpenCLRuntime::Init() [File /home/liyang/GitHub/TNN/source/tnn/device/opencl/opencl_runtime.cc][Line 155] Create common opencl context
Summary
--------------------------------------------------------
| Op Type        | Total Kernel Time(ms) | Percent (%) |
--------------------------------------------------------
| Conv_1x1       | 147.789               | 69.348      |
| Conv_Depthwise | 11.484                | 5.389       |
| Concat         | 9.341                 | 4.383       |
| Pad            | 8.492                 | 3.985       |
| BatchNorm      | 8.187                 | 3.842       |
| PRelu          | 7.954                 | 3.732       |
| Add            | 7.828                 | 3.673       |
| Pooling        | 3.697                 | 1.735       |
| Conv_3x3       | 3.306                 | 1.551       |
| Upsample       | 3.086                 | 1.448       |
| SplitV         | 1.397                 | 0.655       |
| SoftMax        | 0.552                 | 0.259       |
--------------------------------------------------------
kernel runtime total: 213.112 ms
I/tnn: void tnn::test::Timer::Print() [File /home/liyang/GitHub/TNN/test/timer.cc][Line 60] portrait.tnnproto - OPENCL TNN Benchmark time cost: min = 288.641 ms | max = 298.735 ms | avg = 293.257 ms
08-11 16:07:19.737 18930 18930 I tnn : void tnn::test::Timer::Print() [File /home/liyang/GitHub/TNN/test/timer.cc][Line 60] portrait.tnnproto - OPENCL TNN Benchmark time cost: min = 288.641 ms | max = 298.735 ms | avg = 293.257 ms
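For anyone comparing several runs, the `Timer::Print()` summary lines in these logs can be extracted mechanically. The following is a minimal sketch, assuming the line format stays exactly as shown in the logs above; `TIMER_RE`, `parse_timer_lines`, and `SAMPLE` are illustrative names and not part of TNN.

```python
import re

# Pattern for the summary line printed by tnn::test::Timer::Print(), e.g.
# "shufflenet_v2.tnnproto - ARM TNN Benchmark time cost: min = 9.404 ms | max = 9.596 ms | avg = 9.474 ms"
# (format taken from the logs in this issue; other TNN versions may differ).
TIMER_RE = re.compile(
    r"(?P<model>\S+)\s+-\s+(?P<device>\w+)\s+TNN Benchmark time cost:\s+"
    r"min\s*=\s*(?P<min>[\d.]+)\s*ms\s*\|\s*"
    r"max\s*=\s*(?P<max>[\d.]+)\s*ms\s*\|\s*"
    r"avg\s*=\s*(?P<avg>[\d.]+)\s*ms"
)


def parse_timer_lines(log_text):
    """Return {(model, device): (min_ms, max_ms, avg_ms)}.

    Each result appears twice in the logs (stdout and logcat), so keying on
    (model, device) collapses the repeats.
    """
    results = {}
    for m in TIMER_RE.finditer(log_text):
        key = (m.group("model"), m.group("device"))
        results[key] = (
            float(m.group("min")),
            float(m.group("max")),
            float(m.group("avg")),
        )
    return results


# Two pairs of lines copied from the Kirin 955 shufflenet_v2 log above.
SAMPLE = """
I/tnn: void tnn::test::Timer::Print() [File /home/liyang/GitHub/TNN/test/timer.cc][Line 60] shufflenet_v2.tnnproto - ARM TNN Benchmark time cost: min = 9.404 ms | max = 9.596 ms | avg = 9.474 ms
08-11 15:55:58.555 18785 18785 I tnn : void tnn::test::Timer::Print() [File /home/liyang/GitHub/TNN/test/timer.cc][Line 60] shufflenet_v2.tnnproto - ARM TNN Benchmark time cost: min = 9.404 ms | max = 9.596 ms | avg = 9.474 ms
I/tnn: void tnn::test::Timer::Print() [File /home/liyang/GitHub/TNN/test/timer.cc][Line 60] shufflenet_v2.tnnproto - OPENCL TNN Benchmark time cost: min = 32.275 ms | max = 47.919 ms | avg = 38.150 ms
08-11 15:56:06.504 18802 18802 I tnn : void tnn::test::Timer::Print() [File /home/liyang/GitHub/TNN/test/timer.cc][Line 60] shufflenet_v2.tnnproto - OPENCL TNN Benchmark time cost: min = 32.275 ms | max = 47.919 ms | avg = 38.150 ms
"""

if __name__ == "__main__":
    for (model, device), (lo, hi, avg) in parse_timer_lines(SAMPLE).items():
        print(f"{model:28s} {device:7s} min={lo} max={hi} avg={avg}")
```

Keying on `(model, device)` is a deliberate simplification: if the same model is benchmarked more than once on the same backend, later results overwrite earlier ones.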
5. Snapdragon chip (Xiaomi Mix 2S, Snapdragon 845)
./test/libTNNBenchmarkTest.so: 1 file pushed, 0 skipped. 597.1 MB/s (230616 bytes in 0.000s)
./libTNN.so: 1 file pushed, 0 skipped. 61.0 MB/s (7525920 bytes in 0.118s)
test/TNNTest: 1 file pushed, 0 skipped. 342.2 MB/s (192504 bytes in 0.001s)
/home/liyang/GitHub/TNN/benchmark/benchmark_android/../benchmark-model/: 18 files pushed, 0 skipped. 11.6 MB/s (283496 bytes in 0.023s)
E/tnn: tnn::Status tnn::OpenCLRuntime::Init() [File /home/liyang/GitHub/TNN/source/tnn/device/opencl/opencl_runtime.cc][Line 187] load program cache skipped, ret: 40966, msg: code: 0xA006 msg: open program cache file failed, input path: /data/local/tmp//d1_tnn_ocl_fd8c6f613ff9c0d503dbc462bf21353f_abc87b1bd5bec928c91c17fc45884487
/data/local/tmp/tnn-benchmark/benchmark_models_result.txt: 1 file pulled, 0 skipped. 24.3 MB/s (122071 bytes in 0.005s)
MIX 2S
benchmark device: ARM
Summary
------------------------------------------------------
| Op Type      | Total Kernel Time(ms) | Percent (%) |
------------------------------------------------------
| Convolution  | 149.520               | 82.845      |
| Upsample     | 4.961                 | 2.749       |
| Pooling      | 4.927                 | 2.730       |
| PReLU        | 4.814                 | 2.667       |
| Add          | 4.553                 | 2.523       |
| Concat       | 3.513                 | 1.947       |
| SoftmaxCaffe | 2.735                 | 1.515       |
| Pad          | 2.203                 | 1.221       |
| BatchNormCxx | 2.183                 | 1.210       |
| SplitV       | 1.072                 | 0.594       |
------------------------------------------------------
kernel runtime total: 180.481 ms
I/tnn: void tnn::test::Timer::Print() [File /home/liyang/GitHub/TNN/test/timer.cc][Line 60] portrait.tnnproto - ARM TNN Benchmark time cost: min = 180.755 ms | max = 185.242 ms | avg = 183.615 ms
08-11 16:14:20.985 19859 19859 I tnn : void tnn::test::Timer::Print() [File /home/liyang/GitHub/TNN/test/timer.cc][Line 60] portrait.tnnproto - ARM TNN Benchmark time cost: min = 180.755 ms | max = 185.242 ms | avg = 183.615 ms
benchmark device: OPENCL
I/tnn: tnn::Status tnn::OpenCLRuntime::Init() [File /home/liyang/GitHub/TNN/source/tnn/device/opencl/opencl_runtime.cc][Line 120] OpenCL version: CL_TARGET_OPENCL_VERSION 200 CL_HPP_TARGET_OPENCL_VERSION 110 CL_HPP_MINIMUM_OPENCL_VERSION 110
I/tnn: tnn::Status tnn::OpenCLRuntime::Init() [File /home/liyang/GitHub/TNN/source/tnn/device/opencl/opencl_runtime.cc][Line 155] Create common opencl context
Summary
--------------------------------------------------------
| Op Type        | Total Kernel Time(ms) | Percent (%) |
--------------------------------------------------------
| Conv_1x1       | 19.559                | 61.558      |
| Conv_Depthwise | 3.476                 | 10.940      |
| Concat         | 2.204                 | 6.936       |
| Add            | 1.651                 | 5.197       |
| PRelu          | 1.597                 | 5.026       |
| Pooling        | 0.777                 | 2.445       |
| Pad            | 0.718                 | 2.261       |
| BatchNorm      | 0.562                 | 1.769       |
| Upsample       | 0.549                 | 1.727       |
| Conv_3x3       | 0.372                 | 1.172       |
| SplitV         | 0.241                 | 0.759       |
| SoftMax        | 0.067                 | 0.211       |
--------------------------------------------------------
kernel runtime total: 31.7729 ms
I/tnn: void tnn::test::Timer::Print() [File /home/liyang/GitHub/TNN/test/timer.cc][Line 60] portrait.tnnproto - OPENCL TNN Benchmark time cost: min = 34.534 ms | max = 38.537 ms | avg = 36.155 ms
08-11 16:14:34.795 19885 19885 I tnn : void tnn::test::Timer::Print() [File /home/liyang/GitHub/TNN/test/timer.cc][Line 60] portrait.tnnproto - OPENCL TNN Benchmark time cost: min = 34.534 ms | max = 38.537 ms | avg = 36.155 ms
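To make the comparison across the three runs easier to read, the reported figures can be put side by side. The sketch below uses only numbers copied from the logs above ("kernel runtime total" and the `avg` value of each Timer line); the names `RUNS`, `non_kernel_ms`, and `gpu_speedup` are illustrative and not part of TNN. What the time outside the kernels is spent on is not shown in the logs, so the code only reports the size of the gap.

```python
# (SoC, model, backend) -> (kernel runtime total in ms, average wall time in ms),
# copied from the benchmark logs in this issue.
RUNS = {
    ("Kirin 955", "shufflenet_v2", "ARM"): (8.64113, 9.474),
    ("Kirin 955", "shufflenet_v2", "OPENCL"): (10.1338, 38.150),
    ("Kirin 955", "portrait", "ARM"): (227.296, 230.944),
    ("Kirin 955", "portrait", "OPENCL"): (213.112, 293.257),
    ("Snapdragon 845", "portrait", "ARM"): (180.481, 183.615),
    ("Snapdragon 845", "portrait", "OPENCL"): (31.7729, 36.155),
}


def non_kernel_ms(runs, soc, model, backend):
    """Average wall time minus summed kernel time for one run."""
    kernel_ms, avg_ms = runs[(soc, model, backend)]
    return avg_ms - kernel_ms


def gpu_speedup(runs, soc, model):
    """CPU average time divided by OpenCL average time (>1 means the GPU is faster)."""
    cpu_avg = runs[(soc, model, "ARM")][1]
    gpu_avg = runs[(soc, model, "OPENCL")][1]
    return cpu_avg / gpu_avg


if __name__ == "__main__":
    for soc, model in [
        ("Kirin 955", "shufflenet_v2"),
        ("Kirin 955", "portrait"),
        ("Snapdragon 845", "portrait"),
    ]:
        print(
            f"{soc:15s} {model:14s} "
            f"GPU speedup = {gpu_speedup(RUNS, soc, model):.2f}x, "
            f"non-kernel time: ARM {non_kernel_ms(RUNS, soc, model, 'ARM'):.2f} ms, "
            f"OPENCL {non_kernel_ms(RUNS, soc, model, 'OPENCL'):.2f} ms"
        )
```

On these figures, the OpenCL kernel totals on the Kirin 955 are close to the CPU totals, but the measured OpenCL wall time exceeds the kernel total by roughly 28 ms (shufflenet_v2) and 80 ms (portrait). On the Snapdragon 845 the same gap is about 4.4 ms.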
6. How can inference performance for small networks be optimized on Kirin chips?
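@MHGL The speed issue you reported depends on the device tested; it is not a general GPU problem on Kirin processors. The Kirin 955 device you chose has a CPU with four Cortex-A72 cores plus four Cortex-A53 cores and a Mali-T880 MP4 GPU, and the Mali-T880 MP4 has little performance advantage over the A72. To take full advantage of GPU speed, you could deploy your project on a Kirin 970/980 (GPUs with the Mali-G architecture).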
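@lnmdlong Thank you very much for your reply! In the Kirin 970 data in this performance test file, I found that small networks such as ShuffleNet and SqueezeNet also show the CPU outperforming the GPU. So my question is: how can the performance of small-network TNN models on Kirin chips be specifically optimized? Is there a concrete workflow for deploying TNN on Huawei devices? Thanks.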
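@MHGL TNN has already made some optimizations for the GPU performance of small networks on Kirin chips. For some models the GPU is still slower than the CPU, which is related to model structure and hardware characteristics. There is no further optimization plan for now; we will share updates promptly once there is one. For the deployment workflow, you can refer to the TNN demo: https://github.com/Tencent/TNN/blob/master/doc/en/user/demo_en.md#ii-introduction-to-android-demo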