In this repo, we reimplement ResNet18 with CUDA and optimize it to speed up inference. Experiments show that our implementation is faster than the PyTorch (cuDNN) version at batch size 1.
We use im2col and Winograd F(4×4, 3×3) to speed up convolution.
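As a sketch of the im2col idea (pure Python, illustrative only; the repo's actual kernel is `conv_im2col.cu`): each K×K input patch is unrolled into a row, so the convolution reduces to a matrix multiplication.

```python
def im2col(x, k):
    """Unroll every k x k patch of a 2-D input (list of lists) into a row."""
    h, w = len(x), len(x[0])
    out_h, out_w = h - k + 1, w - k + 1
    cols = []
    for i in range(out_h):
        for j in range(out_w):
            patch = [x[i + di][j + dj] for di in range(k) for dj in range(k)]
            cols.append(patch)
    return cols  # shape: (out_h * out_w, k * k)

def conv2d_via_im2col(x, kernel):
    """Convolve by flattening the kernel and dotting it with each row."""
    k = len(kernel)
    flat_k = [kernel[i][j] for i in range(k) for j in range(k)]
    return [sum(a * b for a, b in zip(row, flat_k)) for row in im2col(x, k)]

# 4x4 input, 3x3 all-ones kernel -> each output is the sum of a 3x3 patch
x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
ones = [[1] * 3 for _ in range(3)]
print(conv2d_via_im2col(x, ones))  # → [54, 63, 90, 99]
```

In the CUDA version the resulting matrix product is what `matmul.cu` computes; the unrolling trades extra memory for a single dense GEMM.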
ONNX model download link: https://pan.baidu.com/s/1eVvb2OedbnYR7m6PG-U_Cw (extraction code: ksm8)
1. Generate `weight.json` with `get_onnx_weight.py`:
```bash
cd /home/group20/cuda_onnx_python/
conda activate onnx_env
python get_onnx_weight.py
```
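We have not pinned down the exact schema of `weight.json`; as an illustration (hypothetical layout, the actual file may differ), a common choice is one entry per ONNX initializer, storing the shape alongside the flattened values, which the standard `json` module handles directly:

```python
import json

# Hypothetical layout for weight.json: one entry per ONNX initializer,
# keeping the tensor shape next to the flattened data.
weights = {
    "conv1.weight": {"shape": [2, 1, 3, 3], "data": [0.1] * 18},
    "fc.bias": {"shape": [10], "data": [0.0] * 10},
}

text = json.dumps(weights)
restored = json.loads(text)
assert restored["conv1.weight"]["shape"] == [2, 1, 3, 3]
print(len(restored["fc.bias"]["data"]))  # → 10
```

Keeping the shape explicit lets the C++ side (JsonCpp) rebuild each tensor without consulting the ONNX file again.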
2. Load the parameters into the model with JsonCpp; the `/json` and `/json_lib` directories contain the files JsonCpp needs.
We build ResNet18 with PyTorch and use the cuDNN backend as the baseline:
```bash
cd /home/group20/git/resnet_python/
conda activate onnx_env
python try_resnet_format.py
```
- `kernels.cu`: MaxPooling, AvgPooling, ReLU, Add kernels
- `matmul.cu`: GEMM
- `conv_winograd_4x4_3x3.cu`, `conv_winograd_gpu.cu`: Winograd convolution
- `conv_im2col.cu`: im2col convolution
- `resnet_extern.cu`: the ResNet18 network
- `resnet18_main.cc`: main entry point
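The repo uses the 2-D F(4×4, 3×3) Winograd variant; as a sketch of the underlying idea, the 1-D F(2, 3) transform below (pure Python, illustrative only) computes two outputs of a 3-tap filter with 4 multiplies instead of 6:

```python
def winograd_f23(d, g):
    """F(2,3): two outputs of a 3-tap correlation using 4 multiplies."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct(d, g):
    """Reference: naive sliding-window correlation."""
    return [sum(d[i + j] * g[j] for j in range(3)) for i in range(2)]

d, g = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0]
print(winograd_f23(d, g), direct(d, g))  # both [4.5, 6.0]
```

The filter transforms (`m2`/`m3` factors) can be precomputed once per weight tensor, which is why the saving in multiplies pays off for small 3×3 kernels; the 2-D F(4×4, 3×3) case nests the same trick in both dimensions.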
```bash
cd /home/group20/resnet_cuda/tmp/final_version/
make
./hello
```
- `resnet18Input.txt`: input data
- `resnet18Output.txt`: output data
- `resnet18.onnx`: the ONNX model
- `weight.json`: weights extracted by `get_onnx_weight.py`
Method | Time (ms)
---|---
Baseline (PyTorch, cuDNN) | 2.67
Our model (CUDA) | 2.26