[Figure: framework]


Introduction

A latency predictor based on a profiler of Conv operations, supporting NVIDIA GPUs.

Thanks to the sequential structure of the neural network, we can approximate the latency of the model by summing up the latency of each layer.

First, build an operation latency library containing the measured latency of sampled Conv operations across a large set of configurations.

Then use predictor.py to predict the latency of each layer in the target network structure and sum the results. The latency of an operation that is not in the library is obtained by SciPy interpolation between other operations in the library.
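The two steps above can be sketched as follows. This is a minimal illustration, not the code in predictor.py: the `library` contents, the function names, and the choice to interpolate along Out_C only are assumptions, and a plain linear interpolation stands in for the SciPy interpolation so the snippet runs with the standard library alone.

```python
from bisect import bisect_left

# Hypothetical slice of the library: latency (ms) keyed by Out_C, with every
# other configuration field held fixed. The numbers are made up.
library = {16: 0.10, 32: 0.18, 64: 0.34}


def lookup_latency(out_c, table):
    """Return the stored latency, or interpolate linearly between neighbors."""
    if out_c in table:
        return table[out_c]
    xs = sorted(table)
    if not xs[0] < out_c < xs[-1]:
        raise ValueError(f"Out_C={out_c} is outside the sampled range")
    hi = bisect_left(xs, out_c)          # index of the first sample above out_c
    x0, x1 = xs[hi - 1], xs[hi]
    t = (out_c - x0) / (x1 - x0)
    return table[x0] + t * (table[x1] - table[x0])


def predict_network_latency(layer_out_channels, table):
    """Approximate the model latency as the sum of per-layer latencies."""
    return sum(lookup_latency(c, table) for c in layer_out_channels)


# Three layers; the middle one (Out_C=48) is not in the library.
total = predict_network_latency([16, 48, 64], library)
print(f"{total:.3f} ms")
```

Here the latency of the unsampled layer is taken halfway between its two sampled neighbors, and the network latency is the sum over the three layers.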


Major features
  • Sampling process with a Python example

    python: the code for generating the operation latency library on a target device, with usage instructions.

  • NVIDIA GPU TensorRT (TRT) latency

    V100: the sampled operation latency library for the NVIDIA V100 GPU, measured with TRT at FP32/FP16/INT8 precision.

    For other GPUs such as the P100 or T4, use the sampling code to generate a library in the same way as for the V100.


Format for each element in the library

{Conv_type, Batch, In_C, In_H, In_W, Out_C, Kernel, Stride, ElmtFused} Latency
  • Conv_type: 'Regular' or 'Depthwise' convolution
  • Batch: batch size, typical = 1, 32, 64, 128
  • In_C: input_channels; Out_C: output_channels; In_C=Out_C*Ratio
  • In_H: input_height; In_W: input_width
  • Kernel: kernel size, typical = 1, 3, 5, 7
  • Stride: stride value for convolution, typical = 1, 2
  • ElmtFused: whether the elementwise sum operation of the relink structure is included
  • Latency: the profiling time of each element
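As a rough illustration of the element format above, the sketch below enumerates a small sampling grid and uses each configuration as a dictionary key. The `ConvKey` type, the `sample_configs` helper, the channel, ratio, and resolution values, the square inputs, and the rule that depthwise convolutions keep the channel count are all assumptions made for the example; only the batch, kernel, and stride values come from the list above. Latency is left unset because it must be profiled on the target device.

```python
from itertools import product
from typing import NamedTuple


class ConvKey(NamedTuple):
    """One library element, in the field order given above (hypothetical type)."""
    conv_type: str
    batch: int
    in_c: int
    in_h: int
    in_w: int
    out_c: int
    kernel: int
    stride: int
    elmt_fused: bool


BATCHES = (1, 32, 64, 128)   # typical values from the format description
KERNELS = (1, 3, 5, 7)
STRIDES = (1, 2)
OUT_CHANNELS = (32, 64)      # assumed for illustration
RATIOS = (1, 2)              # In_C = Out_C * Ratio; values assumed
RESOLUTIONS = (56, 28)       # assumed; square inputs, so In_W = In_H


def sample_configs():
    """Yield every configuration in the (small, illustrative) sampling grid."""
    for conv_type, batch, out_c, ratio, res, kernel, stride, fused in product(
        ("Regular", "Depthwise"), BATCHES, OUT_CHANNELS, RATIOS,
        RESOLUTIONS, KERNELS, STRIDES, (False, True),
    ):
        # Assumption: a depthwise convolution keeps the channel count.
        if conv_type == "Depthwise" and ratio != 1:
            continue
        yield ConvKey(conv_type, batch, out_c * ratio, res, res,
                      out_c, kernel, stride, fused)


configs = list(sample_configs())
# Latency values are placeholders until each configuration is profiled.
library = {key: None for key in configs}
print(len(library), "configurations to profile")
```

Using an immutable tuple as the key makes lookups by exact configuration straightforward once the latencies are filled in.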

Format for each element in the predictor

[("Regular", self.stride, elmtfused, self.kernel_size, 1, self.in_channels, input_resolution, self.out_channels)]

[Conv_type, Stride, ElmtFused, Kernel, Batch, In_C, In_H, Out_C]
  • Conv_type: 'Regular' or 'Depthwise'
  • Stride: stride value for convolution, typical = 1, 2
  • ElmtFused: whether the elementwise sum operation of the relink structure is included
  • Kernel: kernel size, typical = 1, 3, 5, 7
  • Batch: 1 in the example above; the batch size is a hyperparameter of the predictor
  • In_C: input_channels; Out_C: output_channels
  • In_H: input height; the feature map width is assumed equal to its height, so In_W is omitted
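The predictor element lists the same fields as a library element, in a different order and without In_W. A minimal sketch of the mapping between the two, assuming a hypothetical helper named `predictor_to_library_key` and made-up layer values:

```python
def predictor_to_library_key(element):
    """Reorder a predictor element into the library field order.

    Predictor order: Conv_type, Stride, ElmtFused, Kernel, Batch, In_C, In_H, Out_C
    Library order:   Conv_type, Batch, In_C, In_H, In_W, Out_C, Kernel, Stride, ElmtFused
    """
    conv_type, stride, elmt_fused, kernel, batch, in_c, in_h, out_c = element
    # The feature map is square, so In_W is filled in from In_H.
    return (conv_type, batch, in_c, in_h, in_h, out_c, kernel, stride, elmt_fused)


# Example layer: regular 3x3 convolution, stride 1, 64 -> 128 channels at 56x56.
element = ("Regular", 1, False, 3, 1, 64, 56, 128)
key = predictor_to_library_key(element)
print(key)  # ('Regular', 1, 64, 56, 56, 128, 3, 1, False)
```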