v0.12
Performance optimizations
- Improved performance of fp32 direct and Winograd convolution on Intel(R) Xeon(R) processors with Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) support
- Improved performance of int8 direct convolution on Intel Xeon processors with the Intel AVX-512 instruction set
- Improved batch normalization performance on Intel Xeon processors with the Intel AVX-512 instruction set
- Optimized dilated convolution backward propagation
- Improved initialization time of GEMM-based convolution implementations
New functionality
- Support for int8 inference. The following primitives support the int8 data type (see the quantization sketch after this list):
  - reorders (including quantization and dequantization)
  - convolution
  - pooling
  - eltwise
  - sum
  - concat
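
A minimal sketch of int8 quantization, assuming the v0.12 C++ API (mkldnn.hpp): fp32 data is converted to s8 by a reorder whose primitive attributes carry an output scale and rounding mode. The tensor shape and the scale value 64.f are hypothetical, chosen only for illustration.

```cpp
#include <vector>
#include "mkldnn.hpp"
using namespace mkldnn;

int main() {
    engine eng(engine::cpu, 0);

    // Hypothetical tensor shape and data.
    memory::dims dims = {1, 16, 8, 8};
    std::vector<float> src_f32(1 * 16 * 8 * 8, 0.5f);
    std::vector<int8_t> dst_s8(src_f32.size());

    // fp32 source and int8 destination memory primitives.
    auto src_md = memory::desc(dims, memory::data_type::f32, memory::format::nchw);
    auto dst_md = memory::desc(dims, memory::data_type::s8, memory::format::nchw);
    auto src_mem = memory({src_md, eng}, src_f32.data());
    auto dst_mem = memory({dst_md, eng}, dst_s8.data());

    // Quantization parameters are attached via primitive attributes:
    // a single common scale (mask = 0) and nearest rounding.
    primitive_attr attr;
    attr.set_int_output_round_mode(round_mode::round_nearest);
    attr.set_output_scales(0, {64.f});

    auto reorder_pd = reorder::primitive_desc(src_mem.get_primitive_desc(),
                                              dst_mem.get_primitive_desc(), attr);
    std::vector<primitive> net;
    net.push_back(reorder(reorder_pd, src_mem, dst_mem));
    stream(stream::kind::eager).submit(net).wait();
    return 0;
}
```

Dequantization follows the same pattern with the data types swapped and a reciprocal scale.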
- Layer fusion support via the new post-ops API (see the sketch after this list). Primitives that support fusion:
  - forward convolution with eltwise, for inference and training
  - convolution with sum, for inference
  - batch normalization with eltwise, for training
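
A minimal sketch of the post-ops mechanism, assuming the v0.12 C++ API: a ReLU is fused into a forward convolution by appending an eltwise post-op to the primitive attributes, so the activation is applied to the convolution result before it is written to memory. The shapes are hypothetical.

```cpp
#include "mkldnn.hpp"
using namespace mkldnn;

int main() {
    engine eng(engine::cpu, 0);

    // format::any lets the library choose optimized layouts.
    memory::desc src_md({1, 16, 13, 13}, memory::data_type::f32, memory::format::any);
    memory::desc wei_md({32, 16, 3, 3}, memory::data_type::f32, memory::format::any);
    memory::desc dst_md({1, 32, 11, 11}, memory::data_type::f32, memory::format::any);

    convolution_forward::desc conv_d(prop_kind::forward_inference,
            convolution_direct, src_md, wei_md, dst_md,
            {1, 1} /* strides */, {0, 0}, {0, 0} /* padding */,
            padding_kind::zero);

    // Append an eltwise ReLU post-op; this replaces a separate ReLU pass.
    post_ops ops;
    ops.append_eltwise(1.0f /* scale */, eltwise_relu,
                       0.0f /* negative slope */, 0.0f);
    primitive_attr attr;
    attr.set_post_ops(ops);

    auto conv_pd = convolution_forward::primitive_desc(conv_d, attr, eng);
    (void)conv_pd; // create memory, the primitive, and submit as usual
    return 0;
}
```

Fusing with sum works the same way through post_ops::append_sum, which accumulates the convolution result into the existing destination contents.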
API deprecations and breaking changes
- The ReLU primitive is deprecated; its functionality is now part of the eltwise primitive (see the sketch after this list)
- The merged convolution/ReLU primitive is deprecated; the same functionality is available through the new post-ops API
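
As a migration sketch, assuming the v0.12 C++ API, a standalone ReLU expressed through the eltwise primitive; the tensor shape is hypothetical.

```cpp
#include <vector>
#include "mkldnn.hpp"
using namespace mkldnn;

int main() {
    engine eng(engine::cpu, 0);

    memory::dims dims = {1, 16, 8, 8};
    std::vector<float> data(1 * 16 * 8 * 8, -1.0f);

    auto md = memory::desc(dims, memory::data_type::f32, memory::format::nchw);
    auto src = memory({md, eng}, data.data());
    auto dst = memory({md, eng}, data.data()); // in-place for brevity

    // eltwise_relu with alpha = 0 (negative slope) matches the old ReLU primitive.
    eltwise_forward::desc relu_d(prop_kind::forward_inference,
            eltwise_relu, md, 0.0f /* alpha */, 0.0f /* beta */);
    auto relu_pd = eltwise_forward::primitive_desc(relu_d, eng);

    std::vector<primitive> net;
    net.push_back(eltwise_forward(relu_pd, src, dst));
    stream(stream::kind::eager).submit(net).wait();
    return 0;
}
```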
Thanks to the contributors
This release contains contributions from many Intel(R) Performance Libraries developers as well as @kruus, Yong Wu, Daoxin Pan, and Zhiming Wang. We would also like to thank everyone who asked questions and reported issues.
* Other names and brands may be claimed as the property of others.