ZenDNN Release v4.2 · amd/ZenDNN

The highlights of this release are as follows:

The ZenDNN library is based on oneDNN v2.6.3, and provides optimizations tailored to enable performant AI inference on AMD EPYC™ servers.
The ZenDNN library can be used in the following frameworks through a plug-in:
- TensorFlow v2.16 and later
- PyTorch v2.0 and later
The ZenDNN library is integrated with ONNX Runtime v1.17.0.
Supports Environment Variables for Tuning Performance
The following environment variables have been added to tune performance:
- Memory Pooling (Persistent Memory Caching)
  - ZENDNN_ENABLE_MEMPOOL for all TensorFlow models
  - Added MEMPOOL support for BF16 TensorFlow models
- Convolution Operation
  - ZENDNN_CONV_ALGO for all TensorFlow models
  - Added new options to ALGO paths
- Matrix Multiplication Operation
  - ZENDNN_MATMUL_ALGO for TensorFlow, PyTorch, and ONNX Runtime models
  - Added new options, ALGO paths, and an experimental version of the auto-tuner for TensorFlow
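
As a minimal sketch of how these variables might be applied from a Python launcher script: the variable names come from the list above, but the values shown are placeholders (assumptions), since these notes do not list the accepted settings. The helper `apply_zendnn_tuning` is hypothetical and not part of ZenDNN. The sketch also assumes the variables need to be set before the framework is imported; consult the ZenDNN user guide for the actual values and load-time behavior.

```python
import os

# Variable names are taken from the release notes; the VALUES are placeholders
# (assumptions) and must be replaced with settings from the ZenDNN user guide.
ZENDNN_TUNING = {
    "ZENDNN_ENABLE_MEMPOOL": "1",  # assumed value
    "ZENDNN_CONV_ALGO": "1",       # assumed value
    "ZENDNN_MATMUL_ALGO": "1",     # assumed value
}


def apply_zendnn_tuning(settings, environ=os.environ):
    """Set each variable unless it is already defined; return what was set."""
    applied = {}
    for name, value in settings.items():
        if name not in environ:
            environ[name] = value
            applied[name] = value
    return applied


applied = apply_zendnn_tuning(ZENDNN_TUNING)
# Import the framework (for example, TensorFlow with the ZenDNN plug-in)
# only after this point, so the variables are already in the environment.
```

Leaving already-exported variables untouched lets a shell-level `export` override the script defaults, which is convenient when comparing ALGO paths across runs.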
Embedding Bag and Embedding Operators
- Support for Embedding operator
- AVX512 support for Embedding and Embedding Bag kernels
- Two new parallelization strategies for Embedding and Embedding Bag operators, namely, Table threading and Hierarchical threading
Matrix Multiplication (MatMul) Operators
- MatMul post-ops computation with BLIS kernels
- Weight caching for FP32 JIT and BLIS kernels
- BLIS BF16 kernel support
