
Releases: oneapi-src/oneDNN

v1.4-rc

06 Apr 22:34
Pre-release

This is a release candidate for DNNL v1.4. Please provide feedback and report bugs in GitHub issues.

v1.3

02 Apr 17:39

Performance optimizations

  • Introduced broad release quality optimizations for future Intel(R) Xeon(R) Scalable processors (codename Cooper Lake).
  • Improved performance of matmul primitive for 3D tensors (batched matrix-matrix multiplication) on all supported processors.
  • Improved performance of binary primitive for the case when one of the tensors has to be broadcast on all supported processors.
  • Improved performance of convolution primitive for 3D tensors and 1x1 kernel size on all supported processors.

New functionality

  • Introduced fused depthwise convolution and convolution with 1x1 filter. The implementation is available for all supported processors and data types. The functionality is not implemented for Intel Processor Graphics.
  • Introduced peephole support for LSTM cell on all supported processors. The functionality is not implemented for Intel Processor Graphics.
  • Implemented matmul primitive for Intel Processor Graphics.
  • Extended binary primitive with support for min and max algorithms.
  • Extended eltwise primitive:
    • Introduced an erf-based implementation of the gelu algorithm
    • Introduced the pow algorithm
    • Introduced a backpropagation flavor that uses the destination tensor as input for the elu, exp, logistic, relu, sqrt, and tanh algorithms
  • Extended the set of operations for memory descriptors:
    • Added support for changing the number of dimensions with the existing dnnl::memory::desc::reshape() method (a sketch follows this list)
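
As an illustration of the extended reshape(), the following minimal C++ sketch changes a descriptor from 2D to 3D while preserving the total number of elements. The shapes and format tags are illustrative, not taken from the release:

    #include "dnnl.hpp"
    using namespace dnnl;

    int main() {
        // Plain (row-major) 2D fp32 descriptor: 6 x 4 = 24 elements.
        memory::desc md_2d({6, 4}, memory::data_type::f32, memory::format_tag::ab);

        // As of v1.3, reshape() may also change the number of dimensions,
        // as long as the total number of elements stays the same.
        memory::desc md_3d = md_2d.reshape({2, 3, 4});
        return 0;
    }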

Thanks to the contributors

This release contains contributions from the project core team as well as Arthur Araujo Mitrano @aaraujom, Aaron Mark Johnson @aaronjohnson, Benjamin Hipple @bhipple, Sergey Nesterov @cepera, @gaurav1086, Ilya Taraban @itaraban, Mesut Meterelliyoz @mmeterel, @nSircombe, Peter Caday @petercad, and Rafik Saliev @rsaliev. We would also like to thank everyone who asked questions and reported issues.

v1.2.2

19 Mar 07:23

This is a patch release containing the following changes to v1.2.1:

  • Fixed overflow in transposition in bfloat16 weights gradient convolution (0d28389)
  • Added a workaround for corrupted unique_ptr usage in scratchpad (91c89a9)
  • Fixed int8 deconvolution with int32 output on Intel AVX2 systems (ef2d652)
  • Fixed segmentation fault in concat due to incorrect memory alignment #668 (7a0c3a9)
  • Fixed performance regression in no-copy gemm dispatching #525 (89a303b)
  • Fixed segmentation fault in fp32 weights gradient convolution with dilation and large padding (50546ad)
  • Fixed bfloat16/fp32 scalability for eltwise primitive (e281a4a)

v1.3-rc

13 Mar 01:39
Pre-release

This is a release candidate for DNNL v1.3. Please provide feedback and report bugs in GitHub issues.

v0.21.4

05 Mar 01:45

This is a patch release containing the following changes to v0.21.3:

  • Fixed large padding handling in input tensor transposition in bfloat16 weights gradient convolution (6df67fe)
  • Fixed performance of reference convolution (2e1d048)
  • Fixed "code is too big" error in case of extreme large spatial size (ed0be61, 4dee389, 59759ba)

v2.0-beta05

06 Apr 23:26
Pre-release

This is a preview release for oneDNN v2.0. It is a patch release based on DNNL v2.0-beta04.

Binary distribution of this software is available as Intel(R) oneAPI Deep Neural Network Library in Intel(R) oneAPI.

Known Limitations

  • Weight gradient convolution for the bfloat16 data type with a 1D spatial tensor and dilation may produce incorrect results on CPU.
  • Weight gradient convolution for the bfloat16 data type with a 2D spatial tensor and dilation may crash on Intel AVX512 systems.
  • Optimized primitives can crash or fail for huge spatial sizes on CPU.
  • dnnl_sgemm, dnnl_gemm_u8s8u32, and inner product functionality do not support sizes exceeding 2^32.
  • Non-Intel GPUs are not supported. The library API allows creating a DNNL engine by index (the order of devices is determined by the SYCL runtime), and there is no check that a GPU device is an Intel device. For more control, users can create a DNNL engine by passing a SYCL device and context explicitly, as sketched after the notes below.
  • Intel Processor Graphics Gen11 is not supported.
  • When running GPU kernels that take longer than a certain time (the threshold depends on the OS and system settings), you may encounter an apparent application hang. Configure the driver to disable this timeout to avoid hangs in DPC++ or OpenCL programs, including DNNL examples.

On Linux:

$ sudo bash -c 'echo N > /sys/module/i915/parameters/enable_hangcheck'

On Windows, increase the TdrDelay and TdrDdiDelay values in the registry.
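
For explicit device selection, the engine can be built from a SYCL device and context chosen through the SYCL runtime rather than by index. The sketch below is hedged: the interop constructor shown reflects the beta-era DPC++ API, and later releases expose this through dnnl::sycl_interop::make_engine instead:

    #include <CL/sycl.hpp>
    #include "dnnl.hpp"

    int main() {
        // Pick a GPU explicitly through the SYCL runtime instead of relying
        // on engine-by-index enumeration, which performs no vendor check.
        cl::sycl::device dev{cl::sycl::gpu_selector{}};
        cl::sycl::context ctx{dev};

        // Construct the engine from the chosen device and context
        // (beta-era interop constructor; exact form may differ by release).
        dnnl::engine eng(dnnl::engine::kind::gpu, dev, ctx);
        return 0;
    }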

v1.2.1

26 Feb 06:28

This is a patch release containing the following changes to v1.2:

  • Improved GEMM performance for 1 thread (1fd2bc0)
  • Fixed RNN cell backpropagation computations (4b15a0c)
  • Fixed alpha and beta handling in vanilla RNN cell (70f8b87)
  • Reduced sizes in the performance profiling example to avoid memory overflow on systems with less than 2 GB of memory (f6e2ef9)
  • Fixed correctness for strided convolution with 1x1 filter and non-matching source and destination formats (0405c9a)
  • Removed lambda calls from OpenMP loops as a workaround for Intel C/C++ Compiler 19.1 (a603593)
  • Added -O1 flag for backward convolution gtests as a workaround for Intel C/C++ Compiler 19.1 (495b91f)

v2.0-beta04

12 Feb 03:31
Pre-release

This is a preview release for oneDNN v2.0. The release is based on oneDNN v1.2.

Binary distribution of this software is available as Intel(R) oneAPI Deep Neural Network Library in Intel(R) oneAPI.

Known Limitations

  • Non-Intel GPUs are not supported. The library API allows creating a DNNL engine by index (the order of devices is determined by the SYCL runtime), and there is no check that a GPU device is an Intel device. For more control, users can create a DNNL engine by passing a SYCL device and context explicitly, as sketched in the v2.0-beta05 notes above.
  • Intel Processor Graphics Gen11 is not supported.
  • When running GPU kernels that take longer than a certain time (the threshold depends on the OS and system settings), you may encounter an apparent application hang. Configure the driver to disable this timeout to avoid hangs in DPC++ or OpenCL programs, including DNNL examples.

On Linux:

$ sudo bash -c 'echo N > /sys/module/i915/parameters/enable_hangcheck'

On Windows, increase the TdrDelay and TdrDdiDelay values in the registry.

v1.2

31 Jan 22:57

Performance optimizations

  • Improved 1D backward convolution performance on CPU.
  • Improved int8 inference performance on pre-Intel AVX512 systems.
  • Improved int8 inference performance for 3D spatial data on CPU.
  • Improved performance of convolution and other primitives on GPU.

New functionality

  • Introduced a general-purpose matrix-matrix multiplication (matmul) primitive. The functionality supports fp32, bfloat16, and int8 data types with asymmetric quantization (a usage sketch follows this list).
  • Introduced logsoftmax and resampling primitives.
  • Introduced support for the clip and log algorithms in the elementwise primitive.
  • Introduced int8 and bf16 data type support for the binary primitive (CPU only).
  • Introduced fully functional support for int8 (inference) and bfloat16 (inference and training) data types on GPU. This functionality is not intended to deliver a performance improvement over f32 on current Intel Integrated Graphics; it is intended for conformance experiments.
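
A minimal fp32 usage sketch of the new matmul primitive against the v1.2 C++ API follows; the sizes and data are placeholders:

    #include <vector>
    #include "dnnl.hpp"
    using namespace dnnl;

    int main() {
        engine eng(engine::kind::cpu, 0);
        stream s(eng);

        // C[M, N] = A[M, K] * B[K, N], all fp32, plain row-major layouts.
        const memory::dim M = 2, K = 3, N = 4;
        memory::desc a_md({M, K}, memory::data_type::f32, memory::format_tag::ab);
        memory::desc b_md({K, N}, memory::data_type::f32, memory::format_tag::ab);
        memory::desc c_md({M, N}, memory::data_type::f32, memory::format_tag::ab);

        matmul::primitive_desc pd(matmul::desc(a_md, b_md, c_md), eng);
        matmul prim(pd);

        std::vector<float> a(M * K, 1.f), b(K * N, 1.f), c(M * N, 0.f);
        memory a_m(a_md, eng, a.data()), b_m(b_md, eng, b.data()), c_m(c_md, eng, c.data());

        // Source, weights, and destination map to A, B, and C respectively.
        prim.execute(s, {{DNNL_ARG_SRC, a_m}, {DNNL_ARG_WEIGHTS, b_m}, {DNNL_ARG_DST, c_m}});
        s.wait();
        return 0;
    }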

Usability improvements

  • Added JIT code annotations for the Linux perf profiler.
  • Added a mechanism to control CPU dispatcher behavior at runtime via the DNNL_MAX_CPU_ISA environment variable or a function call (a sketch follows this list).
  • Extended DNNL_VERBOSE output with more information about runtimes and devices.
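
The dispatcher cap can be set from the environment (for example, DNNL_MAX_CPU_ISA=AVX2) or programmatically. A small sketch assuming the v1.2 CPU dispatcher control API (dnnl::set_max_cpu_isa):

    #include "dnnl.hpp"

    int main() {
        // Equivalent to running with DNNL_MAX_CPU_ISA=AVX2. Must be called
        // before the first primitive is created; the setting cannot be
        // changed afterwards.
        dnnl::set_max_cpu_isa(dnnl::cpu_isa::avx2);

        // ... create engines and primitives as usual ...
        return 0;
    }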

Thanks to the contributors

This release contains contributions from the project core team as well as Aaron Johnson @aaronjohnson, Attila T. Áfra @atafra, Ben Fitch, Ilya Taraban @itaraban, Michał Gallus @Sand3r-, Peter Caday @petercad, Qiyou Chen @chenqy4933 and Jun Luan @junluan. We would also like to thank everyone who asked questions and reported issues.

v0.21.3

18 Jan 02:21

This is a patch release containing the following changes to v0.21.2:

  • Reduced the upper bound of the memory requirement for gemm-based convolution, lowering the probability of out-of-memory errors (cd99749)
  • Significantly reduced the size required for 1x1 convolution (5643445)
  • Added new dummy stream (cba5823)