Releases: oneapi-src/oneDNN
v1.4-rc
This is a release candidate for DNNL v1.4. Please provide feedback and report bugs in GitHub issues.
v1.3
Performance optimizations
- Introduced broad release quality optimizations for future Intel(R) Xeon(R) Scalable processors (code name Cooper Lake).
- Improved performance of matmul primitive for 3D tensors (batched matrix-matrix multiplication) on all supported processors.
- Improved performance of binary primitive for cases where one of the tensors has to be broadcast, on all supported processors (see the sketch after this list).
- Improved performance of convolution primitive for 3D tensors and 1x1 kernel size on all supported processors.
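The broadcast case called out above maps directly onto the binary primitive API. Below is a minimal C++ sketch with illustrative shapes, applying a per-channel tensor broadcast against a full NCHW tensor; binary_max is one of the algorithms this release adds:

```cpp
#include "dnnl.hpp"
using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream s(eng);

    // src0 is a full NCHW tensor; src1 holds one value per channel and is
    // broadcast over the batch and spatial dimensions.
    memory::desc src0_md({2, 16, 7, 7}, memory::data_type::f32, memory::format_tag::nchw);
    memory::desc src1_md({1, 16, 1, 1}, memory::data_type::f32, memory::format_tag::nchw);
    memory::desc dst_md({2, 16, 7, 7}, memory::data_type::f32, memory::format_tag::nchw);

    // binary_add and binary_mul are set up the same way as binary_max.
    binary::desc bd(algorithm::binary_max, src0_md, src1_md, dst_md);
    binary::primitive_desc pd(bd, eng);
    binary prim(pd);

    memory src0(src0_md, eng), src1(src1_md, eng), dst(dst_md, eng);
    prim.execute(s, {{DNNL_ARG_SRC_0, src0}, {DNNL_ARG_SRC_1, src1},
                     {DNNL_ARG_DST, dst}});
    s.wait();
    return 0;
}
```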
New functionality
- Introduced fused depthwise convolution support for convolutions with 1x1 filters. The implementation is available for all supported processors and data types. The functionality is not implemented for Intel Processor Graphics.
- Introduced peephole support for LSTM cell on all supported processors. The functionality is not implemented for Intel Processor Graphics.
- Implemented matmul primitive for Intel Processor Graphics.
- Extended binary primitive with support for min and max algorithms.
- Extended eltwise primitive (see the first sketch after this list):
  - Introduced erf-based implementation of the gelu algorithm
  - Introduced pow algorithm
  - Introduced a backpropagation flavor that relies on the destination tensor as input for elu, exp, logistic, relu, sqrt, and tanh algorithms
- Extended set of operations for memory descriptors (see the second sketch after this list):
  - Added support for changing the number of dimensions with the existing dnnl::memory::desc::reshape() method
  - Introduced dnnl::memory::desc::permute_axes() method to change logical axes order
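A minimal sketch of the erf-based gelu flavor; the eltwise_gelu_erf algorithm kind is assumed to be the name behind the item above, and the shape is illustrative:

```cpp
#include "dnnl.hpp"
using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream s(eng);

    memory::desc data_md({2, 16, 7, 7}, memory::data_type::f32, memory::format_tag::nchw);

    // alpha and beta are unused by gelu; for eltwise_pow they would carry
    // the scale and the exponent.
    eltwise_forward::desc ed(prop_kind::forward_inference,
            algorithm::eltwise_gelu_erf, data_md, 0.f, 0.f);
    eltwise_forward::primitive_desc pd(ed, eng);
    eltwise_forward prim(pd);

    memory src(data_md, eng), dst(data_md, eng);
    prim.execute(s, {{DNNL_ARG_SRC, src}, {DNNL_ARG_DST, dst}});
    s.wait();
    return 0;
}
```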
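Both memory descriptor operations are plain value transformations on dnnl::memory::desc; a minimal sketch with illustrative shapes:

```cpp
#include "dnnl.hpp"
using namespace dnnl;

int main() {
    // 2D fp32 descriptor, 6 x 4, plain row-major layout.
    memory::desc md({6, 4}, memory::data_type::f32, memory::format_tag::ab);

    // reshape() can now change the number of dimensions as long as the
    // total element count is preserved: 6 x 4 -> 2 x 3 x 4.
    memory::desc md_3d = md.reshape({2, 3, 4});

    // permute_axes() changes the logical axes order; {1, 0} swaps the two
    // axes of the original 2D descriptor.
    memory::desc md_t = md.permute_axes({1, 0});

    (void)md_3d;
    (void)md_t;
    return 0;
}
```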
Thanks to the contributors
This release contains contributions from the project core team as well as Araujo Mitrano, Arthur @aaraujom, Aaron Mark Johnson @aaronjohnson, Benjamin Hipple @bhipple, Sergey Nesterov @cepera, @gaurav1086, Ilya Taraban @itaraban, Mesut Meterelliyoz @mmeterel, @nSircombe, Peter Caday @petercad, and Rafik Saliev @rsaliev. We would also like to thank everyone who asked questions and reported issues.
v1.2.2
This is a patch release containing the following changes to v1.2.1:
- Fixed overflow in transposition in bfloat16 weights gradient convolution (0d28389)
- Added a workaround for corrupted unique_ptr usage in scratchpad (91c89a9)
- Fixed int8 deconvolution with int32 output on Intel AVX2 systems (ef2d652)
- Fixed segmentation fault in concat due to incorrect memory alignment #668 (7a0c3a9)
- Fixed performance regression in no-copy gemm dispatching #525 (89a303b)
- Fixed segmentation fault in fp32 weights gradient convolution with dilation and large padding (50546ad)
- Fixed bfloat16/fp32 scalability for eltwise primitive (e281a4a)
v1.3-rc
This is a release candidate for DNNL v1.3. Please provide feedback and report bugs in GitHub issues.
v0.21.4
This is a patch release containing the following changes to v0.21.3:
v2.0-beta05
This is a preview release for oneDNN v2.0. The release is a patch release based on DNNL v2.0-beta04.
Binary distribution of this software is available as Intel(R) oneAPI Deep Neural Network Library in Intel(R) oneAPI.
Known Limitations
- Weights gradient convolution for the bfloat16 data type with a 1D spatial tensor and dilation may produce incorrect results on CPU.
- Weights gradient convolution for the bfloat16 data type with a 2D spatial tensor and dilation may crash on Intel AVX512 systems.
- Optimized primitives can crash or fail for huge spatial sizes on CPU.
- dnnl_sgemm, dnnl_gemm_u8s8u32, and inner product functionality do not support sizes exceeding 2^32.
- Non-Intel GPUs are not supported. The library API allows creating a DNNL engine by index (the order of devices is determined by the SYCL runtime), and there is no check that the GPU device is an Intel device. For more control, create a DNNL engine by passing a SYCL device and context explicitly (see the sketch after these limitations).
- Intel Processor Graphics Gen11 is not supported.
- GPU kernels that take longer than a certain time to run (the threshold depends on OS and system settings) may cause the application to appear to hang. Configure the driver to disable this timeout to avoid hangs in DPC++ or OpenCL programs, including DNNL examples.
On Linux:
$ sudo bash -c 'echo N > /sys/module/i915/parameters/enable_hangcheck'
On Windows, increase the TdrDelay and TdrDdiDelay registry values.
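For the explicit engine creation mentioned in the limitations above, a minimal sketch follows; the engine constructor taking a SYCL device and context is assumed from this beta's DPC++ build and should be checked against the shipped headers:

```cpp
#include <CL/sycl.hpp>
#include "dnnl.hpp"
using namespace dnnl;

int main() {
    // By index: device order comes from the SYCL runtime, and the library
    // does not verify that the selected GPU is an Intel device.
    engine eng_by_index(engine::kind::gpu, 0);

    // Explicit device and context (assumed interop constructor): the
    // application decides exactly which device is used.
    cl::sycl::device dev{cl::sycl::gpu_selector{}};
    cl::sycl::context ctx{dev};
    engine eng_explicit(engine::kind::gpu, dev, ctx);
    return 0;
}
```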
v1.2.1
This is a patch release containing the following changes to v1.2:
- Improved GEMM performance for 1 thread (1fd2bc0)
- Fixed RNN cell backpropagation computations (4b15a0c)
- Fixed alpha and beta handling in vanilla RNN cell (70f8b87)
- Reduced sizes in performance profiling example to avoid memory overflow for systems with less than 2 GB memory (f6e2ef9)
- Fixed correctness for strided convolution with a 1x1 filter and non-matching source and destination formats (0405c9a)
- Removed lambda calls from OpenMP loops as a workaround for Intel C/C++ Compiler 19.1 (a603593)
- Added -O1 flag for backward convolution gtests as a workaround for Intel C/C++ Compiler 19.1 (495b91f)
v2.0-beta04
This is a preview release for oneDNN v2.0. The release is based on oneDNN v1.2.
Binary distribution of this software is available as Intel(R) oneAPI Deep Neural Network Library in Intel(R) oneAPI.
Known Limitations
- Non-Intel GPUs are not supported. The library API allows creating a DNNL engine by index (the order of devices is determined by the SYCL runtime), and there is no check that the GPU device is an Intel device. For more control, create a DNNL engine by passing a SYCL device and context explicitly.
- Intel Processor Graphics Gen11 is not supported.
- GPU kernels that take longer than a certain time to run (the threshold depends on OS and system settings) may cause the application to appear to hang. Configure the driver to disable this timeout to avoid hangs in DPC++ or OpenCL programs, including DNNL examples.
On Linux:
$ sudo bash -c 'echo N > /sys/module/i915/parameters/enable_hangcheck'
On Windows, increase the TdrDelay and TdrDdiDelay registry values.
v1.2
Performance optimizations
- Improved 1D backward convolution performance on CPU.
- Improved int8 inference performance on pre-Intel AVX512 systems.
- Improved int8 inference performance for 3D spatial data on CPU.
- Improved performance of convolution and other primitives on GPU.
New functionality
- Introduced general-purpose matrix-matrix multiplication (matmul) primitive (see the sketch after this list). The functionality supports fp32, bfloat16, and int8 data types with asymmetric quantization.
- Introduced logsoftmax and resampling primitives.
- Introduced clip and log algorithms support in elementwise primitive.
- Introduced int8 and bf16 data types support for binary primitive (CPU only).
- Introduced fully functional support of int8 (inference) and bfloat16 (inference and training) data types on GPU. The functionality is not intended to deliver performance improvements over f32 on current Intel Integrated Graphics, but to enable conformance experiments.
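A minimal fp32 sketch of the matmul primitive with illustrative shapes; a 3D shape such as {batch, M, K} expresses the batched case in the same way:

```cpp
#include "dnnl.hpp"
using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream s(eng);

    // C[M, N] = A[M, K] * B[K, N] in fp32.
    const memory::dim M = 64, K = 32, N = 48;
    memory::desc a_md({M, K}, memory::data_type::f32, memory::format_tag::ab);
    memory::desc b_md({K, N}, memory::data_type::f32, memory::format_tag::ab);
    memory::desc c_md({M, N}, memory::data_type::f32, memory::format_tag::ab);

    matmul::desc md(a_md, b_md, c_md);
    matmul::primitive_desc pd(md, eng);
    matmul prim(pd);

    memory a(a_md, eng), b(b_md, eng), c(c_md, eng);
    prim.execute(s, {{DNNL_ARG_SRC, a}, {DNNL_ARG_WEIGHTS, b},
                     {DNNL_ARG_DST, c}});
    s.wait();
    return 0;
}
```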
Usability improvements
- Added JIT code annotations for linux-perf profiler.
- Added mechanism to control CPU dispatcher behavior at runtime via the DNNL_MAX_CPU_ISA environment variable or a function call (see the sketch after this list).
- Extended DNNL_VERBOSE output with more information about runtimes and devices.
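A minimal sketch of the runtime dispatcher control from code; the same cap can come from the environment (for example DNNL_MAX_CPU_ISA=AVX2) before the process starts:

```cpp
#include "dnnl.hpp"

int main() {
    // Cap the JIT dispatcher at AVX2. The call must happen before the
    // first primitive is created; after that the setting is frozen.
    dnnl::set_max_cpu_isa(dnnl::cpu_isa::avx2);

    // Subsequent primitives will not dispatch to AVX512 kernels.
    dnnl::engine eng(dnnl::engine::kind::cpu, 0);
    return 0;
}
```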
Thanks to the contributors
This release contains contributions from the project core team as well as Aaron Johnson @aaronjohnson, Attila T. Áfra @atafra, Ben Fitch, Ilya Taraban @itaraban, Michał Gallus @Sand3r-, Peter Caday @petercad, Qiyou Chen @chenqy4933 and Jun Luan @junluan. We would also like to thank everyone who asked questions and reported issues.