
Releases: ARM-software/CMSIS-NN

v7.0.0

28 Nov 11:08
22080c6

The following are the updates in this release compared to CMSIS-NN v6.0.0.

New operators and features

  • New int8 Pad operator
  • New int8 Transpose operator
  • New int8 Minimum/Maximum operator
  • New int8/int16 Batch Matmul operator
  • Per-channel quantization support for the Fully Connected operator

Optimizations

  • Improved performance and reduced memory usage for Transposed convolution.
  • MVE conv int4: interleave im2col
  • Align kernel_sum/effective_bias usage
  • Change SVDF MVE memmove to faster arm_memcpy_s8
  • Add optional restrict keyword to conv core loop out pointers
  • Treat depthwise convolution with one input channel as a regular convolution
  • Use unordered im2col in the fast 1x1 convolution DSP case

API changes

  • arm_convolve_s8: new argument upscale_dims to support the new transposed convolution implementation. May be set to NULL with no behavioural change.
  • arm_transpose_conv_s8_get_buffer_size: new argument transposed_conv_params and updated behaviour to support the new transposed convolution implementation.
  • arm_vector_sum_s8: new argument rhs_offset to support pre-computation of kernel_sum and bias.

Full Changelog: v6.0.0...v7.0.0

v6.0.0

03 Jun 07:08
6982301

Release Notes

The following are the updates in this release compared to CMSIS-NN v5.0.0.

API Changes

  • These are non-backward-compatible API changes, hence the major version update. Please refer to arm_nnfunctions.h for more details.
    • Int32 bias support for int16x8 convolution - arm_convolve_wrapper_s16/arm_convolve_s16 parameters updated
    • Int16 input convolution support for MVEI - removed arm_convolve_fast_s16
    • LSTM reimplementation - most LSTM API functions replaced or updated
    • API function arm_convolve_1_x_n_s8_get_buffer_size parameters updated

Performance Improvements

  • Performance improvements for int4 DW convolution
  • MVE convolution improvements by avoiding unaligned access
  • LSTM reimplementation - overall improvements
  • MVE 1xN convolution using im2col

New Features

  • MVEI packed int4 kernel support in FC, convolution and DW convolution
  • LSTM reimplemented to align with TFLM reference kernel.
  • LSTM support for int16 input
  • DSP/MVEI support for Transpose convolution
  • Support for grouped convolutions
  • Non zero filter offset support for FC
  • Int16 input convolution support for MVEI
  • Int32 bias support for int16x8 convolution

General Improvements

  • Unit tests refactoring started

Full Changelog: v5.0.0...v6.0.0

v5.0.0

22 Nov 12:39
bfc54ed

Release Notes

The following are the updates in this release compared to CMSIS-NN v4.1.0.

API Changes

  • Improved read efficiency in FC for MVE extension.
    • This is a non-backward-compatible API change, hence the major version update.
    • The new and changed APIs are arm_vector_sum_s8, arm_svdf_s8 and arm_svdf_s8_get_buffer_size_mve. Please refer to arm_nnfunctions.h for details.

Performance Improvements

  • Improved read efficiency in FC for MVE extension.
    • This also means that FC and SVDF now calculate kernel sums in a prepare phase before inference, which may increase memory usage for certain models.

New Features

  • Packed int4 kernel support in FC, convolution and DW convolution for scalar version and DSP extension.
  • Scalar/base support for new operator Transpose convolution.

General Improvements

  • Extended unit test coverage.

Full Changelog: 23.08...v5.0.0

v4.1.0

17 May 14:05
61d1bb6

Release Notes

The following are the updates in this release compared to CMSIS-NN v4.0.0.

Performance Improvements

  • Improvements in LSTM, generic convolution, 1xN convolution, DW convolution and FC for MVE extension.
  • Improvements in LSTM, generic convolution and int8/int16 elementwise mul for DSP extension.

New Features

  • Script to extract model hyperparameters.
  • Get size of buffers on host to support TVM use case.
  • Dependency to CMSIS-Core is removed. CMSIS-NN can be built without including any other CMSIS module.
  • A new DS_CNN_S model unit test is added that is used in End-to-End benchmark AudioMark.

General Improvements

  • Extended unit test coverage.

Bug Fixes

  • Potential out of buffer write in SVDF state data.
  • Fix selection of correct int16 DW Convolution function.
  • Workaround for a GCC 12.2 Internal Compiler Error affecting MVE.
  • Fix error in buffer size calculation of DW Convolution wrapper for int8.
  • Fix 'asm operand has impossible constraint' error for certain combination of GCC compiler related to MVE optimizations.

CMSIS-NN 4.0.0

21 Nov 08:39

Release Notes

The following are the updates in this release compared to CMSIS 5.9.0.

Return Type Change

The return type of all APIs that return a status has changed. CMSIS-NN used error codes from CMSIS-DSP in the form of the enum arm_status. This is now replaced by the enum arm_cmsis_nn_status. The status values remain the same. It is recommended that users change the return type in their applications.
Removal of Legacy Functions

Neural Network (NN) operators that do not follow the quantization specification of TensorFlow Lite for Microcontrollers are removed. Existing users can continue to use them via the CMSIS 5.9.0 release.
As a consequence, the data type aliases q7_t, q15_t, q31_t and q63_t are replaced by int8_t, int16_t, int32_t and int64_t respectively.

New Operators

Scalar implementation of LSTM with unit tests. Optimizations for the DSP extension and the M-Profile Vector Extension (MVE) are planned for the next release.

New Features

These are new optimizations to existing operators.

  • DSP extension optimization for int16 average pooling
  • MVE optimization for int16 max and average pooling
  • MVE optimization for int16 add and mul
  • MVE optimization for int16 fully connected
  • MVE and DSP extension optimization for int16 depthwise convolution
  • MVE and DSP extension optimization for non-unity stride 1x1 convolution

Performance Improvements

  • 3x3 depthwise convolution for DSP extension
  • 1x1 convolution for MVE