
v24.11

developer-compute released this on 18 Nov 11:53

v24.11 Public Major Release

Feat

  • Add SVE SoftmaxLayer kernel for BF16
  • Provide stateless API for CpuGemmLowpMatrixMultiplyCore, CpuQuantize, and DequantizationLayer
  • Extend static quantization interface for both matmul and convolution operations
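The BF16 SoftmaxLayer kernel above works on bfloat16 data, which keeps the sign bit, all 8 exponent bits and the top 7 mantissa bits of an FP32 value. The sketch below illustrates the format by converting FP32 to BF16 bit patterns with round-to-nearest-even and widening them back. It is a generic, standalone example, not the arm_compute::bfloat16 implementation. The function names float_to_bf16_bits and bf16_bits_to_float are hypothetical, and these notes do not state which rounding mode the library itself uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: convert an FP32 value to BF16 bits with round-to-nearest-even.
inline std::uint16_t float_to_bf16_bits(float value)
{
    if (std::isnan(value))
    {
        // Return a canonical quiet NaN; plain rounding could turn some NaNs into infinity.
        return 0x7FC0;
    }
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof(bits)); // reinterpret the float's bit pattern
    // Adding 0x7FFF plus the lowest kept bit rounds to nearest, with ties going to even.
    const std::uint32_t lsb = (bits >> 16) & 1u;
    bits += 0x7FFFu + lsb;
    return static_cast<std::uint16_t>(bits >> 16); // keep the upper 16 bits
}

// Hypothetical helper: widen BF16 bits back to FP32; the dropped mantissa bits become zero.
inline float bf16_bits_to_float(std::uint16_t bf16)
{
    const std::uint32_t bits = static_cast<std::uint32_t>(bf16) << 16;
    float value = 0.0f;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}
```

For example, float_to_bf16_bits(1.0f) yields 0x3F80, and bf16_bits_to_float(0x3F80) gives back exactly 1.0f. Values that need more than 8 significant bits lose precision in the round trip.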

Fix

  • Clarify Third-Party IP licenses
  • Check that CpuGemmAssemblyDispatch is configured in CpuMatMul before continuing
  • Add BF16 support for CpuGemmAssemblyDispatchWrapper
  • Detect SVE support on Windows® so that the available SVE kernels can run
  • Fix missing cstdint include, which causes build errors with GCC 15
  • Disable -O2 when building for Windows®, as it causes crashes with certain compiler versions
  • Make Cast on CPU truncate float to int instead of rounding, for consistency with other ML frameworks
  • Return an error from validate() in CpuGemmLowpMatrixMultiplyCore if pretransposed A or B is set to true, as this is not supported
  • Avoid implicit conversion from __fp16 to arm_compute::bfloat16, which could produce illegal instructions on hardware with FP16 but no BF16 support
  • Fix Softmax SME2 kernel selection so that it correctly detects whether SME2 is supported
  • Fix requantization rounding issues in CPU/GPU Quantize
  • Scale normalising coefficient in GPU LogSoftmax
  • Apply consistent rounding policy in NEReduceMean
  • Revert default memory manager for NEQLSTMLayer
  • Create default memory manager when none is provided
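To make the CPU Cast change concrete: converting float to int by truncation drops the fractional part (rounds toward zero), whereas rounding picks the nearest integer. The standalone C++ sketch below contrasts the two on scalars. cast_truncate, cast_round and in_int32_range are hypothetical helpers, not library functions, and std::lround is only one possible rounding choice, since these notes do not say which rounding mode the previous CPU path used.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical guard: float-to-int32 conversion is only well defined for finite values
// in [-2^31, 2^31). NaN fails both comparisons, so it is rejected too.
inline bool in_int32_range(float value)
{
    return value >= -2147483648.0f && value < 2147483648.0f;
}

// Truncating conversion: discards the fractional part, i.e. rounds toward zero.
inline std::int32_t cast_truncate(float value)
{
    assert(in_int32_range(value));
    return static_cast<std::int32_t>(value);
}

// Rounding conversion, for comparison: nearest integer, with halfway cases
// rounded away from zero (the behaviour of std::lround).
inline std::int32_t cast_round(float value)
{
    assert(in_int32_range(value));
    return static_cast<std::int32_t>(std::lround(value));
}
```

For example, cast_truncate(-2.7f) returns -2 while cast_round(-2.7f) returns -3. Saturation of out-of-range inputs is deliberately not modelled here.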

Refactor

  • Turn duplicated code in the elementwise_binary kernel into templates to reduce code size
  • Move CpuSoftmaxKernel LUT to LUTManager to consolidate location of all LUTs
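The elementwise_binary refactor is described only at a high level. As a generic illustration of the pattern, a single function template parameterised on the element type and the operation can stand in for near-identical per-type, per-operation loops. This sketch is scalar C++ and is not the library's kernel code, which uses vector intrinsics; the name apply_binary_op is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical generic loop replacing hand-written copies such as add_f32, add_s32, mul_f32.
// T is the element type; Op is any callable taking two T values and returning a T.
template <typename T, typename Op>
void apply_binary_op(const std::vector<T> &a, const std::vector<T> &b, std::vector<T> &out, Op op)
{
    assert(a.size() == b.size()); // element-wise ops need matching shapes (no broadcasting here)
    out.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        out[i] = op(a[i], b[i]);
    }
}
```

For example, apply_binary_op(a, b, out, std::plus<float>()) computes an element-wise sum, and passing std::multiplies<float>() reuses the same loop for multiplication. Whether the compiled binary also shrinks depends on how many instantiations end up being generated.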

Perf