Releases: ARM-software/ComputeLibrary
v24.11.1 Public Minor Release
- Add stateless GEMM execution via ICPPKernel::run_op
- TensorShape class supports dynamic shapes
- Add skeletons for Dynamic GEMM operator
- Convert quantization behaviour from double rounding to single rounding in both the Cpu and Gpu backends
- Detect Advanced SIMD support on Windows®
- Implement activation heuristics for Neoverse™ V1
- Optimize PReLU on quantized datatypes
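
The double versus single rounding change listed above can be illustrated with a generic fixed-point requantization sketch. This is not the library's kernel code: the function names, the Q31 multiplier format, and the tie-breaking rule are assumptions chosen for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round-to-nearest right shift (ties toward +infinity).
// Only non-negative inputs are exercised here; right-shifting negative values
// is only guaranteed to be arithmetic from C++20 onwards.
inline int64_t rounding_shift_right(int64_t v, int shift)
{
    assert(shift >= 1 && shift <= 62);
    return (v + (int64_t{1} << (shift - 1))) >> shift;
}

// Double rounding (assumed scheme): round once after the Q31 multiply,
// then round again after the extra right shift.
inline int32_t requantize_double_rounding(int32_t acc, int32_t multiplier_q31, int shift)
{
    const int64_t prod = static_cast<int64_t>(acc) * multiplier_q31;
    const int64_t high = rounding_shift_right(prod, 31);
    return static_cast<int32_t>(rounding_shift_right(high, shift));
}

// Single rounding (assumed scheme): keep full precision and round once at the end.
inline int32_t requantize_single_rounding(int32_t acc, int32_t multiplier_q31, int shift)
{
    const int64_t prod = static_cast<int64_t>(acc) * multiplier_q31;
    return static_cast<int32_t>(rounding_shift_right(prod, 31 + shift));
}
```

With an accumulator of 1, a multiplier of 0.5 in Q31 (`1 << 30`) and a shift of 1, the exact result is 0.25. The double-rounding path first rounds 0.5 up to 1 and then rounds 0.5 up again, returning 1, while the single-rounding path returns 0.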
Documentation (API, build guide, contribution guide, errata, etc.) available here:
v24.11 Public Major Release
- Add SVE SoftmaxLayer kernel for BF16
- Provide stateless API for CpuGemmLowpMatrixMultiplyCore, CpuQuantize, and DequantizationLayer
- Extend static quantization interface for both matmul and convolution operations
- Clarify Third-Party IP licenses
- Check if CpuGemmAssemblyDispatch is configured in CpuMatMul before continuing
- Add BF16 support for CpuGemmAssemblyDispatchWrapper
- Detect SVE support on Windows® to run the available kernels
- Fix missing cstdint include, which surfaces with GCC 15
- Disable -O2 when building for Windows® as it causes crashes with certain compiler versions
- Make cast on CPU truncate float to int instead of round to be consistent with other ML frameworks
- Return error in validate() for CpuGemmLowpMatrixMultiplyCore if pretransposed A or B are true as this is not supported
- Avoid implicit conversion from __fp16 to arm_compute::bfloat16 to avoid illegal instructions in hardware with FP16 but no BF16 support
- Softmax SME2 kernel selection now correctly detects if SME2 is supported
- Requantization rounding issues in CPU/GPU Quantize
- Scale normalising coefficient in GPU LogSoftmax
- Apply consistent rounding policy in NEReduceMean
- Revert default memory manager for NEQLSTMLayer
- Create default memory manager when none is provided
- Turn duplicated code in the elementwise_binary kernel into templates to reduce code size
- Move CpuSoftmaxKernel LUT to LUTManager to consolidate location of all LUTs
- Use SME instead of SVE for subtractions in SoftmaxLayer for Q8 relating to LUT address calculation
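
The float-to-int cast change listed above (truncate instead of round) can be illustrated with plain C++. This is a minimal sketch and not the library's implementation: the function names are hypothetical, and round-half-away-from-zero is only an assumed stand-in for the earlier rounding behaviour, which these notes do not specify.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Truncating cast: drops the fractional part, i.e. rounds toward zero.
inline int32_t cast_truncate(float x)
{
    assert(std::isfinite(x)); // sketch only: NaN, infinity and out-of-range values are not handled
    return static_cast<int32_t>(x);
}

// Rounding cast: rounds to the nearest integer, halfway cases away from zero.
inline int32_t cast_round(float x)
{
    assert(std::isfinite(x));
    return static_cast<int32_t>(std::lround(x));
}
```

For example, `cast_truncate(2.7f)` yields 2 and `cast_truncate(-2.7f)` yields -2, whereas `cast_round` yields 3 and -3 for the same inputs.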
v24.09 Public Major Release
- Provide a wrapper class to expose cpu::CpuSoftmaxGeneric
- Detect number of cores in Windows®
- Add Optimized SME kernel for QASYMM8_SIGNED elementwise addition operation
- LogSoftmax Int8/UInt8 mismatches in Cpu
- Rounding of negative integers in pooling 2d/3d gpu kernels
- OpenMP® linker error on Windows®
- Rounding of negative integers in pooling 2d/3d kernels
- Patches linker failure for cpu::CpuSoftmaxGeneric in partial builds
- Cpu/Gpu Reverse data type support
- QSYMM16 broadcasted subtraction failures
- CpuMulKernel validation when there is x-broadcasting for some types
- Data type validation in depthwise op in Cpu
- Update macOS® build instructions
- Validation tests compute reference and target on each iteration
- Reset permuted input and weights on configure in NEDepthwiseConvolutionLayer
- Selectively enable CL job chaining
- Generate only one shared library when building with CMake
- Add BF16 LUT for Softmax Layer with tests
- Move heuristic logic of activation kernel into separate class
- Removed unused CommandBuffer
- Allocate Persistent and Prepare tensors at start of prepare()
- Use mws in OMPScheduler for better thread throttling
- Enable FP16 winograd in CpuConv2d for v8a multi_isa builds
v24.08.1 Public Patch Release
- Change inheritance qualifiers of experimental Cpu operator interface classes to public for cpu-wrappers.
- Mismatches in static quantization updated after configure tests
- CpuSoftmax configure ignores is_log on validation
- Linker errors in armv8.2a Windows® builds
v24.08 Public Major Release
- Expose CpuAdd functionality using the experimental operators api
- Expose CpuDepthwiseConv2d functionality using the experimental operators api
- Expose CpuElementwiseDivision functionality using the experimental operators api
- Expose CpuElementwiseMax functionality using the experimental operators api
- Expose CpuElementwiseMin functionality using the experimental operators api
- Expose CpuGemmAssemblyDispatch functionality using the experimental operators low-level api
- Expose CpuMul functionality using the experimental operators api
- Expose CpuSub functionality using the experimental operators api
- Solve performance issue on Arm® Mali™-G78
- Illegal instruction in multi_isa armv8a
- Set num_threads in ThreadInfo correctly in OMPScheduler
- Fix Alexnet graph example giving incorrect results
Public major release
Documentation (API, changelogs, build guide, contribution guide, errata, etc.) available here:
Public minor release
Public major release
Public major release
Public patch release