This is a patch release containing the following changes to v3.6.1:
- Fixed segmentation fault issue in convolution primitive on processors with Intel AVX2 instruction set support (2eb3dd1)
- Added a workaround for build issue with GCC 8.2 and GNU binutils 2.27 (19ef223, 262fb02, e3782e8)
- Fixed a thread safety issue in matmul primitive for builds relying on Arm Compute Library (ACL) and bumped minimal supported ACL version to 24.11.1 (4d962e7)
- Suppressed spurious warnings for GCC (7d3164d, c805a50, e526172, dc780cb)
- Fixed segfaults in BRGEMM-based matmul, convolution, and deconvolution implementations on AArch64-based processors (a873a1c, 9a1dc92)
- Fixed performance regression in `bf16` convolution with ACL on AArch64-based processors (4793296)
- Fixed an issue with convolution primitive creation with `PREFER_YMM` CPU ISA hint on AArch64-based processors (e34d992)
- Improved `bf16` matmul performance with fp32 destination with ACL on AArch64-based processors (548d5d6)
- Improved `bf16` to `fp32` reorder performance on AArch64-based processors (917dd13)
- Fixed issue in matmul primitive with 4D tensors on AArch64-based processors (d13c966)
- Suppressed spurious GCC warnings in deconvolution primitive on AArch64-based processors (f90f60e)
- Fixed warnings in BRGEMM implementation on AArch64-based processors (866b196)
- Fixed correctness issue in reorder primitive with zero points for 4D shapes on AArch64-based processors (836ea10)
- Improved `bf16` reorder performance on AArch64-based processors (12bafbe)
- Fixed performance regression for backward convolution primitive descriptor creation time on Intel processors (2b3389f)
- Improved performance of `fp16` matmul with `int4` weights on Intel GPUs based on Xe2 architecture (4c8fb2c, 3dd4f43, 280bd28)
- Fixed performance regression for int8 convolution with large spatial sizes on processors with Intel AMX support (05d68df)
- Restricted check for microkernel fusion support to cases when fusion functionality is actually used on Intel GPUs (48f6bd9)