v3.6.1
This is a patch release containing the following changes to v3.6:
- Fixed convolution correctness issue in some scenarios involving persistent cache on Intel GPUs (e595e59)
- Fixed potential page faults in reduction primitive implementation for Intel GPUs (7740c75, a4fcef9, 32d8660)
- Implemented a workaround for a GCC 13 bug that resulted in matmul hangs on some Intel Arc graphics SKUs (a30d526)
- Updated execution unit (EU) count detection logic for Intel GPUs based on Xe2 architecture to accommodate behavior changes in the Linux driver (04e7eac, 97b04bd)
- Fixed build issue for static library with ONEDNN_VERBOSE=OFF (7f476cb)
- Fixed correctness issue in SYCL deconvolution implementation with post-ops (8f600a3)
- Fixed memory format checks in SYCL softmax implementation (6ae73e4)
- Fixed correctness issue in SYCL resampling implementation with post-ops (9845057)
- Aligned accessor types in SYCL kernels with SYCL specification (0d9b3bd)
- Improved scales argument checks in generic SYCL kernels (9f73bf1, 7d85c75)
- Fixed correctness issue in int8 convolution with sum post-op on NVIDIA GPUs (7486ed8)
- Relaxed accuracy test threshold for bf16 softmax on NVIDIA GPUs (e9d0fdb)
- Added support for bf16 and fp16 bias for fp8 matmul on Intel CPUs (188ae7f)
- Fixed a bug that prevented dispatching Intel AVX-512 with Intel DL Boost implementation in int8 RNN primitive (bf58e72)
- Fixed a runtime failure with CL_OUT_OF_RESOURCES error in fp16 convolution on Intel Arc graphics (39a5f67, 7e1663f)