v0.13
Performance optimizations
- Added optimizations for future Intel(R) Xeon(R) processors with support for the AVX512_VNNI instruction group. The new instructions are used in direct convolutions with int8 and int16 data types.
- Improved performance of int8 direct forward convolution on Intel(R) Xeon(R) processors with the Intel(R) AVX-512 instruction set.
- Improved performance of grouped convolutions and depthwise separable convolutions.
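The AVX512_VNNI instructions speed up int8 convolutions by fusing the multiply and widening-accumulate steps of the inner product into a single operation, with products gathered into 32-bit accumulators. A rough pure-Python model of one such 32-bit lane (an illustrative sketch, not the actual intrinsics or the library's implementation):

```python
def vnni_dot_accumulate(acc, a_u8, b_s8):
    """Model of a VNNI-style fused multiply-accumulate: multiply
    uint8 activations by int8 weights and accumulate the products
    directly into a 32-bit integer accumulator."""
    assert len(a_u8) == len(b_s8) == 4  # four byte pairs feed each 32-bit lane
    for a, b in zip(a_u8, b_s8):
        assert 0 <= a <= 255 and -128 <= b <= 127
        acc += a * b  # widening multiply-add; no intermediate int16 saturation
    return acc

# One lane of an int8 convolution inner product:
acc = vnni_dot_accumulate(0, [10, 20, 30, 40], [1, -2, 3, -4])
```

Accumulating in int32 is what lets long convolution reductions run in int8 without overflow at each step.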
New functionality
- Extended Batch Normalization to enable fused ReLU on forward and backward propagation.
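Fusing ReLU into batch normalization avoids a separate elementwise pass over the data: the negative clamp is applied right after the scale-and-shift. A minimal sketch of the fused forward computation (plain Python for illustration; function and parameter names are not the library API):

```python
import math

def batch_norm_relu_forward(x, gamma, beta, eps=1e-5):
    """Sketch of batch normalization forward with a fused ReLU:
    normalize over the batch, apply scale/shift, then clamp
    negatives to zero in the same pass."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    # Fusion: ReLU applied immediately after the affine transform,
    # so no second memory-bound pass is needed.
    return [max(0.0, gamma * (v - mean) * inv_std + beta) for v in x]

out = batch_norm_relu_forward([1.0, 2.0, 3.0, 4.0], gamma=1.0, beta=0.0)
```

On backward propagation the fused variant correspondingly masks gradients where the forward output was clamped.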
Usability improvements
- Improved profiling and debugging capabilities:
  - New verbose mode reports detailed information about each Intel MKL-DNN primitive call, including the primitive name, data layout, implementation, and execution time.
  - Instrumentation and Tracing Technology (ITT) support enables profiling of JIT code with Intel(R) VTune(TM) Amplifier XE.
  - JIT kernels can now be saved for inspection.
- Extended documentation with details on int8 quantization, inference workflow, and fusion.
- Added int8 inference example.
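The int8 quantization described in the documentation maps fp32 tensors to 8-bit integers through a scale factor, typically chosen from the observed data range. A minimal sketch of the symmetric quantize/dequantize round trip (illustrative helper names, not the library API):

```python
def quantize_s8(values, scale):
    """Quantize fp32 values to int8 with a symmetric scale factor,
    rounding to nearest and saturating to the int8 range."""
    return [max(-128, min(127, round(v * scale))) for v in values]

def dequantize_s8(q_values, scale):
    """Map int8 values back to approximate fp32 values."""
    return [q / scale for q in q_values]

# Scale chosen so the largest observed magnitude maps near the int8 limit
data = [0.5, -1.0, 0.25, 0.9]
scale = 127.0 / max(abs(v) for v in data)
q = quantize_s8(data, scale)
approx = dequantize_s8(q, scale)  # close to data, within rounding error
```

The rounding and saturation steps are where quantization error enters, which is why the documentation covers scale selection as part of the inference workflow.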
Thanks to the contributors
This release contains contributions from many Intel(R) Performance Libraries developers as well as Patric Zhao @pengzhao-intel, Ashok Emani @ashokei, Erik Kruus @kruus and Dmitriy Gorokhov. We would also like to thank everyone who asked questions and reported issues.
*Other names and brands may be claimed as the property of others.