Performance optimizations

Added optimizations for future Intel(R) Xeon(R) processors that support the AVX512_VNNI instruction group. The new instructions are used in direct convolutions with int8 and int16 data types.
Improved performance of int8 direct forward convolution on Intel Xeon processors with the Intel AVX512 instruction set.
Improved performance of grouped convolutions and depthwise separable convolutions.
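
To illustrate why the AVX512_VNNI item above matters for int8 convolutions, here is a minimal scalar sketch of the kind of operation these instructions perform in one 32-bit lane: four unsigned 8-bit by signed 8-bit products are summed and added to a 32-bit accumulator in a single step. The function names are hypothetical and chosen for this example only. The code is a plain Python reference for the arithmetic, not the library's JIT implementation, and it assumes the non-saturating form, which wraps on 32-bit overflow.

```python
def dot4_u8s8_accumulate(acc, a_u8, b_s8):
    """Scalar reference for one 32-bit lane of a VNNI-style int8 dot product.

    acc:  signed 32-bit accumulator.
    a_u8: four unsigned 8-bit values (0..255), for example activations.
    b_s8: four signed 8-bit values (-128..127), for example weights.
    """
    assert len(a_u8) == 4 and len(b_s8) == 4
    assert all(0 <= a <= 255 for a in a_u8)
    assert all(-128 <= b <= 127 for b in b_s8)
    total = acc + sum(a * b for a, b in zip(a_u8, b_s8))
    # Wrap to signed 32 bits (assumption: non-saturating variant).
    return (total + 2**31) % 2**32 - 2**31


def int8_dot(a_u8, b_s8):
    """Dot product of two equal-length vectors, consumed four elements per step."""
    assert len(a_u8) == len(b_s8) and len(a_u8) % 4 == 0
    acc = 0
    for i in range(0, len(a_u8), 4):
        acc = dot4_u8s8_accumulate(acc, a_u8[i:i + 4], b_s8[i:i + 4])
    return acc
```

Without such an instruction, the same result needs separate multiply, widen, and add steps, which is where the speedup for int8 direct convolution comes from.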

New functionality

Extended Batch Normalization to enable fused ReLU on forward and backward propagation.
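
As a rough illustration of what fusing ReLU into batch normalization means, the sketch below computes both passes for a single channel in plain Python. It shows reference semantics only: the function names, the `eps` default, and the use of the forward output as the ReLU mask are assumptions made for this example, not the Intel MKL-DNN API or its internal implementation.

```python
import math


def bn_relu_forward(x, gamma, beta, eps=1e-5):
    """Batch normalization over one channel followed by ReLU, in one pass."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased variance
    inv_std = 1.0 / math.sqrt(var + eps)
    x_hat = [(v - mean) * inv_std for v in x]
    # The fused step: clamp the normalized, scaled, shifted value at zero.
    out = [max(0.0, gamma * h + beta) for h in x_hat]
    return out, (x_hat, inv_std)


def bn_relu_backward(d_out, out, cache, gamma):
    """Backward pass: mask the gradient by ReLU, then apply the BN gradient."""
    x_hat, inv_std = cache
    n = len(d_out)
    # Gradient flows only where the fused forward output was positive.
    dy = [g if o > 0.0 else 0.0 for g, o in zip(d_out, out)]
    d_beta = sum(dy)
    d_gamma = sum(g * h for g, h in zip(dy, x_hat))
    dx = [gamma * inv_std * (g - d_beta / n - h * d_gamma / n)
          for g, h in zip(dy, x_hat)]
    return dx, d_gamma, d_beta
```

Fusing the two operations avoids writing the normalized tensor to memory and reading it back for a separate ReLU pass.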

Usability improvements

Improved profiling and debugging capabilities:
- New verbose mode reports detailed information about each Intel MKL-DNN primitive call, including primitive name, data layout, implementation, and execution time.
- Instrumentation and Tracing Technology (ITT) enables profiling of JIT code with Intel(R) VTune(TM) Amplifier XE.
- JIT kernels can now be saved for inspection.
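
The verbose output described above lends itself to simple post-processing. The sketch below sums execution time per primitive and implementation from captured log lines. It assumes that verbose mode is enabled through the `MKLDNN_VERBOSE` environment variable and that each record is a comma-separated line starting with `mkldnn_verbose,exec`, with the primitive name and implementation in the next two fields and the execution time in milliseconds as the last field. Check the documentation of your library version for the exact layout. The function name and the sample lines in the usage note are made up for illustration and are not real measurements.

```python
from collections import defaultdict


def summarize_verbose(lines):
    """Sum execution time per (primitive, implementation) pair."""
    totals = defaultdict(float)
    for line in lines:
        fields = line.strip().split(",")
        # Skip anything that is not a primitive execution record.
        if len(fields) < 5 or fields[0] != "mkldnn_verbose" or fields[1] != "exec":
            continue
        primitive, implementation = fields[2], fields[3]
        try:
            elapsed_ms = float(fields[-1])  # assumed: time is the last field
        except ValueError:
            continue
        totals[(primitive, implementation)] += elapsed_ms
    return dict(totals)
```

For example, passing the illustrative lines `mkldnn_verbose,exec,convolution,jit:avx512_common,forward_inference,layout,alg,shape,0.5` and `mkldnn_verbose,exec,convolution,jit:avx512_common,forward_inference,layout,alg,shape,0.25` yields a single entry for `("convolution", "jit:avx512_common")` with a total of 0.75 ms.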
Extended documentation with details on int8 quantization, inference workflow, and fusion.
Added int8 inference example.

Thanks to the contributors

This release contains contributions from many Intel(R) Performance Libraries developers as well as Patric Zhao @pengzhao-intel, Ashok Emani @ashokei, Erik Kruus @kruus and Dmitriy Gorokhov. We would also like to thank everyone who asked questions and reported issues.

*Other names and brands may be claimed as the property of others.

v0.13