v0.15
Performance optimizations
- Improved fp32 convolution performance for real-time inference on Intel(R) Xeon(R) processors with Intel(R) AVX512 instruction set support
- Improved performance of int8 depthwise separable convolutions on processors with Intel(R) AVX512 instruction set support
- Improved 3D convolution performance on Intel(R) Xeon Phi(TM) processors with AVX512_4FMAPS and AVX512_4VNNIW instruction groups support
- Optimized dilated convolutions for int8 and fp32 data types
- Improved performance of pooling primitives for NHWC and NCHW data layouts
- Improved performance of 3D pooling primitives for plain data layouts
- Optimized batch normalization backpropagation for Intel(R) processors with AVX and SSE4.2 instruction groups support
- Improved performance of batch normalization with 3D spatial data
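
For readers unfamiliar with the dilated convolutions mentioned above, the sketch below shows how dilation changes the output size along one spatial axis. It is only an illustration of the arithmetic, not library code: the function name `conv_output_size` is hypothetical, and it assumes the common convention in which a dilation of 1 means an ordinary convolution (some libraries count dilation from 0 instead).

```python
def conv_output_size(in_size, kernel, stride=1, pad=0, dilation=1):
    """Output length along one spatial axis of a (dilated) convolution.

    Assumption: dilation=1 is an ordinary convolution. The same padding
    is applied on both sides of the axis.
    """
    # A dilated kernel with `kernel` taps spans this many input elements.
    effective_kernel = dilation * (kernel - 1) + 1
    return (in_size + 2 * pad - effective_kernel) // stride + 1


print(conv_output_size(32, 3))              # ordinary 3-tap kernel
print(conv_output_size(32, 3, dilation=2))  # same kernel, dilation 2
```

With the same number of taps, a larger dilation widens the receptive field, so the unpadded output shrinks.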
New functionality
- Feature preview: Introduced training and inference support for GRU cells for recurrent neural networks (RNNs)
- Introduced general-purpose SGEMM API
- Introduced deconvolution (or transposed convolution) primitive for 3D spatial data
- Introduced backward propagation for the softmax primitive
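
As a reference for what softmax backward propagation computes, here is a minimal pure-Python sketch of the standard formula. It does not use or mirror the library's API: the names `softmax` and `softmax_backward` are hypothetical, and the code handles a single 1-D vector rather than a batched tensor.

```python
import math


def softmax(x):
    """Numerically stable softmax over a 1-D list of floats."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_backward(y, dy):
    """Gradient w.r.t. the softmax input.

    y  : softmax output from the forward pass
    dy : gradient of the loss w.r.t. y
    Uses dx_i = y_i * (dy_i - sum_j dy_j * y_j).
    """
    dot = sum(g * p for g, p in zip(dy, y))
    return [p * (g - dot) for p, g in zip(y, dy)]


y = softmax([1.0, 2.0, 3.0])
dx = softmax_backward(y, [0.1, -0.2, 0.3])
print(dx)
```

The backward pass needs only the forward output `y` and the incoming gradient `dy`, so the original input does not have to be kept around.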
Thanks to the contributors
This release contains contributions from many Intel(R) Performance Libraries developers as well as Tuomas Kärnä @tkarna, @msakai, Can Balioglu @cbalioglu, Jacek Czaja @jczaja, Thejan Wijesinghe @ThejanW, Jesse Nicholson @TechnikEmpire, @okdshin, Crissman Loomis @Crissman. We would also like to thank everyone who asked questions and reported issues.
*Other names and brands may be claimed as the property of others.