v0.15

@tprimak released this 11 Jul 07:39

Performance optimizations

  • Improved fp32 convolution performance for real-time inference on Intel(R) Xeon processors with Intel(R) AVX512 instruction set support
  • Improved int8 depthwise separable convolution performance on processors with Intel(R) AVX512 instruction set support
  • Improved 3D convolution performance on Intel(R) Xeon Phi(TM) processors with AVX512_4FMAPS and AVX512_4VNNIW instruction groups support
  • Optimized dilated convolutions for int8 and fp32 data types
  • Improved performance of pooling primitives for NHWC and NCHW data layouts
  • Improved performance of 3D pooling primitives for plain data layouts
  • Optimized batch normalization backpropagation for Intel(R) processors with AVX and SSE4.2 instruction groups support
  • Improved performance of batch normalization with 3D spatial data
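For readers unfamiliar with the dilated convolutions mentioned in the list above, here is a minimal reference sketch in plain Python. It is not the library's implementation or API: the function name `dilated_conv1d` is hypothetical, the operation is reduced to one dimension with stride 1 and no padding, and the convention that `dilation=1` means a dense kernel is an assumption of this sketch that may differ from the library's own parameterization.

```python
def dilated_conv1d(x, w, dilation=1):
    """Valid (no padding), stride-1 1D cross-correlation with a dilated kernel.

    Illustrative only: dilation=1 is an ordinary dense kernel in this sketch.
    """
    # A dilated kernel with len(w) taps covers this many input elements.
    span = (len(w) - 1) * dilation + 1
    out_len = len(x) - span + 1
    if out_len <= 0:
        return []
    # Each output sums kernel taps applied to inputs spaced `dilation` apart.
    return [
        sum(w[k] * x[i + k * dilation] for k in range(len(w)))
        for i in range(out_len)
    ]
```

The kernel keeps the same number of weights while covering a wider input window, which is why dilated convolutions need dedicated handling of the strided input access rather than a larger dense kernel.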

New functionality

  • Feature preview: Introduced training and inference support for GRU cells in recurrent neural networks (RNNs)
  • Introduced general purpose SGEMM API
  • Introduced deconvolution (or transposed convolution) primitive for 3D spatial data
  • Introduced backward propagation for softmax primitive
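As an illustration of what backward propagation for softmax computes, the following is a small reference sketch in plain Python. It shows the standard math only and is not the library's primitive or API: the function names `softmax` and `softmax_backward` are hypothetical, and the sketch handles a single vector rather than a batched tensor.

```python
import math


def softmax(x):
    """Numerically stable softmax over a single vector."""
    m = max(x)  # subtract the max so exp() cannot overflow
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_backward(y, dy):
    """Given y = softmax(x) and dy = dL/dy, return dL/dx.

    Uses dx_i = y_i * (dy_i - sum_j(dy_j * y_j)).
    """
    dot = sum(yi * dyi for yi, dyi in zip(y, dy))
    return [yi * (dyi - dot) for yi, dyi in zip(y, dy)]
```

Note that the backward pass needs only the forward output `y` and the incoming gradient `dy`, not the original input, and that the resulting gradients sum to zero because the softmax outputs always sum to one.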

Thanks to the contributors

This release contains contributions from many Intel(R) Performance Libraries developers as well as Tuomas Kärnä @tkarna, @msakai, Can Balioglu @cbalioglu, Jacek Czaja @jczaja, Thejan Wijesinghe @ThejanW, Jesse Nicholson @TechnikEmpire, @okdshin, Crissman Loomis @Crissman. We would also like to thank everyone who asked questions and reported issues.

*Other names and brands may be claimed as the property of others.