From f8a3ed243d7022b44285aae3edb4d67aaa04e74f Mon Sep 17 00:00:00 2001
From: Denis Samoilov
Date: Wed, 8 Mar 2023 17:08:16 -0800
Subject: [PATCH] gpu: nvidia: doc: update README.md

---
 src/gpu/nvidia/README.md | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/src/gpu/nvidia/README.md b/src/gpu/nvidia/README.md
index fffba665074..503a3f33fb6 100644
--- a/src/gpu/nvidia/README.md
+++ b/src/gpu/nvidia/README.md
@@ -86,7 +86,7 @@ normalization.
   normalization, which is used as an input to the activation function, is saved
   in the workspace as well. This is required to compute the backward pass for
   `dnnl_fuse_norm_relu` flag.
-* Forward pass supports f32, f16 and s8 data types. Although blocking is not
+* Forward pass supports f32, f16, bf16 and s8 data types. Although blocking is not
   supported for s8.
 
 #### Backward direction
@@ -109,6 +109,7 @@ normalization.
   intermediate result of the batch normalization saved in the forward pass. This
   is used to compute the backward direction of the activation function used for
   `RELU`.
+* Backward pass supports `f32` and `bf16` data types.
 
 ### Binary
 
@@ -117,6 +118,7 @@ The `cudnnOpTensor` is equivalent of oneDNN binary primitives.
 * Only scales attribute is supported. Post-op attribute is not supported.
 * Blocking is only supported for `int8` and only in the C dimension with either
   4 or 32 block size (same as other cuDNN primitives).
+* Supported data types are f32, f16, bf16 and s8.
 
 ### Concat
 
@@ -180,9 +182,9 @@ limitations when using Nvidia backend for eltwise primitive:
 * cuDNN expects `x`, `y` and `dy` as inputs to the backward pass, hence, only
   `RELU` operation supports backward proragation kind. TODO: add `ELU_DST`,
   `TANH_DST` and `LOGISTIC_DST` support which require `dy`.
-* Forward pass supports `f32`, `f16` and `s8` data types. Although blocking is
+* Forward pass supports `f32`, `f16`, `bf16` and `s8` data types. Although blocking is
   not supported for `s8`.
-* Backward pass supports `f32` and `f16` data types.
+* Backward pass supports `f32` and `bf16` data types.
 
 ### Inner product
 
@@ -200,6 +202,8 @@ falls back to the convolution backend.
   `cudnnActivationForward` operation is used for eltwise operation and
   `cudnnAddTensor` is used for bias operation. The `beta` parameter in gemm is
   used for the sum scale and `alpha` parameter is used for the output scale.
+* Forward pass supports `f32`, `f16`, `bf16` and `s8` data types.
+* Backward pass supports `f32` and `bf16` data types.
 
 #### Using convolution
 
@@ -223,6 +227,8 @@ product has the following restrictions and performance implications:
   convolution restriction.
 * For `int8` cuDNN requires both input and output feature maps to be a multiple
   of 4.
+* Forward pass supports `f32`, `f16`, `bf16` and `s8` data types.
+* Backward pass supports `f32` and `bf16` data types.
 
 ### LRN
 
@@ -244,6 +250,7 @@ The matrix multiplication primitive in the Nvidia backend is implemented with
 * Zero points support is not provided by cuBLAS and, hence, not supported by
   the Nvidia backend.
 * Post-ops and output scale limitations are same as for Inner Product.
+* Supported data types are `f32`, `f16`, `bf16` and `s8`.
 
 ### Pooling
 
@@ -267,6 +274,8 @@ backward propagation respectively.
   workspace is always required when the Nvidia backend is used (except for the
   forward inference).
 
+* Supported data types are `f32`, `f16`, `bf16` and `s8`.
+
 ### Reorder
 
 The `cudnnTransform` function is the equivalent of oneDNN reorder function.
@@ -279,6 +288,8 @@ GPU:
   currently supports block size of 4.
 * Blocking is only supported when channel dimension is a multiple of the block
   size and the datatype is `int8`.
+* Forward pass supports `f32`, `f16`, `bf16` and `s8` data types.
+* Backward pass supports `f32` and `bf16` data types.
 
 ### Resampling
 
@@ -317,6 +328,8 @@ changed to `CUDNN_SOFTMAX_LOG`.
 * There is a bug in cuDNN softmax for 5D tensor with format `NHWC`.
   When the channel size is greater than 1, it only applies softmax for a single
   channel and leave the others untouched.
+* Forward pass supports `f32`, `f16`, `bf16` and `s8` data types.
+* Backward pass supports `f32` and `bf16` data types.
 
 ### Sum
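A patch in this mail format is produced by `git format-patch` and replayed with `git am`. As a minimal, self-contained sketch of that round trip (all repository paths, identities, and file contents below are scratch placeholders, not the real oneDNN tree; only the commit subject is taken from the patch above):

```shell
# Create a throwaway "upstream" repo with one commit to export.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/src"
cd "$tmp/src"
git config user.name "A. Author" && git config user.email "author@example.com"
printf 'old text\n' > README.md
git add README.md && git commit -qm "initial"
printf 'new text\n' > README.md
git commit -qam "gpu: nvidia: doc: update README.md"

# Export the last commit in mail format -- the same shape as this file
# (From <sha>, From:, Date:, Subject:, diffstat, unified diff).
git format-patch -1 -o "$tmp" HEAD

# Replay it onto a second checkout with `git am`, which preserves the
# author, date, and subject from the mail header.
git init -q "$tmp/dst"
cd "$tmp/dst"
git config user.name "B. Applier" && git config user.email "applier@example.com"
printf 'old text\n' > README.md
git add README.md && git commit -qm "initial"
git am "$tmp"/0001-*.patch
git log -1 --format='%s'   # subject carried over from the mail header
```

`git am` creates a real commit, unlike `git apply`, which only touches the working tree; that is why mail-format patches like this one round-trip authorship metadata as well as the diff.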