From 715b4e9605733ccade0376e933e7a241a820e300 Mon Sep 17 00:00:00 2001
From: "Lv, Tao A"
Date: Mon, 9 Dec 2024 11:09:58 +0800
Subject: [PATCH] doc: graph: patterns: fix words and remove legacy gc patterns

---
 doc/graph/fusion_patterns/fusion_patterns.md | 63 ++------------------
 1 file changed, 5 insertions(+), 58 deletions(-)

diff --git a/doc/graph/fusion_patterns/fusion_patterns.md b/doc/graph/fusion_patterns/fusion_patterns.md
index 7ae0ae38e34..b39374e1571 100644
--- a/doc/graph/fusion_patterns/fusion_patterns.md
+++ b/doc/graph/fusion_patterns/fusion_patterns.md
@@ -4,13 +4,13 @@ Fusion Patterns {#dev_guide_graph_fusion_patterns}
 ## Overview
 
 The following fusion patterns are subgraphs that the oneDNN Graph API recognizes
-as candidate for fusion. The patterns are described using oneDNN Graph
+as candidates for fusion. The patterns are described using oneDNN Graph
 operation (op) names with the following convention.
 
 @note oneDNN Graph performs limited input validation to minimize the performance
 overheads. The application is responsible for sanitizing inputs passed to the
-library. For large u8 or s8 inputs may lead to accumulator overflow, you can use
-floating point patterns instead of quantized patterns.
+library. Because large `u8` or `s8` inputs may lead to accumulator overflow, you
+can use floating-point patterns instead of quantized patterns.
 
 `"+"` describes a chain of two ops. The preceding op produces an output tensor,
 which is consumed by the following op as its first operand.
@@ -35,7 +35,7 @@ the producer and consumer relation within one graph partition. For example,
 A\f$_{>t1}\f$+B+C\f$_{<t1}\f$ refers to the pattern started with A followed by
 B and C, and C takes an implicit input tensor from B and an extra tensor t1
 output from A. `">"` refers to the output
-tensor, and `"<"` for input tensor. Input and output tensor between neighbor
+tensor, and `"<"` for input tensor. Input and output tensors between neighbor
 ops are not explicitly marked, for example, B consumes t1 implicitly in the
 example above.
 
@@ -55,7 +55,7 @@ inputs. In the example A\f$_{>t1}\f$+B+C\f$_{<t1}\f$,
 | ConvolutionBackwardWeights + BiasAddBackward\f$_{>out}\f$ | N/A |
 | ReLUBackward + BatchNormTrainingBackward\f$_{>out}\f$ |N/A |
-
-All the above fusion patterns are supported by default.
-
-## Aggressive Fusion Patterns
-Aggressive fusion patterns also follow the pattern description convention
-defined in the [Fusion Patterns](@ref fusion_patterns) section.
-
-@note Aggressive fusion patterns are only supported when
-[Graph Compiler](@ref dev_guide_graph_compiler) is enabled.
-
-The following categories will also be used to describe aggressive fusion
-patterns.
-
-- ReshapeTranspose = [StaticReshape + StaticTranspose\f$^{1-2}\f$]
-
-- Activation = [ReLU \| Sigmoid \| GELU]
-
-- ActivationBackward = [ReLUBackward \| SigmoidBackward \| GELUBackward]
-
-### Inference
-
-#### Floating Point Patterns
-
-| Pattern | Description |
-|:--------|:-----------------------------|
-| MatMul + [Multiply \| Divide] + Add + Softmax + MatMul + StaticTranspose + Reorder\f$_{>out}\f$ | Multi-head Attention. This pattern is widely used in models containing encoder-decoder structures, for example BERT. |
-| ReshapeTranspose\f$_{>t1}\f$, ReshapeTranspose\f$_{>t2}\f$, ReshapeTranspose + MatMul\f$_{out}\f$ | Multi-head Attention. |
-| MatMul + Activation\f$_{>t1}\f$, [MatMul\f$_{t1}\f$]\f$^{0-4}\f$, MatMul\f$_{out}\f$ | Multi-layer Perceptron. This pattern is widely used in recommendation models, for example DLRM. |
-| [Convolution + BiasAdd\f$^{?}\f$ + ReLU]\f$^{1-3}\f$ + Convolution + BiasAdd\f$^{?}\f$ + Add + ReLU\f$_{>out}\f$ | Identical Bottleneck. Enabled only in single thread runtime scenario. This pattern is widely used in Convolution Neural Networks, for example ResNet. |
-| Convolution + BiasAdd\f$^{?}\f$\f$_{>t1}\f$, [Convolution + BiasAdd\f$^{?}\f$ + ReLU]\f$^{1-3}\f$ + Convolution + BiasAdd\f$^{?}\f$ + Add\f$_{out}\f$ | Convolutional Bottleneck. Enabled only in single thread runtime scenario. This pattern is widely used in Convolution Neural Networks, for example ResNet. |
-
-#### Quantized Patterns
-
-| Pattern | Description |
-|:--------|:-----------------------------|
-| Dequantize\f$_{>t1}\f$, Dequantize\f$_{>t2}\f$, Dequantize + MatMul\f$_{out}\f$ | Quantized Multi-head Attention. |
-| Dequantize + ReshapeTranspose\f$_{>t1}\f$, Dequantize + ReshapeTranspose\f$_{>t2}\f$, Dequantize + MatMul\f$_{out}\f$ | Quantized Multi-head Attention. |
-| Dequantize\f$_{>t1}\f$, Dequantize + MatMul\f$_{t2}\f$, [Dequantize\f$_{>t3}\f$, Dequantize\f$_{t2}\f$]\f$^{0-4}\f$, Dequantize\f$_{>t4}\f$, Dequantize\f$_{out}\f$ | Quantized Multi-layer Perceptron. |
-| Dequantize\f$_{>t2}\f$, Dequantize\f$_{>t3}\f$, [Dequantize\f$_{>t1}\f$, Dequantize + Convolution\f$_{out}\f$ | Quantized Identical Bottleneck. Enabled only in single thread runtime scenario. |
-| [Dequantize\f$_{>t1}\f$, Dequantize + Convolution\f$_{t2}\f$, Dequantize\f$_{>t4}\f$, [Dequantize\f$_{>t3}\f$, Dequantize + Convolution\f$_{out}\f$ | Quantized Convolutional Bottleneck. Enabled only in single thread runtime scenario. |
-
-### Training
-
-| Pattern | Description |
-|:--------|:-----------------------------|
-| Dequantize\f$_{>t1}\f$, Dequantize\f$_{>t2}\f$, Dequantize + MatMul\f$_{out}\f$ | Multi-head Attention Training Forward Pattern. |
-| StaticReshape + StaticTranspose\f$_{>t1}\f$ + MatMul + Multiply\f$_{>t2}\f$ + Subtract\f$_{t4}\f$ + MatMul\f$_{>out1}\f$, Multiply\f$_{t3}\f$, MatMul\f$_{out2}\f$, MatMul\f$_{out3}\f$ | Multi-head Attention Training Backward Pattern. |
-| MatMul\f$_{>out1}\f$ + Activation\f$_{>t1,>out2}\f$, [MatMul\f$_{out3}\f$ + Activation\f$_{>t1,>out4}\f$]\f$^{0-4}\f$, MatMul\f$_{out5}\f$ + Activation\f$_{>out6}\f$ | Multi-layer Perceptron Training Forward Pattern. |
-| StaticTranspose\f$^{?}\f$\f$_{>t0}\f$, ActivationBackward\f$_{>t2}\f$ + MatMul\f$_{t1}\f$, ReduceSum\f$^{?}\f$\f$_{out1}\f$, StaticTranspose\f$^{?}\f$ + MatMul\f$_{out2}\f$, [StaticTranspose\f$^{?}\f$\f$_{>t3}\f$, ActivationBackward\f$_{>t4,t1}\f$, ReduceSum\f$^{?}\f$\f$_{out3}\f$, StaticTranspose\f$^{?}\f$ + MatMul\f$_{out4}\f$]\f$^{0-4}\f$, StaticTranspose\f$^{?}\f$\f$_{>t5}\f$, ActivationBackward\f$_{>t6,out5}\f$, ReduceSum\f$^{?}\f$\f$_{out6}\f$, StaticTranspose\f$^{?}\f$ + MatMul\f$_{out7}\f$ | Multi-layer Perceptron Training Backward Pattern. |
-| Convolution\f$_{>out1}\f$ + BatchNormForwardTraining\f$_{>out2}\f$ + ReLU\f$_{>out3}\f$ + Convolution\f$_{>out4}\f$ + BatchNormForwardTraining\f$_{>out5}\f$ + ReLU\f$_{>out6}\f$ + Convolution\f$_{>out7}\f$ + BatchNormForwardTraining\f$_{>out8}\f$ + Add + ReLU\f$_{>out9}\f$ | Identical Bottleneck Training Forward Pattern. |
-| Convolution\f$_{>out1}\f$ + BatchNormForwardTraining\f$_{>t1,>out2}\f$, Convolution\f$_{>out3}\f$ + BatchNormForwardTraining\f$_{>out4}\f$ + ReLU\f$_{>out5}\f$ + Convolution\f$_{>out6}\f$ + BatchNormForwardTraining\f$_{>out7}\f$ + ReLU\f$_{>out8}\f$ + Convolution\f$_{>out9}\f$ + BatchNormForwardTraining\f$_{>out10}\f$ + Add\f$_{out11}\f$ | Convolutional Bottleneck Training Forward Pattern. |
-| ReLUBackward\f$_{>t1}\f$ + BatchNormTrainingBackward\f$_{>t2,>out1}\f$ + ConvolutionBackwardData + ReLUBackward + BatchNormTrainingBackward\f$_{>t3,>out2}\f$ + ConvolutionBackwardData + ReLUBackward + BatchNormTrainingBackward\f$_{>t4,>out3}\f$ + ConvolutionBackwardData + Add\f$_{out4}\f$, ConvolutionBackwardWeights\f$_{out5}\f$, ConvolutionBackwardWeights\f$_{out6}\f$, ConvolutionBackwardWeights\f$_{out7}\f$ | Identical Bottleneck Training Backward Pattern. |
-| ReLUBackward\f$_{>t1}\f$ + BatchNormTrainingBackward\f$_{>t2,>out1}\f$ + ConvolutionBackwardData + ReLUBackward + BatchNormTrainingBackward\f$_{>t3,>out2}\f$ + ConvolutionBackwardData + ReLUBackward + BatchNormTrainingBackward\f$_{>t4,>out3}\f$ + ConvolutionBackwardData + Add\f$_{out4}\f$, BatchNormTrainingBackward\f$_{t5,>out5}\f$ + ConvolutionBackwardData\f$_{>t6}\f$, ConvolutionBackwardWeights\f$_{out6}\f$, ConvolutionBackwardWeights\f$_{out7}\f$, ConvolutionBackwardWeights\f$_{out8}\f$, ConvolutionBackwardWeights\f$_{out9}\f$ | Convolutional Bottleneck Training Backward Pattern. |
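
Outside the patch itself, the `"+"` chain convention described above maps directly onto graphs built with the oneDNN Graph API: two chained ops are returned as a single fused partition when they match a supported pattern. The following minimal C++ sketch illustrates this for a MatMul + ReLU chain; the tensor ids, shapes, and CPU engine kind are arbitrary example values, and error handling is omitted.

```cpp
// Minimal sketch: express the two-op chain "MatMul + ReLU" with the oneDNN
// Graph API and ask the library for partitions. Ids and shapes are examples.
#include <iostream>

#include "oneapi/dnnl/dnnl_graph.hpp"

using namespace dnnl::graph;

int main() {
    using dt = logical_tensor::data_type;
    using lt = logical_tensor::layout_type;

    // Logical tensors for the MatMul inputs/output and the ReLU output.
    logical_tensor src {0, dt::f32, {8, 64}, lt::strided};
    logical_tensor wei {1, dt::f32, {64, 32}, lt::strided};
    logical_tensor mm_dst {2, dt::f32, {8, 32}, lt::strided};
    logical_tensor relu_dst {3, dt::f32, {8, 32}, lt::strided};

    // MatMul produces mm_dst, which ReLU consumes as its first operand: the
    // producer/consumer relation that the "+" notation describes.
    op matmul {0, op::kind::MatMul, {src, wei}, {mm_dst}, "matmul"};
    op relu {1, op::kind::ReLU, {mm_dst}, {relu_dst}, "relu"};

    // Add both ops to a graph and request partitions; a recognized fusion
    // pattern is reported as one partition containing both ops.
    graph g {dnnl::engine::kind::cpu};
    g.add_op(matmul);
    g.add_op(relu);
    g.finalize();

    auto partitions = g.get_partitions();
    std::cout << "partitions: " << partitions.size() << "\n";
    return 0;
}
```

Ops that do not match any supported pattern are typically returned as single-op partitions, so the partitioning step is also a convenient way to check which fusions the library recognizes for a given graph.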