doc: graph: add operation documents

Wuxun Zhang
2022-11-18 20:58:44 +08:00
committed by GitHub
parent 0648a3589f
commit f0a28862d1
96 changed files with 4269 additions and 44 deletions

View File

@ -1,5 +1,5 @@
#===============================================================================
# Copyright 2021 Intel Corporation
# Copyright 2021-2022 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@ -45,6 +45,7 @@ if (DOXYREST_FOUND)
--frame-dir=${DOXYREST_FRAME_DIR}/cfamily
--config=${CMAKE_CURRENT_BINARY_DIR}/doxyrest-config.lua
COMMAND ${CMAKE_COMMAND} -E copy_directory ${CMAKE_CURRENT_SOURCE_DIR}/doc/rst ${DOXYREST_OUTPUT_DIR}/rst
COMMAND ${CMAKE_COMMAND} -E copy_directory ${CMAKE_CURRENT_SOURCE_DIR}/doc/graph/rst ${DOXYREST_OUTPUT_DIR}/rst
COMMAND ${CMAKE_COMMAND} -E touch ${DOXYREST_STAMP_FILE}
WORKING_DIRECTORY ${DOXYREST_OUTPUT_DIR}
COMMENT "Translating documentation from .xml to .rst with Doxyrest" VERBATIM)

View File

@ -270,6 +270,8 @@ ALIASES += diffdstiterc="\f$\diffdstiterc\f$"
ALIASES += diffgamma="\f$\diffgamma\f$"
ALIASES += diffbeta="\f$\diffbeta\f$"
ALIASES += workspace="\f$\workspace\f$"
ALIASES += srcshape="\f$\srcshape\f$"
ALIASES += dstshape="\f$\dstshape\f$"
# This tag can be used to specify a number of word-keyword mappings (TCL only).
# A mapping has the form "name=value". For example adding "class=itcl::class"

View File

@ -281,7 +281,7 @@ $ cmake -DONEDNN_GPU_RUNTIME=OCL -DOPENCLROOT=/path/to/opencl/sdk ..
## Graph component limitations
The graph component can be enabled via the build option `ONEDNN_BUILD_GRAPH`.
But the build option doesn't work with some values of other build options.
But the build option does not work with some values of other build options.
Specifying these options and values simultaneously in one build will lead to a
CMake error.

View File

@ -13,11 +13,11 @@ The graph dumping feature only works when `ONEDNN_BUILD_GRAPH` is ON.
## Run-Time Controls
When the feature is enabled at build time, users can use an environment variable
`ONEDNN_GRAPH_DUMP` to control the serialization level. This option accepts
setting flags. These flags can be combined together to make the library dumping
different files. For example, the below setting will generate files containing
library graph and subgraphs in each partition.
When the feature is enabled at build time, the environment variable
`ONEDNN_GRAPH_DUMP` can be used to control the serialization level. This option
accepts flags that can be combined to make the library dump different kinds of
files. For example, the setting below generates files containing the library
graph and the subgraphs in each partition.
| Variable | Flags | Description
| :--- | :--- |:---

View File

@ -0,0 +1,41 @@
# Abs {#dev_guide_op_abs}
## General
Abs operation computes the element-wise absolute value of a given tensor. It
applies the following formula to every element of the \src tensor (the variable
names follow the standard @ref dev_guide_conventions):
\f[ dst = \begin{cases} src & \text{if}\ src \ge 0 \\
-src & \text{if}\ src < 0 \end{cases} \f]
## Operation attributes
Abs operation does not support any attribute.
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `dst` | Required
## Supported data types
Abs operation supports the following data type combinations.
Src | Dst
-- | --
f32 | f32
f16 | f16
bf16 | bf16
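
As an illustration, below is a minimal C++ sketch of constructing an Abs
operation and adding it to a graph, assuming the oneDNN v3.x `dnnl::graph` C++
API; the tensor ids, shapes, and op name are illustrative, not part of this
specification.

```cpp
#include "oneapi/dnnl/dnnl_graph.hpp"
using namespace dnnl::graph;

int main() {
    using dt = logical_tensor::data_type;
    using lay = logical_tensor::layout_type;

    // Logical tensors; ids and shapes are illustrative.
    logical_tensor src {0, dt::f32, {8, 64, 56, 56}, lay::strided};
    logical_tensor dst {1, dt::f32, {8, 64, 56, 56}, lay::strided};

    // Index order matters: input 0 is `src`, output 0 is `dst`.
    op abs_op(2, op::kind::Abs, {src}, {dst}, "abs0");

    graph g(dnnl::engine::kind::cpu);
    g.add_op(abs_op);
    return 0;
}
```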

View File

@ -0,0 +1,42 @@
# AbsBackward {#dev_guide_op_absbackward}
## General
AbsBackward operation computes the gradient of the Abs operation.
\f[ diff\_src = \begin{cases} diff\_dst & \text{if}\ src > 0 \\
-diff\_dst & \text{if}\ src < 0 \\
0 & \text{if}\ src = 0 \\
\end{cases} \f]
## Operation attributes
AbsBackward operation does not support any attribute.
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
1 | `diff_dst` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `diff_src` | Required
## Supported data types
AbsBackward operation supports the following data type combinations.
Src | Diff_dst | Diff_src
-- | -- | --
f32 | f32 | f32
f16 | f16 | f16
bf16 | bf16 | bf16

View File

@ -0,0 +1,50 @@
# Add {#dev_guide_op_add}
## General
Add operation performs element-wise addition of two given tensors, applying
multi-directional broadcast rules.
\f[
\dst(\overline{x}) =
\src_0(\overline{x}) \mathbin{+} \src_1(\overline{x}),
\f]
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[auto_broadcast](@ref dnnl::graph::op::attr::auto_broadcast) | Specifies rules used for auto-broadcasting of src tensors. |string |`none`, `numpy` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
| Index | Argument Name | Required or Optional |
| ----- | ------------- | -------------------- |
| 0 | `src_0` | Required |
| 1 | `src_1` | Required |
@note Both src shapes should match and no auto-broadcasting is allowed if the
`auto_broadcast` attribute is `none`. `src_0` and `src_1` shapes can be
different and auto-broadcasting is allowed if the `auto_broadcast` attribute is
`numpy`.
### Outputs
| Index | Argument Name | Required or Optional |
| ----- | ------------- | -------------------- |
| 0 | `dst` | Required |
## Supported data types
Add operation supports the following data type combinations.
| Source0/1 | Destination |
| ---- | ------- |
| f32 | f32 |
| bf16 | bf16 |
| f16 | f16 |
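
For example, here is a hedged C++ sketch (same assumptions as the earlier Abs
sketch; ids and shapes are illustrative) of an Add operation whose second input
is broadcast under `numpy` rules:

```cpp
#include <string>
#include "oneapi/dnnl/dnnl_graph.hpp"
using namespace dnnl::graph;

int main() {
    using dt = logical_tensor::data_type;
    using lay = logical_tensor::layout_type;

    // src_1 {1, 64, 1, 1} broadcasts over src_0 {8, 64, 56, 56}.
    logical_tensor src_0 {0, dt::f32, {8, 64, 56, 56}, lay::strided};
    logical_tensor src_1 {1, dt::f32, {1, 64, 1, 1}, lay::strided};
    logical_tensor dst {2, dt::f32, {8, 64, 56, 56}, lay::strided};

    op add_op(3, op::kind::Add, {src_0, src_1}, {dst}, "add0");
    add_op.set_attr<std::string>(op::attr::auto_broadcast, "numpy");

    graph g(dnnl::engine::kind::cpu);
    g.add_op(add_op);
    return 0;
}
```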

View File

@ -0,0 +1,61 @@
# AvgPool {#dev_guide_op_avgpool}
## General
AvgPool operation performs the computation according to the formulas below.
Variable names follow the standard @ref dev_guide_conventions.
\f[
\dst(n, c, oh, ow) =
\frac{1}{DENOM}
\sum\limits_{kh, kw}
\src(n, c, oh \cdot SH + kh \cdot (DH + 1) - PH_L, ow \cdot SW + kw \cdot (DW + 1) - PW_L)
\f]
where:
- if the attribute `exclude_pad` is set to false, \f$DENOM = KH \cdot KW\f$,
- if the attribute `exclude_pad` is set to true, \f$DENOM\f$ equals the size of
the overlap between the averaging window and the image.
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[strides](@ref dnnl::graph::op::attr::strides) | Controls the strides with which the window is moved. |s64 |A s64 list containing positive values | Required
[pads_begin](@ref dnnl::graph::op::attr::pads_begin) | Controls the number of zeros to be added to the front/top/left of spatial dimensions. The attribute will be ignored when the `auto_pad` attribute is set to `same_upper`, `same_lower` or `valid`.|s64 | A s64 list containing non-negative values | Required
[pads_end](@ref dnnl::graph::op::attr::pads_end) | Controls the number of zeros to be added to the back/bottom/right of spatial dimensions. The attribute will be ignored when the `auto_pad` attribute is set to `same_upper`, `same_lower` or `valid`. |s64 |A s64 list containing non-negative values | Required
[kernel](@ref dnnl::graph::op::attr::kernel) | Size of pooling window. | s64| A s64 list containing positive values | Required
[exclude_pad](@ref dnnl::graph::op::attr::exclude_pad)| Controls whether the padded values are counted. |bool | True, False| Required
[rounding_type](@ref dnnl::graph::op::attr::rounding_type) | Controls how to do rounding. |string | `floor` (default), `ceil` | Optional
[auto_pad](@ref dnnl::graph::op::attr::auto_pad) |Controls how the paddings are calculated.| string | `none` (default), `same_upper`, `same_lower`, `valid` | Optional
[data_format](@ref dnnl::graph::op::attr::data_format) |Controls how to interpret the shape of `src` and `dst`.| string|`NCX`, `NXC` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`dst` | Required
## Supported data types
AvgPool operation supports the following data type combinations.
Src | Dst
-- | --
f32 |f32
bf16 |bf16
f16 |f16
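
A minimal C++ sketch of an AvgPool op follows (same API assumptions as above;
shapes and attribute values are illustrative). A 2x2 window with stride 2 and
no padding maps a 14x14 `NCX` input to a 7x7 output:

```cpp
#include <string>
#include <vector>
#include "oneapi/dnnl/dnnl_graph.hpp"
using namespace dnnl::graph;

int main() {
    using dt = logical_tensor::data_type;
    using lay = logical_tensor::layout_type;

    logical_tensor src {0, dt::f32, {8, 16, 14, 14}, lay::strided};
    logical_tensor dst {1, dt::f32, {8, 16, 7, 7}, lay::strided};

    op pool(2, op::kind::AvgPool, {src}, {dst}, "avgpool0");
    pool.set_attr<std::vector<int64_t>>(op::attr::strides, {2, 2});
    pool.set_attr<std::vector<int64_t>>(op::attr::kernel, {2, 2});
    pool.set_attr<std::vector<int64_t>>(op::attr::pads_begin, {0, 0});
    pool.set_attr<std::vector<int64_t>>(op::attr::pads_end, {0, 0});
    pool.set_attr<bool>(op::attr::exclude_pad, false);
    pool.set_attr<std::string>(op::attr::data_format, "NCX");

    graph g(dnnl::engine::kind::cpu);
    g.add_op(pool);
    return 0;
}
```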

View File

@ -0,0 +1,50 @@
# AvgPoolBackward {#dev_guide_op_avgpoolbackward}
## General
AvgPoolBackward operation accepts \f$\diffdst\f$ tensor and \f$\srcshape\f$
tensor (optional), and calculates \f$\diffsrc\f$ tensor.
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[strides](@ref dnnl::graph::op::attr::strides) | Controls the strides with which the window is moved. |s64 |A s64 list containing positive values | Required
[pads_begin](@ref dnnl::graph::op::attr::pads_begin) | Controls the number of zeros to be added to the front/top/left of spatial dimensions. The attribute will be ignored when the `auto_pad` attribute is set to `same_upper`, `same_lower` or `valid`.|s64 | A s64 list containing non-negative values | Required
[pads_end](@ref dnnl::graph::op::attr::pads_end) | Controls the number of zeros to be added to the back/bottom/right of spatial dimensions. The attribute will be ignored when the `auto_pad` attribute is set to `same_upper`, `same_lower` or `valid`. |s64 |A s64 list containing non-negative values | Required
[kernel](@ref dnnl::graph::op::attr::kernel) | Size of pooling window. | s64| A s64 list containing positive values | Required
[exclude_pad](@ref dnnl::graph::op::attr::exclude_pad)| Controls whether the padded values are counted. |bool | True, False| Required
[auto_pad](@ref dnnl::graph::op::attr::auto_pad) |Controls how the paddings are calculated.| string | `none` (default), `same_upper`, `same_lower`, `valid` | Optional
[data_format](@ref dnnl::graph::op::attr::data_format) |Controls how to interpret the shape of `src` and `dst`.| string|`NCX`, `NXC` (default) | Optional
[src_shape](@ref dnnl::graph::op::attr::src_shape) |Denotes the shape of the input of the forward op.| s64| A s64 list containing positive values | Optional
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`diff_dst` | Required
1|`src_shape` | Optional
@note Either the `src_shape` input or the `src_shape` attribute should be
provided. If both are provided, the `src_shape` input takes precedence over the
`src_shape` attribute.
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`diff_src` | Required
## Supported data types
AvgPoolBackward operation supports the following data type combinations.
Diff_dst |Diff_src|Src_shape
-- | --|--
f32 |f32|s64
bf16 |bf16|s64
f16 |f16|s64

View File

@ -0,0 +1,56 @@
# BatchNormForwardTraining {#dev_guide_op_batchnormforwardtraining}
## General
BatchNormForwardTraining operation performs batch normalization in training
mode. Mean and variance are computed at runtime using the following formulas:
- \f$\mu(c) = \frac{1}{NHW} \sum\limits_{nhw} \src(n, c, h, w)\f$,
- \f$\sigma^2(c) = \frac{1}{NHW} \sum\limits_{nhw} (\src(n, c, h, w) - \mu(c))^2\f$.
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[epsilon](@ref dnnl::graph::op::attr::epsilon) | A number to be added to the variance to avoid division by zero. |f32 |A positive f32 value | Required
[momentum](@ref dnnl::graph::op::attr::momentum) | A number to be used to calculate running mean and running variance. |f32 |A positive f32 value | Optional
[data_format](@ref dnnl::graph::op::attr::data_format) |Controls how to interpret the shape of `src` and `dst`.| string|`NCX`, `NXC` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`src` | Required
1|`mean` | Required
2|`variance`|Required
3|`gamma` | Optional
4|`beta` |Optional
@note `gamma` and `beta` should be either both provided or neither provided.
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`dst` | Required
1|`running_mean` | Required
2|`running_variance` | Required
3|`batch_mean` | Required
4|`batch_variance` | Required
## Supported data types
BatchNormForwardTraining operation supports the following data type combinations.
Src / Dst | Gamma / Beta / Mean / Variance / Batch_mean / Batch_variance / Running_mean / Running_variance
--|--
f32 | f32
bf16 | f32, bf16
f16 | f32

View File

@ -0,0 +1,59 @@
# BatchNormInference {#dev_guide_op_batchnorminference}
## General
The formula is the same as for the
[Batch Normalization primitive](@ref dev_guide_batch_normalization):
\f[
\dst(n, c, h, w) =
\gamma(c) \cdot
\frac{\src(n, c, h, w) - \mu(c)} {\sqrt{\sigma^2(c) + \varepsilon}}
+ \beta(c),
\f]
where
- \f$\gamma(c), \beta(c)\f$ are required scale and shift for a channel,
- \f$\mu(c), \sigma^2(c)\f$ are mean and variance for a channel, and
- \f$\varepsilon\f$ is a constant to improve numerical stability.
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[epsilon](@ref dnnl::graph::op::attr::epsilon) | A number to be added to the variance to avoid division by zero. |f32 |A positive float value | Required
[data_format](@ref dnnl::graph::op::attr::data_format) |Controls how to interpret the shape of `src` and `dst`.| string|`NCX`, `NXC` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`src` | Required
1|`gamma` | Required
2|`beta`|Required
3|`mean` | Required
4|`variance` (\f$\sigma^2\f$)|Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`dst` | Required
## Supported data types
BatchNormInference operation supports the following data type combinations.
Src / Dst | Gamma / Beta / Mean / Variance
--|--
f32 | f32
bf16 | f32, bf16
f16 | f32
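
As a sketch (same `dnnl::graph` API assumptions; ids and shapes illustrative),
a BatchNormInference op takes the five inputs in the order listed above:

```cpp
#include <string>
#include "oneapi/dnnl/dnnl_graph.hpp"
using namespace dnnl::graph;

int main() {
    using dt = logical_tensor::data_type;
    using lay = logical_tensor::layout_type;

    // Per-channel statistics and scale/shift are 1-D tensors of size C = 16.
    logical_tensor src   {0, dt::f32, {8, 16, 14, 14}, lay::strided};
    logical_tensor gamma {1, dt::f32, {16}, lay::strided};
    logical_tensor beta  {2, dt::f32, {16}, lay::strided};
    logical_tensor mean  {3, dt::f32, {16}, lay::strided};
    logical_tensor var   {4, dt::f32, {16}, lay::strided};
    logical_tensor dst   {5, dt::f32, {8, 16, 14, 14}, lay::strided};

    // Input order matters: src, gamma, beta, mean, variance.
    op bn(6, op::kind::BatchNormInference, {src, gamma, beta, mean, var},
            {dst}, "bn0");
    bn.set_attr<float>(op::attr::epsilon, 1e-5f);
    bn.set_attr<std::string>(op::attr::data_format, "NCX");

    graph g(dnnl::engine::kind::cpu);
    g.add_op(bn);
    return 0;
}
```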

View File

@ -0,0 +1,49 @@
# BatchNormTrainingBackward {#dev_guide_op_batchnormtrainingbackward}
## General
BatchNormTrainingBackward operation calculates the gradients of input tensors.
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[epsilon](@ref dnnl::graph::op::attr::epsilon) | A number to be added to the variance to avoid division by zero. |f32 |A positive f32 value | Required
[data_format](@ref dnnl::graph::op::attr::data_format) |Controls how to interpret the shape of `src` and `dst`.| string|`NCX`, `NXC` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`src` | Required
1|`diff_dst` | Required
2|`mean`|Required
3|`variance` | Required
4|`gamma` |Optional
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`diff_src` | Required
1|`diff_gamma` | Optional
2|`diff_beta` | Optional
@note `diff_gamma` and `diff_beta` should be either both provided or neither
provided. If neither is provided, the input `gamma` will be ignored.
## Supported data types
BatchNormTrainingBackward operation supports the following data type
combinations.
Src / Diff_dst / Diff_src | Mean / Variance / Gamma / Diff_gamma / Diff_beta
--|--
f32 | f32
bf16 | f32, bf16
f16 | f32

View File

@ -0,0 +1,45 @@
# BiasAdd {#dev_guide_op_biasadd}
## General
BiasAdd operation adds a bias to the channel dimension of the input. It is a
special `Add` whose bias is restricted to be 1-D. Broadcasting is supported.
\f[ \dst(n,c,h,w) = \src(n,c,h,w) + \bias(c) \f]
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[data_format](@ref dnnl::graph::op::attr::data_format) | Controls how to interpret the shape of `src` and `dst`. |string |`NCX` , `NXC` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
| Index | Argument Name | Required or Optional |
| ----- | ------------- | -------------------- |
| 0 | `src` | Required |
| 1 | `bias` | Required |
@note `bias` is a 1D tensor to be added to the `src` tensor. Its size should be
the same as the size of the channel dimension of the `src` tensor.
### Outputs
| Index | Argument Name | Required or Optional |
| ----- | ------------- | -------------------- |
| 0 | `dst` | Required |
## Supported data types
BiasAdd operation supports the following data type combinations.
| Src | Bias | Dst |
| ---- | ------ | ------- |
| f32 | f32 | f32 |
| bf16 | bf16 | bf16 |
| f16 | f16 | f16 |
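
A short C++ sketch follows (same API assumptions; ids and shapes illustrative).
With `NXC` layout, the channel dimension is the last one, so the bias length
must equal that dimension:

```cpp
#include <string>
#include "oneapi/dnnl/dnnl_graph.hpp"
using namespace dnnl::graph;

int main() {
    using dt = logical_tensor::data_type;
    using lay = logical_tensor::layout_type;

    // Channel dimension is the last one (16), so `bias` has 16 elements.
    logical_tensor src  {0, dt::f32, {8, 14, 14, 16}, lay::strided};
    logical_tensor bias {1, dt::f32, {16}, lay::strided};
    logical_tensor dst  {2, dt::f32, {8, 14, 14, 16}, lay::strided};

    op bias_add(3, op::kind::BiasAdd, {src, bias}, {dst}, "biasadd0");
    bias_add.set_attr<std::string>(op::attr::data_format, "NXC");

    graph g(dnnl::engine::kind::cpu);
    g.add_op(bias_add);
    return 0;
}
```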

View File

@ -0,0 +1,40 @@
# BiasAddBackward {#dev_guide_op_biasaddbackward}
## General
BiasAddBackward operation computes the gradient on the bias tensor for the
BiasAdd operator. This op accumulates all the values from \f$\diffdst\f$ into
the channel dimension; the axis depends on the layout of the \src tensor.
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[data_format](@ref dnnl::graph::op::attr::data_format) | Controls how to interpret the shape of `diff_dst` and `diff_bias`. |string |`NCX` , `NXC` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`diff_dst` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`diff_bias` | Required
## Supported data types
BiasAddBackward operation supports the following data type combinations.
Diff_dst | Diff_bias
---- | -------
f32 | f32
bf16 | bf16
f16 | f16

View File

@ -0,0 +1,43 @@
# Clamp {#dev_guide_op_clamp}
## General
Clamp operation represents the clipping activation function. It applies the
following formula to every element of the \src tensor (the variable names
follow the standard @ref dev_guide_conventions):
\f[ clamp(src_i) = min(max(src_i, min\_value), max\_value) \f]
## Operation attributes
Attribute Name | Description | Value Type | Supported Values | Required or Optional
-- | -- | -- | -- | --
[min](@ref dnnl::graph::op::attr::min) | The lower bound of values in the output. Any value in the input that is smaller than this bound is replaced with the `min` value. | f32 | Arbitrary valid f32 value | Required
[max](@ref dnnl::graph::op::attr::max) | The upper bound of values in the output. Any value in the input that is greater than this bound is replaced with the `max` value. | f32 | Arbitrary valid f32 value | Required
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `dst` | Required
## Supported data types
Clamp operation supports the following data type combinations.
Src | Dst
-- | --
f32 | f32
f16 | f16
bf16 | bf16
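
For instance, a hedged C++ sketch (same API assumptions; ids and shapes
illustrative) where `min` = 0 and `max` = 6 turn Clamp into a ReLU6-style
activation:

```cpp
#include "oneapi/dnnl/dnnl_graph.hpp"
using namespace dnnl::graph;

int main() {
    using dt = logical_tensor::data_type;
    using lay = logical_tensor::layout_type;

    logical_tensor src {0, dt::f32, {8, 64, 56, 56}, lay::strided};
    logical_tensor dst {1, dt::f32, {8, 64, 56, 56}, lay::strided};

    // min = 0 and max = 6 clip the input into the [0, 6] range.
    op clamp(2, op::kind::Clamp, {src}, {dst}, "clamp0");
    clamp.set_attr<float>(op::attr::min, 0.f);
    clamp.set_attr<float>(op::attr::max, 6.f);

    graph g(dnnl::engine::kind::cpu);
    g.add_op(clamp);
    return 0;
}
```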

View File

@ -0,0 +1,41 @@
# ClampBackward {#dev_guide_op_clampbackward}
## General
ClampBackward operation computes the gradient of the Clamp operation.
## Operation attributes
Attribute Name | Description | Value Type | Supported Values | Required or Optional
-- | -- | -- | -- | --
[min](@ref dnnl::graph::op::attr::min) | The lower bound of values in the output. Any value in the input that is smaller than this bound is replaced with the `min` value. | f32 | Arbitrary valid f32 value | Required
[max](@ref dnnl::graph::op::attr::max) | The upper bound of values in the output. Any value in the input that is greater than this bound is replaced with the `max` value. | f32 | Arbitrary valid f32 value | Required
[use_dst](@ref dnnl::graph::op::attr::use_dst) | If true, use `dst` of Clamp operation to calculate the gradient. Otherwise, use `src`. | bool | `true` (default), `false` | Optional
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` / `dst` | Required
1 | `diff_dst` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `diff_src` | Required
## Supported data types
ClampBackward operation supports the following data type combinations.
Src | Diff_dst | Diff_src
-- | -- | --
f32 | f32 | f32
f16 | f16 | f16
bf16 | bf16 | bf16

View File

@ -0,0 +1,51 @@
# Concat {#dev_guide_op_concat}
## General
Concat operation concatenates \f$N\f$ tensors over `axis` (here designated
\f$C\f$) and is defined as (the variable names follow the standard
@ref dev_guide_conventions):
\f[
\dst(\overline{ou}, c, \overline{in}) =
\src_i(\overline{ou}, c', \overline{in}),
\f]
where \f$c = C_1 + \ldots + C_{i-1} + c'\f$.
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[axis](@ref dnnl::graph::op::attr::axis) | Specifies dimension along which concatenation happens. |s64 | A s64 value in the range of [-r, r-1] where r = rank(src) | Required
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`src_i` | Required
@note At least one input tensor is required. Data types and ranks of all input
tensors should match. The dimensions of all input tensors should be the same
except for the dimension specified by the `axis` attribute.
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`dst` |Required
## Supported data types
Concat operation supports the following data type combinations.
Src | Dst
-- | --
f32 |f32
bf16 |bf16
f16 |f16

View File

@ -0,0 +1,58 @@
# ConvTranspose {#dev_guide_op_convtranspose}
## General
ConvTranspose operation performs the same computation as calculating the
gradient of the Convolution operation with regard to \src. To see the
difference visually, you can refer to the
[visualization page](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md).
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[strides](@ref dnnl::graph::op::attr::strides) | Controls the strides with which the weights tensor is moved when computing convolution. |s64 |A s64 list containing positive values | Required
[pads_begin](@ref dnnl::graph::op::attr::pads_begin) | Controls the number of zeros to be added to the front/top/left of spatial dimensions.|s64 | A s64 list containing non-negative values | Required
[pads_end](@ref dnnl::graph::op::attr::pads_end) | Controls the number of zeros to be added to the back/bottom/right of spatial dimensions. |s64 |A s64 list containing non-negative values | Required
[dilations](@ref dnnl::graph::op::attr::dilations) | Controls the amount of stretching the kernel before convolution ([visualization link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md#dilated-convolution-animations)). | s64| A s64 list containing positive values (>1 means dilated convolution) | Required
[auto_pad](@ref dnnl::graph::op::attr::auto_pad)| Controls how the padding is calculated.|string | `none` (default), `same_upper`, `same_lower`, `valid` | Optional
[output_padding](@ref dnnl::graph::op::attr::output_padding)| Adds additional amount of padding per each spatial axis in `dst`.|s64 | A s64 list containing non-negative values, all zeros by default | Optional
[groups](@ref dnnl::graph::op::attr::groups) | Controls how input channels and output channels are divided into groups. |s64 |A positive s64 value, `1` by default | Optional
[data_format](@ref dnnl::graph::op::attr::data_format) |Controls how to interpret the shape of `src` and `dst`.| string|`NCX`, `NXC` (default) | Optional
[weights_format](@ref dnnl::graph::op::attr::weights_format) |Controls how to interpret the shape of `weights`.| string|`IOX`, `XOI` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`src` | Required
1|`weights` | Required
2|`bias`|Optional
@note
The shape of \weights is
\f$(in\_channels / groups, out\_channels, spatial\_shape)\f$ for `IOX` format or
\f$(spatial\_shape, out\_channels, in\_channels / groups)\f$ for `XOI` format.
Both \f$in\_channels\f$ and \f$out\_channels\f$ must be divisible by *groups*
attribute.
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`dst` | Required
## Supported data types
ConvTranspose operation supports the following data type combinations.
Src | Weights | Bias | Dst
--|--|-- | --
f32 | f32 | f32 |f32
bf16 | bf16 | bf16 |bf16
f16 | f16 | f16 |f16

View File

@ -0,0 +1,56 @@
# ConvTransposeBackwardData {#dev_guide_op_convtransposebackwarddata}
## General
ConvTransposeBackwardData operation takes \f$\diffdst\f$ and \weights and
computes \f$\diffsrc\f$.
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[strides](@ref dnnl::graph::op::attr::strides) | Controls the strides with which the weights tensor is moved when computing convolution. |s64 |A s64 list containing positive values | Required
[pads_begin](@ref dnnl::graph::op::attr::pads_begin) | Controls the number of zeros to be added to the front/top/left of spatial dimensions.|s64 | A s64 list containing non-negative values | Required
[pads_end](@ref dnnl::graph::op::attr::pads_end) | Controls the number of zeros to be added to the back/bottom/right of spatial dimensions. |s64 |A s64 list containing non-negative values | Required
[dilations](@ref dnnl::graph::op::attr::dilations) | Controls the amount of stretching the kernel before convolution ([visualization link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md#dilated-convolution-animations)). | s64| A s64 list containing positive values (>1 means dilated convolution) | Required
[auto_pad](@ref dnnl::graph::op::attr::auto_pad)| Controls how the padding is calculated.|string | `none` (default), `same_upper`, `same_lower`, `valid` | Optional
[output_padding](@ref dnnl::graph::op::attr::output_padding)| Adds additional amount of padding per each spatial axis in `dst`.|s64 | A s64 list containing non-negative values, all zeros by default | Optional
[groups](@ref dnnl::graph::op::attr::groups) | Controls how input channels and output channels are divided into groups. |s64 |A positive s64 value, `1` by default | Optional
[data_format](@ref dnnl::graph::op::attr::data_format) |Controls how to interpret the shape of `src` and `dst`.| string|`NCX`, `NXC` (default) | Optional
[weights_format](@ref dnnl::graph::op::attr::weights_format) |Controls how to interpret the shape of `weights`.| string|`IOX`, `XOI` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`diff_dst` | Required
1|`weights` | Required
@note
The shape of \weights is
\f$(in\_channels / groups, out\_channels, spatial\_shape)\f$ for `IOX` format or
\f$(spatial\_shape, out\_channels, in\_channels / groups)\f$ for `XOI` format.
Both \f$in\_channels\f$ and \f$out\_channels\f$ must be divisible by *groups*
attribute.
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`diff_src` | Required
## Supported data types
ConvTransposeBackwardData operation supports the following data type
combinations.
Diff_dst | Weights | Diff_src
--|--|--
f32 | f32 |f32
bf16 | bf16 |bf16
f16 | f16 | f16

View File

@ -0,0 +1,62 @@
# ConvTransposeBackwardWeights {#dev_guide_op_convtransposebackwardweights}
## General
ConvTransposeBackwardWeights operation takes \f$\diffdst\f$, \src, and an
optional \f$weights\_shape\f$, and computes \f$\diffweights\f$.
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[strides](@ref dnnl::graph::op::attr::strides) | Controls the strides with which the weights tensor is moved when computing convolution. |s64 |A s64 list containing positive values | Required
[pads_begin](@ref dnnl::graph::op::attr::pads_begin) | Controls the number of zeros to be added to the front/top/left of spatial dimensions.|s64 | A s64 list containing non-negative values | Required
[pads_end](@ref dnnl::graph::op::attr::pads_end) | Controls the number of zeros to be added to the back/bottom/right of spatial dimensions. |s64 |A s64 list containing non-negative values | Required
[dilations](@ref dnnl::graph::op::attr::dilations) | Controls the amount of stretching the kernel before convolution ([visualization link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md#dilated-convolution-animations)). | s64| A s64 list containing positive values (>1 means dilated convolution) | Required
[auto_pad](@ref dnnl::graph::op::attr::auto_pad)| Controls how the padding is calculated.|string | `none` (default), `same_upper`, `same_lower`, `valid` | Optional
[output_padding](@ref dnnl::graph::op::attr::output_padding)| Adds additional amount of padding per each spatial axis in `dst`.|s64 | A s64 list containing non-negative values, all zeros by default | Optional
[groups](@ref dnnl::graph::op::attr::groups) | Controls how input channels and output channels are divided into groups. |s64 |A positive s64 value, `1` by default | Optional
[data_format](@ref dnnl::graph::op::attr::data_format) |Controls how to interpret the shape of `src` and `dst`.| string|`NCX`, `NXC` (default) | Optional
[weights_format](@ref dnnl::graph::op::attr::weights_format) |Controls how to interpret the shape of `weights`.| string|`IOX`, `XOI` (default) | Optional
[weights_shape](@ref dnnl::graph::op::attr::weights_shape) |Denotes the shape of the `weights` tensor.| s64| A s64 list containing positive values| Optional
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`src` | Required
1|`diff_dst` | Required
2|`weights_shape`|Optional
@note
The shape of \weights is
\f$(in\_channels / groups, out\_channels, spatial\_shape)\f$ for `IOX` format or
\f$(spatial\_shape, out\_channels, in\_channels / groups)\f$ for `XOI` format.
Both \f$in\_channels\f$ and \f$out\_channels\f$ must be divisible by *groups*
attribute.
@note Either the `weights_shape` input or the `weights_shape` attribute should
be provided. If both are provided, the `weights_shape` input takes precedence
over the `weights_shape` attribute.
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`diff_weights` | Required
## Supported data types
ConvTransposeBackwardWeights operation supports the following data type
combinations.
Src | Diff_dst | Diff_weights | Weights_shape
--|--|-- | --
f32 | f32 | f32 |s32
bf16 | bf16 | bf16 |s32
f16 | f16 | f16 |s32

View File

@ -0,0 +1,139 @@
# Convolution {#dev_guide_op_convolution}
## General
Convolution operation performs the convolution between the src tensor and the
weights tensor, which is defined by the following formulas. Variable names
follow the standard @ref dev_guide_conventions.
Let \src, \weights and \dst tensors have shape \f$N \times IC \times IH \times
IW\f$, \f$OC \times IC \times KH \times KW\f$, and \f$N \times OC \times OH
\times OW\f$ respectively.
Furthermore, let the remaining convolution parameters be:
| Parameter | Depth | Height | Width | Comment
| :--| :-- | :-- | :-- |:--
| Paddings: Front, top, and left | \f$PD_L\f$ | \f$PH_L\f$ | \f$PW_L\f$ | In the attributes we use `pads_begin` to indicate the corresponding vector of paddings |
| Padding: Back, bottom, and right | \f$PD_R\f$ | \f$PH_R\f$ | \f$PW_R\f$ | In the attributes we use `pads_end` to indicate the corresponding vector of paddings |
| Stride | \f$SD\f$ | \f$SH\f$ | \f$SW\f$ | In the attributes we use `strides` to indicate the corresponding vector of strides |
| Dilation | \f$DD\f$ | \f$DH\f$ | \f$DW\f$ | In the attributes we use `dilations` to indicate the corresponding vector of dilations|
To further simplify the formulas, we assume that the attributes `data_format`
and `weights_format` are set to `NCX` and `OIX` respectively. `NCX` means the
first axis represents the batch dimension, the second axis represents the
channel dimension, and the rest represent spatial dimensions. `OIX` means the
first axis represents the output channel dimension, the second axis represents
the input channel dimension, and the rest represent the weights spatial
dimensions.
### Regular Convolution
This is the same as the formula in
[Convolution primitive](@ref dev_guide_convolution).
\f[\dst(n, oc, oh, ow) = \bias(oc) \\
+ \sum_{ic=0}^{IC-1}\sum_{kh=0}^{KH-1}\sum_{kw=0}^{KW-1}
\src(n, ic, oh \cdot SH + kh - PH_L, ow \cdot SW + kw - PW_L)
\cdot
\weights(oc, ic, kh, kw).\f]
Here:
- \f$OH = \left\lfloor{\frac{IH - KH + PH_L + PH_R}{SH}} \right\rfloor + 1,\f$
- \f$OW = \left\lfloor{\frac{IW - KW + PW_L + PW_R}{SW}} \right\rfloor + 1.\f$
### Convolution with Groups
The attribute `groups` is set to a value greater than 1.
\f[
\dst(n, g \cdot OC_G + oc_g, oh, ow) =
\bias(g \cdot OC_G + oc_g) \\
+
\sum_{ic_g=0}^{IC_G-1}\sum_{kh=0}^{KH-1}\sum_{kw=0}^{KW-1}
\src(n, g \cdot IC_G + ic_g, oh \cdot SH + kh - PH_L,
ow \cdot SW + kw - PW_L)
\cdot
\weights(g, oc_g, ic_g, kh, kw),
\f]
where
- \f$IC_G = \frac{IC}{G}\f$,
- \f$OC_G = \frac{OC}{G}\f$, and
- \f$oc_g \in [0, OC_G).\f$
### Convolution with Dilation
The attribute `dilations` contains an element greater than 1.
\f[
\dst(n, oc, oh, ow) =
\bias(oc) \\
+
\sum_{ic=0}^{IC-1}\sum_{kh=0}^{KH-1}\sum_{kw=0}^{KW-1}
\src(n, ic, oh \cdot SH + kh \cdot DH - PH_L,
ow \cdot SW + kw \cdot DW - PW_L)
\cdot
\weights(oc, ic, kh, kw).
\f]
Here:
- \f$OH = \left\lfloor{\frac{IH - DKH + PH_L + PH_R}{SH}}
\right\rfloor + 1,\f$ where \f$DKH = 1 + (KH - 1) \cdot DH\f$, and
- \f$OW = \left\lfloor{\frac{IW - DKW + PW_L + PW_R}{SW}}
\right\rfloor + 1,\f$ where \f$DKW = 1 + (KW - 1) \cdot DW\f$.
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[strides](@ref dnnl::graph::op::attr::strides) | Controls the strides with which the weights tensor is moved when computing convolution |s64 |A s64 list containing positive values | Required
[pads_begin](@ref dnnl::graph::op::attr::pads_begin) | Controls the number of zeros to be added to the front/top/left of spatial dimensions|s64 | A s64 list containing non-negative values | Required
[pads_end](@ref dnnl::graph::op::attr::pads_end) | Controls the number of zeros to be added to the back/bottom/right of spatial dimensions |s64 |A s64 list containing non-negative values | Required
[dilations](@ref dnnl::graph::op::attr::dilations) | Controls the amount of stretching the kernel before convolution ([visualization link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md#dilated-convolution-animations)) | s64| A s64 list containing positive values (>1 means dilated convolution) | Required
[auto_pad](@ref dnnl::graph::op::attr::auto_pad)| Controls how the padding is calculated|string | `none` (default), `same_upper`, `same_lower`, `valid` | Optional
[groups](@ref dnnl::graph::op::attr::groups) | Controls how input channels and output channels are divided into groups |s64 |A positive s64 value, `1` by default | Optional
[data_format](@ref dnnl::graph::op::attr::data_format) |Controls how to interpret the shape of `src` and `dst`.| string|`NCX`, `NXC` (default) | Optional
[weights_format](@ref dnnl::graph::op::attr::weights_format) |Controls how to interpret the shape of `weights`| string|`OIX`, `XIO` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`src` | Required
1|`weights` | Required
2|`bias`|Optional
@note
The shape of \weights is
\f$(out\_channels, in\_channels / groups, spatial\_shape)\f$ for `OIX` format or
\f$(spatial\_shape, in\_channels / groups, out\_channels)\f$ for `XIO` format.
Both \f$in\_channels\f$ and \f$out\_channels\f$ must be divisible by *groups*
attribute.
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`dst` | Required
## Supported data types
Convolution operation supports the following data type combinations.
Src | Weights | Bias | Dst
--|--|-- | --
f32 | f32 | f32 |f32
bf16 | bf16 | bf16 |bf16
f16 | f16 | f16 |f16
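
Below is a hedged C++ sketch (same `dnnl::graph` API assumptions; ids, shapes,
and attribute values are illustrative) of a 3x3 Convolution with stride 1 and
padding 1, which keeps the 56x56 spatial size:

```cpp
#include <string>
#include <vector>
#include "oneapi/dnnl/dnnl_graph.hpp"
using namespace dnnl::graph;

int main() {
    using dt = logical_tensor::data_type;
    using lay = logical_tensor::layout_type;

    // NCX src, OIX weights: {out_channels, in_channels / groups, KH, KW}.
    logical_tensor src     {0, dt::f32, {8, 3, 56, 56}, lay::strided};
    logical_tensor weights {1, dt::f32, {64, 3, 3, 3}, lay::strided};
    logical_tensor dst     {2, dt::f32, {8, 64, 56, 56}, lay::strided};

    op conv(3, op::kind::Convolution, {src, weights}, {dst}, "conv0");
    conv.set_attr<std::vector<int64_t>>(op::attr::strides, {1, 1});
    conv.set_attr<std::vector<int64_t>>(op::attr::pads_begin, {1, 1});
    conv.set_attr<std::vector<int64_t>>(op::attr::pads_end, {1, 1});
    conv.set_attr<std::vector<int64_t>>(op::attr::dilations, {1, 1});
    conv.set_attr<int64_t>(op::attr::groups, 1);
    conv.set_attr<std::string>(op::attr::data_format, "NCX");
    conv.set_attr<std::string>(op::attr::weights_format, "OIX");

    graph g(dnnl::engine::kind::cpu);
    g.add_op(conv);
    return 0;
}
```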

View File

@ -0,0 +1,108 @@
# ConvolutionBackwardData {#dev_guide_op_convolutionbackwarddata}
## General
ConvolutionBackwardData operation accepts \f$\diffdst\f$, \weights, and an
optional dst shape as inputs, and computes \f$\diffsrc\f$.
If the `auto_pad` attribute is set to one of `valid`, `same_upper`, or
`same_lower`, the `pads_begin` and `pads_end` attributes will be ignored. The
paddings are calculated with the formulas below.
Let the parameters be:
| Parameter | Depth | Height | Width | Comment
| :--| :-- | :-- | :-- |:--
| Paddings: Front, top, and left | \f$PD_L\f$ | \f$PH_L\f$ | \f$PW_L\f$ | In the attributes we use `pads_begin` to indicate the corresponding vector of paddings |
| Padding: Back, bottom, and right | \f$PD_R\f$ | \f$PH_R\f$ | \f$PW_R\f$ | In the attributes we use `pads_end` to indicate the corresponding vector of paddings |
| Stride | \f$SD\f$ | \f$SH\f$ | \f$SW\f$ | In the attributes we use `strides` to indicate the corresponding vector of strides |
| Dilation | \f$DD\f$ | \f$DH\f$ | \f$DW\f$ | In the attributes we use `dilations` to indicate the corresponding vector of dilations|
First, \f$total\_padding\f$ is calculated according to \f$src\_shape\f$ and
\f$dst\_shape\f$. Let \f$src\_h\f$ be the height dimension of \f$src\_shape\f$
and \f$dst\_h\f$ be the height dimension of \f$dst\_shape\f$.
\f[
total\_padding_h = SH \times (src\_h - 1) + ((KH -1 ) \times DH + 1) - dst\_h + output\_padding_h
\f]
If `auto_pad` attribute is specified as `valid`:
\f[
PD_L = 0 \\
PD_R = 0
\f]
If `auto_pad` attribute is specified as `same_lower`:
\f[
PD_L = floor(total\_padding / 2) \\
PD_R = total\_padding - PD_L
\f]
If `auto_pad` attribute is specified as `same_upper`:
\f[
PD_R = floor(total\_padding / 2) \\
PD_L = total\_padding - PD_R
\f]
where:
- \f$dst\_shape\f$ is either an attribute or an input tensor,
- \f$output\_padding\f$ is an optional attribute.
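For illustration, assume (the values are arbitrary) \f$SH = 2\f$, \f$KH = 3\f$,
\f$DH = 1\f$, \f$src\_h = 4\f$, \f$dst\_h = 8\f$, and
\f$output\_padding_h = 0\f$. Then
\f$total\_padding_h = 2 \times (4 - 1) + ((3 - 1) \times 1 + 1) - 8 + 0 = 1\f$,
and `same_lower` yields \f$PH_L = \lfloor 1 / 2 \rfloor = 0\f$ and
\f$PH_R = 1 - 0 = 1\f$.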
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[strides](@ref dnnl::graph::op::attr::strides) | Controls the strides with which the weights tensor is moved when computing convolution. |s64 |A s64 list containing positive values | Required
[pads_begin](@ref dnnl::graph::op::attr::pads_begin) | Controls the number of zeros to be added to the front/top/left of spatial dimensions. The attribute will be ignored when the `auto_pad` attribute is set to `same_upper`, `same_lower` or `valid`.|s64 | A s64 list containing non-negative values | Required
[pads_end](@ref dnnl::graph::op::attr::pads_end) | Controls the number of zeros to be added to the back/bottom/right of spatial dimensions. The attribute will be ignored when the `auto_pad` attribute is set to `same_upper`, `same_lower` or `valid`. |s64 |A s64 list containing non-negative values | Required
[dilations](@ref dnnl::graph::op::attr::dilations) | Controls the amount of stretching the kernel before convolution ([visualization link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md#dilated-convolution-animations)). | s64| A s64 list containing positive values (>1 means dilated convolution) | Required
[auto_pad](@ref dnnl::graph::op::attr::auto_pad)| Controls how the padding is calculated.|string | `none` (default), `same_upper`, `same_lower`, `valid` | Optional
[output_padding](@ref dnnl::graph::op::attr::output_padding)| Adds additional amount of padding per each spatial axis in `dst`.|s64 | A s64 list containing non-negative values, all zeros by default | Optional
[groups](@ref dnnl::graph::op::attr::groups) | Controls how input channels and output channels are divided into groups. |s64 |A positive s64 value, `1` by default | Optional
[data_format](@ref dnnl::graph::op::attr::data_format) |Controls how to interpret the shape of `src` and `dst`.| string|`NCX`, `NXC` (default) | Optional
[weights_format](@ref dnnl::graph::op::attr::weights_format) |Controls how to interpret the shape of `weights`.| string|`OIX`, `XIO` (default) | Optional
[dst_shape](@ref dnnl::graph::op::attr::dst_shape) |Denotes the shape of the `dst` tensor.| s64| A s64 list containing positive values| Optional
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`diff_dst` | Required
1|`weights` | Required
2|`dst_shape`|Optional
@note
The shape of \weights is
\f$(out\_channels, in\_channels / groups, spatial\_shape)\f$ for `OIX` format or
\f$(spatial\_shape, in\_channels / groups, out\_channels)\f$ for `XIO` format.
Both \f$in\_channels\f$ and \f$out\_channels\f$ must be divisible by *groups*
attribute.
@note Either the `dst_shape` input or the `dst_shape` attribute should be
provided. If both are provided, the `dst_shape` input takes precedence over the
`dst_shape` attribute.
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`diff_src` | Required
## Supported data types
ConvolutionBackwardData operation supports the following data type combinations.
Diff_dst | Weights | Diff_src | Dst_shape
--|--|-- | --
f32 | f32 | f32 |s32
bf16 | bf16 | bf16 |s32
f16 | f16 | f16 |s32

View File

@ -0,0 +1,61 @@
# ConvolutionBackwardWeights {#dev_guide_op_convolutionbackwardweights}
## General
ConvolutionBackwardWeights operation accepts \src, \f$\diffdst\f$, and an
optional weights shape as inputs, and computes \f$\diffweights\f$.
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[strides](@ref dnnl::graph::op::attr::strides) | Controls the strides with which the weights tensor is moved when computing convolution. |s64 |A s64 list containing positive values | Required
[pads_begin](@ref dnnl::graph::op::attr::pads_begin) | Controls the number of zeros to be added to the front/top/left of spatial dimensions. The attribute will be ignored when the `auto_pad` attribute is set to `same_upper`, `same_lower` or `valid`.|s64 | A s64 list containing non-negative values | Required
[pads_end](@ref dnnl::graph::op::attr::pads_end) | Controls the number of zeros to be added to the back/bottom/right of spatial dimensions. The attribute will be ignored when the `auto_pad` attribute is set to `same_upper`, `same_lower` or `valid`. |s64 |A s64 list containing non-negative values | Required
[dilations](@ref dnnl::graph::op::attr::dilations) | Controls the amount of stretching the kernel before convolution ([visualization link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md#dilated-convolution-animations)). | s64| A s64 list containing positive values (>1 means dilated convolution) | Required
[auto_pad](@ref dnnl::graph::op::attr::auto_pad)| Controls how the padding is calculated.|string | `none` (default), `same_upper`, `same_lower`, `valid` | Optional
[groups](@ref dnnl::graph::op::attr::groups) | Controls how input channels and output channels are divided into groups. |s64 |A positive s64 value, `1` by default | Optional
[data_format](@ref dnnl::graph::op::attr::data_format) |Controls how to interpret the shape of `src` and `dst`.| string|`NCX`, `NXC` (default) | Optional
[weights_format](@ref dnnl::graph::op::attr::weights_format) |Controls how to interpret the shape of `weights`.| string|`OIX`, `XIO` (default) | Optional
[weights_shape](@ref dnnl::graph::op::attr::weights_shape) |Denotes the shape of the `weights` tensor.| s64| A s64 list containing positive values| Optional
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`src` | Required
1|`diff_dst` | Required
2|`weights_shape`|Optional
@note
The shape of \weights is
\f$(out\_channels, in\_channels / groups, spatial\_shape)\f$ for `OIX` format or
\f$(spatial\_shape, in\_channels / groups, out\_channels)\f$ for `XIO` format.
Both \f$in\_channels\f$ and \f$out\_channels\f$ must be divisible by *groups*
attribute.
@note Either the `weights_shape` input or the `weights_shape` attribute should
be provided. If both are provided, the `weights_shape` input takes precedence
over the `weights_shape` attribute.
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`diff_weights` | Required
## Supported data types
ConvolutionBackwardWeights operation supports the following data type
combinations.
Src | Diff_dst | Diff_weights | Weights_shape
--|--|-- | --
f32 | f32 | f32 |s32
bf16 | bf16 | bf16 |s32
f16 | f16 | f16 |s32

View File

@ -0,0 +1,54 @@
# Dequantize {#dev_guide_op_dequantize}
## General
Dequantize operation converts a quantized (u8 or s8) tensor to a f32 tensor. It
supports both per-tensor and per-channel asymmetric linear de-quantization.
Rounding mode is library-implementation defined.
For per-tensor de-quantization:
\f[ \dst_{i} = round((\src_{i} - zps) \times scale) \f]
For per-channel de-quantization, taking channel axis = 1 as an example:
\f[ dst_{\cdots,i,\cdots,\cdots} = (\src_{\cdots,i,\cdots,\cdots} - zps_i) \times scale_i, i \in {[0, ic-1]} \f]
where \f$ic\f$ is the number of channels.
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[qtype](@ref dnnl::graph::op::attr::qtype) | Specifies which de-quantization type is used. |string | `per_tensor` (default), `per_channel` | Optional
[axis](@ref dnnl::graph::op::attr::axis) | Specifies dimension on which per-channel de-quantization is applied. |s64 | A s64 value in the range of [-r, r-1] where r = rank(src), `1` by default | Optional
[scales](@ref dnnl::graph::op::attr::scales) | Scaling factors applied to the src data. |f32 | A f32 list (only contains one element if qtype is `per_tensor`) | Required
[zps](@ref dnnl::graph::op::attr::zps) | Offset values that map to float zero. |s64 | A s64 list (only contains one element if qtype is `per_tensor`) | Required
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`dst` |Required
## Supported data types
Dequantize operation supports the following data type combinations.
Src | Dst
-- | --
s8, u8 |f32
@note This operation is to support
[int8 quantization](@ref dev_guide_graph_int8_quantization_model) model.
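
For example, a hedged C++ sketch (same API assumptions; ids, shapes, and the
scale/zero-point values are illustrative) of a per-tensor u8-to-f32 Dequantize:

```cpp
#include <string>
#include <vector>
#include "oneapi/dnnl/dnnl_graph.hpp"
using namespace dnnl::graph;

int main() {
    using dt = logical_tensor::data_type;
    using lay = logical_tensor::layout_type;

    // Per-tensor u8 -> f32 de-quantization, scale 0.1 and zero-point 128.
    logical_tensor src {0, dt::u8, {8, 64}, lay::strided};
    logical_tensor dst {1, dt::f32, {8, 64}, lay::strided};

    op deq(2, op::kind::Dequantize, {src}, {dst}, "dequantize0");
    deq.set_attr<std::string>(op::attr::qtype, "per_tensor");
    deq.set_attr<std::vector<float>>(op::attr::scales, {0.1f});
    deq.set_attr<std::vector<int64_t>>(op::attr::zps, {128});

    graph g(dnnl::engine::kind::cpu);
    g.add_op(deq);
    return 0;
}
```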

View File

@ -0,0 +1,50 @@
# Divide {#dev_guide_op_divide}
## General
Divide operation performs element-wise division of two given tensors, applying
multi-directional broadcast rules.
\f[
\dst(\overline{x}) =
\src_0(\overline{x}) \mathbin{/} \src_1(\overline{x}),
\f]
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[auto_broadcast](@ref dnnl::graph::op::attr::auto_broadcast) | Specifies rules used for auto-broadcasting of src tensors. |string |`none`,`numpy` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `src_0` | Required
1 | `src_1` | Required
@note Both src shapes should match and no auto-broadcasting is allowed if the
`auto_broadcast` attribute is `none`. `src_0` and `src_1` shapes can be
different and auto-broadcasting is allowed if the `auto_broadcast` attribute is
`numpy`. Broadcasting is performed according to the `auto_broadcast` value.
### Outputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `dst` | Required
## Supported data types
Divide operation supports the following data type combinations.
Src_0 / Src_1 | Dst
---- | -------
f32 | f32
bf16 | bf16
f16 | f16

View File

@ -0,0 +1,62 @@
# DynamicDequantize {#dev_guide_op_dynamicdequantize}
## General
DynamicDequantize operation converts a quantized (s8 or u8) tensor to a f32
tensor. It supports both per-tensor and per-channel asymmetric linear
de-quantization. Rounding mode is library-implementation defined. Unlike the
@ref dev_guide_op_dequantize, DynamicDequantize takes scales and zero-points as
operator src tensors.
For per-tensor de-quantization
\f[ dst = (src - zps)*scales \f]
For per-channel de-quantization, taking channel axis = 1 as an example:
\f[ {dst}_{\cdots,i,\cdots,\cdots} = (src_{\cdots,i,\cdots,\cdots} - zps_i)*scales_i,i\in [0,channelNum-1] \f]
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[qtype](@ref dnnl::graph::op::attr::qtype) | Specifies which de-quantization type is used. |string | `per_tensor` (default), `per_channel` | Optional
[axis](@ref dnnl::graph::op::attr::axis) | Specifies dimension on which per-channel de-quantization is applied. |s64 | A s64 value in the range of [-r, r-1] where r = rank(src), `1` by default. Negative value means counting the dimension backwards from the end. | Optional
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `src` | Required
1 | `scales` | Required
2 | `zps` | Optional
@note `scales` is an f32 1D tensor to be applied to the de-quantization
formula. For `qtype` = `per_tensor`, there should be only one element in the
scales tensor. For `qtype` = `per_channel`, the element number should be equal
to the element number of the src tensor along the dimension axis.
@note `zps` is a 1D tensor with offset values that map to zero. For `qtype` =
`per_tensor`, there should be only one element in the zps tensor. For `qtype` =
`per_channel`, the element number should be equal to the element number of the
input tensor along the dimension axis. If not specified, the library can assume
that the de-quantization is symmetric and perform kernel optimization
accordingly.
### Outputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `dst` | Required
## Supported data types
DynamicDequantize operation supports the following data type combinations.
Src | Dst| Scales |Zps
---- | ------- | ---|--
s8 | f32 | f32 |s8, u8, s32
u8 | f32 | f32 |s8, u8, s32

View File

@ -0,0 +1,62 @@
# DynamicQuantize {#dev_guide_op_dynamicquantize}
## General
DynamicQuantize operation converts a f32 tensor to a quantized (s8 or u8)
tensor. It supports both per-tensor and per-channel asymmetric linear
quantization. The target quantized data type is specified via the data type of
dst logical tensor. Rounding mode is library-implementation defined.
For per-tensor quantization
\f[ dst = round(src/scales + zps) \f]
For per-channel quantization, taking channel axis = 1 as an example:
\f[ {dst}_{\cdots,i,\cdots,\cdots} =
round(src_{\cdots,i,\cdots,\cdots}/scales_i + zps_i),i\in [0,channelNum-1] \f]
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[qtype](@ref dnnl::graph::op::attr::qtype) | Specifies which quantization type is used. |string | `per_tensor` (default), `per_channel` | Optional
[axis](@ref dnnl::graph::op::attr::axis) | Specifies dimension on which per-channel quantization is applied. |s64 | A s64 value in the range of [-r, r-1] where r = rank(src), `1` by default. Negative value means counting the dimension backwards from the end. | Optional
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `src` | Required
1 | `scales` | Required
2 | `zps` | Optional
@note `scales` is an f32 1D tensor to be applied to the quantization formula.
For `qtype` = `per_tensor`, there should be only one element in the scales
tensor. For `qtype` = `per_channel`, the element number should be equal to the
element number of the src tensor along the dimension axis.
@note `zps` is a 1D tensor with offset values that map to zero. For `qtype` =
`per_tensor`, there should be only one element in the zps tensor. For `qtype` =
`per_channel`, the element number should be equal to the element number of the
input tensor along the dimension axis. If not specified, the library can assume
that the quantization is symmetric and perform kernel optimization accordingly.
### Outputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `dst` | Required
## Supported data types
DynamicQuantize operation supports the following data type combinations.
Src |Scales | Zps | Dst
---- | ------- | ---|--
f32 |f32 | s8, u8, s32 | s8
f32 |f32 | s8, u8, s32 | u8
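
Unlike the static Quantize/Dequantize ops, the scales and zero-points here are
runtime tensors. A hedged C++ sketch (same API assumptions; ids and shapes
illustrative) of a per-tensor DynamicQuantize:

```cpp
#include <string>
#include "oneapi/dnnl/dnnl_graph.hpp"
using namespace dnnl::graph;

int main() {
    using dt = logical_tensor::data_type;
    using lay = logical_tensor::layout_type;

    // Per-tensor quantization: `scales` and `zps` are single-element runtime
    // tensors; the target type comes from the data type of `dst` (u8 here).
    logical_tensor src    {0, dt::f32, {8, 64}, lay::strided};
    logical_tensor scales {1, dt::f32, {1}, lay::strided};
    logical_tensor zps    {2, dt::s32, {1}, lay::strided};
    logical_tensor dst    {3, dt::u8, {8, 64}, lay::strided};

    op dynq(4, op::kind::DynamicQuantize, {src, scales, zps}, {dst}, "dynq0");
    dynq.set_attr<std::string>(op::attr::qtype, "per_tensor");

    graph g(dnnl::engine::kind::cpu);
    g.add_op(dynq);
    return 0;
}
```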

View File

@ -0,0 +1,42 @@
# Elu {#dev_guide_op_elu}
## General
Elu operation applies the following formula to every element of the \src tensor
(the variable names follow the standard @ref dev_guide_conventions):
\f[ dst = \begin{cases} \alpha(e^{src} - 1) & \text{if}\ src < 0 \\
src & \text{if}\ src \ge 0 \end{cases} \f]
## Operation attributes
Attribute Name | Description | Value Type | Supported Values | Required or Optional
-- | -- | -- | -- | --
[alpha](@ref dnnl::graph::op::attr::alpha) | Scale for the negative factor. | f32 | Arbitrary non-negative f32 value | Required
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `dst` |Required
## Supported data types
Elu operation supports the following data type combinations.
Src | Dst
-- | --
f32 | f32
bf16 | bf16
f16 | f16

View File

@ -0,0 +1,40 @@
# EluBackward {#dev_guide_op_elubackward}
## General
EluBackward operation computes the gradient of the Elu operation.
## Operation attributes
Attribute Name | Description | Value Type | Supported Values | Required or Optional
-- | -- | -- | -- | --
[alpha](@ref dnnl::graph::op::attr::alpha) | Scale for the negative factor. | f32 | Arbitrary non-negative f32 value | Required
[use_dst](@ref dnnl::graph::op::attr::use_dst) | If true, use `dst` of the Elu operation to calculate the gradient. Otherwise, use `src`. | bool | `true` (default), `false` | Optional
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
1 | `diff_dst` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `diff_src` | Required
## Supported data types
EluBackward operation supports the following data type combinations.
Src | Diff_dst | Diff_src
-- | -- | --
f32 | f32 | f32
f16 | f16 | f16
bf16 | bf16 | bf16

View File

@ -0,0 +1,35 @@
# End {#dev_guide_op_end}
## General
End operation is used to help construct the graph, for example, to track the
uses of a tensor.
## Operation attributes
End operation does not support any attribute.
## Execution arguments
The inputs and outputs must be provided according to the index order below when
constructing an operation.
### Inputs
| Index | Argument Name | Required or Optional |
| ----- | ------------- | -------------------- |
| 0 | `src` | Required |
### Outputs
End operation does not have any output tensors.
## Supported data types
End operation supports the following data types.
| Src |
| ---- |
| f32 |
| bf16 |
| f16 |
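A minimal sketch of one intended usage, assuming a preceding ReLU op (the ids and shapes are illustrative): the End op consumes the ReLU result to record an additional use of that tensor when constructing the graph.
```cpp
#include "oneapi/dnnl/dnnl_graph.hpp"

using namespace dnnl::graph;

int main() {
    using dt = logical_tensor::data_type;
    using lt = logical_tensor::layout_type;

    logical_tensor src {0, dt::f32, {8, 16}, lt::strided};
    logical_tensor dst {1, dt::f32, {8, 16}, lt::strided};

    // ReLU produces an intermediate result ...
    op relu {2, op::kind::ReLU, "relu"};
    relu.add_input(src);
    relu.add_output(dst);

    // ... and End tracks an extra use of that result; it takes one input
    // and produces no output tensor.
    op end {3, op::kind::End, "end"};
    end.add_input(dst);

    graph g {dnnl::engine::kind::cpu};
    g.add_op(relu);
    g.add_op(end);
    return 0;
}
```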

View File

@ -0,0 +1,39 @@
# Exp {#dev_guide_op_exp}
## General
Exp operation is an exponential element-wise activation function. It applies
the following formula on every element of \src tensor (the variable names
follow the standard @ref dev_guide_conventions):
\f[ dst = e^{src} \f]
## Operation attributes
Exp operation does not support any attribute.
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `dst` | Required
## Supported data types
Exp operation supports the following data type combinations.
Src | Dst
-- | --
f32 | f32
f16 | f16
bf16 | bf16

View File

@ -0,0 +1,38 @@
# GELU {#dev_guide_op_gelu}
## General
GELU operation applies the following formula on every element of \src tensor
(the variable names follow the standard @ref dev_guide_conventions):
\f[ dst = 0.5 \cdot src \cdot (1.0 + erf(\frac{src}{\sqrt{2}})) \f]
## Operation attributes
GELU operation does not support any attribute.
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `dst` |Required
## Supported data types
GELU operation supports the following data type combinations.
Src | Dst
-- | --
f32 | f32
bf16 | bf16
f16 | f16

View File

@ -0,0 +1,37 @@
# GELUBackward {#dev_guide_op_gelubackward}
## General
GELUBackward operation computes the gradient for GELU.
## Operation attributes
GELUBackward operation does not support any attribute.
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
1 | `diff_dst` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `diff_src` | Required
## Supported data types
GELUBackward operation supports the following data type combinations.
Src | Diff_dst | Diff_src
-- | -- | --
f32 | f32 | f32
f16 | f16 | f16
bf16 | bf16 | bf16

View File

@ -0,0 +1,39 @@
# HardSwish {#dev_guide_op_hardswish}
## General
HardSwish operation applies the following formula on every element of \src
tensor (the variable names follow the standard @ref dev_guide_conventions):
\f[ dst = src * \frac{\min(\max(src + 3, 0), 6)}{6} \f]
## Operation attributes
HardSwish operation does not support any attribute.
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `dst` |Required
## Supported data types
HardSwish operation supports the following data type combinations.
Src | Dst
-- | --
f32 | f32
bf16 | bf16
f16 | f16

View File

@ -0,0 +1,37 @@
# HardSwishBackward {#dev_guide_op_hardswishbackward}
## General
HardSwishBackward operation computes the gradient for HardSwish.
## Operation attributes
HardSwishBackward operation does not support any attribute.
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
1 | `diff_dst` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `diff_src` | Required
## Supported data types
HardSwishBackward operation supports the following data type combinations.
Src | Diff_dst | Diff_src
-- | -- | --
f32 | f32 | f32
f16 | f16 | f16
bf16 | bf16 | bf16

View File

@ -0,0 +1,64 @@
# Interpolate {#dev_guide_op_interpolate}
## General
Interpolate operation performs interpolation on the \src tensor along its spatial dimensions.
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| -- |--
[mode](@ref dnnl::graph::op::attr::mode) | Specifies type of interpolation. |string |`nearest`, `linear`, `bilinear`, `trilinear` | Required
[coordinate_transformation_mode](@ref dnnl::graph::op::attr::coordinate_transformation_mode) | Specifies how to transform the coordinate in the resized tensor to the coordinate in the original tensor. |string | `half_pixel`(default), `align_corners` | Optional
[sizes](@ref dnnl::graph::op::attr::sizes) | Specifies dst shape for spatial axes. |s64 |A s64 list containing positive values, `none` is default | Optional
[scales](@ref dnnl::graph::op::attr::scales) | Specifies `scales` for spatial axes. | f32 | A f32 list, `none` is default | Optional
[data_format](@ref dnnl::graph::op::attr::data_format) | Controls how to interpret the shape of `src` and `dst`. |string | `NCX`, `NXC` (default) | Optional
@note Either `sizes` or `scales` should be provided. When `sizes` is
used, `scales` will be ignored.
@note
The attribute `coordinate_transformation_mode` is the name of transformation
mode in string format.\n
Here `scale[x]` is `dst_shape[x] / src_shape[x]` and `x_resized` is a
coordinate in axis `x`, for any spatial axis `x` of the src tensor.\n
For `half_pixel`: the coordinate in the original tensor axis `x` is
calculated as `((x_resized + 0.5) / scale[x]) - 0.5`.\n
For `align_corners`: the coordinate in the original tensor axis `x` is
calculated as 0 if `dst_shape[x] == 1` else `x_resized * (src_shape[x] - 1)
/ (dst_shape[x] - 1)`.
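For example, assuming `src_shape[x] = 4` and `dst_shape[x] = 8` (so
`scale[x] = 2`), the resized coordinate `x_resized = 3` maps back to
`(3 + 0.5) / 2 - 0.5 = 1.25` under `half_pixel`, and to
`3 * (4 - 1) / (8 - 1) = 9 / 7 ≈ 1.29` under `align_corners`.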
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`src` | Required
1|`sizes` | Optional
@note `sizes` is a 1D tensor describing output shape for spatial axes. It is a
non-differentiable tensor.
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`dst` | Required
@note The shape of dst matches the shape of src except for the spatial
dimensions, which should match the values from the `sizes` input or be
calculated from the `scales` attribute.
## Supported data types
Interpolate operation supports the following data type combinations.
Src/Dst | Sizes
-- |--
f32 | s32
bf16 | s32
f16 | s32

View File

@ -0,0 +1,65 @@
# InterpolateBackward {#dev_guide_op_interpolatebackward}
## General
InterpolateBackward computes the gradients of the Interpolate operation.
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[mode](@ref dnnl::graph::op::attr::mode) | Specifies type of interpolation. |string |`nearest`, `linear`, `bilinear`, `trilinear` | Required
[coordinate_transformation_mode](@ref dnnl::graph::op::attr::coordinate_transformation_mode) | Specifies how to transform the coordinate in the resized tensor to the coordinate in the original tensor. |string | `half_pixel` (default), `align_corners` | Optional
[sizes](@ref dnnl::graph::op::attr::sizes) | Specifies dst shape for spatial axes. |s64 |A s64 list containing positive values, `none` is default | Optional
[scales](@ref dnnl::graph::op::attr::scales) | Specifies `scales` for spatial axes. | f32 | A f32 list, `none` is default | Optional
[data_format](@ref dnnl::graph::op::attr::data_format) | Controls how to interpret the shape of `src` and `dst`. |string | `NCX`, `NXC` (default) | Optional
@note Either `sizes` or `scales` should be provided. When `sizes` is
used, `scales` will be ignored.
@note
The attribute `coordinate_transformation_mode` is the name of transformation
mode in string format.\n
Here `scale[x]` is `dst_shape[x] / src_shape[x]` and `x_resized` is a
coordinate in axis `x`, for any spatial axis `x` of the src tensor.\n
For `half_pixel`: the coordinate in the original tensor axis `x` is
calculated as `((x_resized + 0.5) / scale[x]) - 0.5`.\n
For `align_corners`: the coordinate in the original tensor axis `x` is
calculated as 0 if `dst_shape[x] == 1` else `x_resized * (src_shape[x] - 1)
/ (dst_shape[x] - 1)`.\n
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`src` | Required
1|`diff_dst` | Required
2|`sizes` | Optional
@note
`src` is the original input tensor of the Interpolate op.\n
`diff_dst` is the gradient tensor with respect to the dst.\n
`sizes` is a 1D tensor describing the output shape for spatial axes.
### Outputs
Index| Argument Name | Required or Optional
-- | -- | --
0 |`diff_src` | Required
@note `diff_src` is the gradient tensor with respect to the src of Interpolate.
## Supported data types
InterpolateBackward operation supports the following data type combinations.
Src/Diff_dst/Diff_src | Sizes
-- |--
f32 | s32
bf16 | s32
f16 | s32

View File

@ -0,0 +1,78 @@
# LayerNorm {#dev_guide_op_layernorm}
## General
LayerNorm performs a layer normalization operation on the \src tensor.
The LayerNorm operation performs normalization from `begin_norm_axis` to the
last dimension of the data tensor. It is defined by the following formulas,
which are the same as @ref dev_guide_layer_normalization.
\f[
\dst(t, n, c) =
\gamma(c) \cdot
\frac{\src(t, n, c) - \mu(t, n)} {\sqrt{\sigma^2(t, n) + \epsilon}}
+ \beta(c),
\f]
where
- \f$\gamma(c), \beta(c)\f$ are optional scale and shift for a channel,
- \f$\mu(t, n), \sigma^2(t, n)\f$ are mean and variance (see below),
- \f$\epsilon\f$ is a constant to improve numerical stability.
Mean and variance are computed at runtime or provided by a user. When mean and
variance are computed at runtime, the following formulas are used:
- \f$\mu(t, n) = \frac{1}{C} \sum\limits_{c} \src(t, n, c)\f$,
- \f$\sigma^2(t, n) = \frac{1}{C} \sum\limits_{c} (\src(t, n, c) - \mu(t, n))^2\f$.
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[keep_stats](@ref dnnl::graph::op::attr::keep_stats) | Indicate whether to output mean and variance which can be later passed to backward op. |bool |`false`,`true` (default) | Optional
[begin_norm_axis](@ref dnnl::graph::op::attr::begin_norm_axis) | `begin_norm_axis` is used to indicate which axis to start layer normalization from. The normalization is from `begin_norm_axis` to the last dimension. Negative values mean indexing from right to left. This op normalizes over the last dimension by default, e.g. C in TNC for 3D and LDNC for 4D. |s64 |[-r, r-1], where r = rank(src). `-1` is default | Optional
[use_affine](@ref dnnl::graph::op::attr::use_affine) | When set to True, this operation has learnable per-element affine parameters. |bool |`false`, `true` (default) | Optional
[epsilon](@ref dnnl::graph::op::attr::epsilon) | The constant to improve numerical stability. |f32 |Arbitrary positive f32 value, `1e-5`(default) | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `src` | Required
1 | `gamma` | Optional
2 | `beta` | Optional
@note `gamma` is the scaling for the normalized value. `beta` is the bias added
to the scaled normalized value. They are both 1D tensors with the same size as
the channel axis of src, and they are required if the attribute `use_affine` is
set to True.
### Outputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `dst` | Required
1 | `mean` | Optional
2 | `variance` | Optional
@note Both `mean` and `variance` are required if attribute `keep_stats` is set to
True.
## Supported data types
LayerNorm operation supports the following data type combinations.
Src / Dst | Gamma / Beta / Mean / Variance
-- |--
f32 | f32
bf16 | f32, bf16
f16 | f32
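To show how `keep_stats` changes the output list, here is a minimal sketch with the C++ graph API; the TNC shapes, ids, and attribute values are assumptions for the example.
```cpp
#include <cstdint>
#include "oneapi/dnnl/dnnl_graph.hpp"

using namespace dnnl::graph;

int main() {
    using dt = logical_tensor::data_type;
    using lt = logical_tensor::layout_type;

    // src is TNC; normalization runs over the last dimension (C = 8).
    logical_tensor src {0, dt::f32, {2, 4, 8}, lt::strided};
    logical_tensor gamma {1, dt::f32, {8}, lt::strided};
    logical_tensor beta {2, dt::f32, {8}, lt::strided};
    logical_tensor dst {3, dt::f32, {2, 4, 8}, lt::strided};
    // keep_stats = true adds mean and variance outputs of shape {T, N}.
    logical_tensor mean {4, dt::f32, {2, 4}, lt::strided};
    logical_tensor var {5, dt::f32, {2, 4}, lt::strided};

    op ln {6, op::kind::LayerNorm, "layernorm"};
    ln.set_attr<bool>(op::attr::keep_stats, true);
    ln.set_attr<int64_t>(op::attr::begin_norm_axis, -1);
    ln.set_attr<bool>(op::attr::use_affine, true);
    ln.set_attr<float>(op::attr::epsilon, 1e-5f);
    ln.add_inputs({src, gamma, beta});
    ln.add_outputs({dst, mean, var});

    graph g {dnnl::engine::kind::cpu};
    g.add_op(ln);
    return 0;
}
```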

View File

@ -0,0 +1,61 @@
# LayerNormBackward {#dev_guide_op_layernormbackward}
## General
LayerNormBackward performs the backward computation of the LayerNorm
operation.
The backward propagation computes
\f$\diffsrc(t, n, c)\f$,
\f$\diffgamma(c)^*\f$, and \f$\diffbeta(c)^*\f$
based on
\f$\diffdst(t, n, c)\f$, \f$\src(t, n, c)\f$, \f$\mu(t, n)\f$,
\f$\sigma^2(t, n)\f$, \f$\gamma(c)^*\f$, and \f$\beta(c)^*\f$.
The tensors marked with an asterisk are used only when the operation is
configured to use \f$\gamma(c)\f$ and \f$\beta(c)\f$.
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[begin_norm_axis](@ref dnnl::graph::op::attr::begin_norm_axis) | `begin_norm_axis` is used to indicate which axis to start layer normalization from. The normalization is from `begin_norm_axis` to the last dimension. Negative values mean indexing from right to left. This op normalizes over the last dimension by default, e.g. C in TNC for 3D and LDNC for 4D. |s64 |[-r, r-1], where r = rank(src). `-1` is default | Optional
[use_affine](@ref dnnl::graph::op::attr::use_affine) | When set to True, this module has learnable per-element affine parameters. |bool |`false`,`true` (default) | Optional
[epsilon](@ref dnnl::graph::op::attr::epsilon) | The constant to improve numerical stability. |f32 |Arbitrary positive f32 value, 1e-5 (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `src` | Required
1 | `diff_dst` | Required
2 | `mean` | Required
3 | `variance` | Required
4 | `gamma` | Optional
5 | `beta` | Optional
@note `gamma` is the scaling for the normalized value. `beta` is the bias added
to the scaled normalized value. They are both 1D tensors with the same size as
the channel axis of src, and they are required if the attribute `use_affine` is
set to True.
### Outputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `diff_src` | Required
1 | `diff_gamma` | Optional
2 | `diff_beta` | Optional
## Supported data types
LayerNormBackward operation supports the following data type combinations.
Src / Diff_dst / Diff_src | Gamma / Beta / Mean / Variance / Diff_gamma / Diff_beta
-- |--
f32 | f32
bf16 | f32, bf16
f16 | f32

View File

@ -0,0 +1,50 @@
# LeakyReLU {#dev_guide_op_leakyrelu}
## General
LeakyReLU operation is a type of activation function based on ReLU. It has a
small slope for negative values with which LeakyReLU can produce small,
non-zero, and constant gradients with respect to the negative values. The slope
is also called the coefficient of leakage.
Unlike @ref dev_guide_op_prelu, the coefficient \f$\alpha\f$ is constant and
defined before training.
LeakyReLU operation applies the following formula on every element of \src
tensor (the variable names follow the standard @ref dev_guide_conventions):
\f[ dst = \begin{cases} src & \text{if}\ src \ge 0 \\
\alpha src & \text{if}\ src < 0 \end{cases} \f]
## Operation attributes
Attribute Name | Description | Value Type | Supported Values | Required or Optional
-- | -- | -- | -- | --
[alpha](@ref dnnl::graph::op::attr::alpha) | Alpha is the coefficient of leakage. | f32 | Arbitrary f32 value but usually a small positive value. | Required
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `dst` |Required
## Supported data types
LeakyReLU operation supports the following data type combinations.
Src | Dst
-- | --
f32 | f32
bf16 | bf16
f16 | f16

View File

@ -0,0 +1,40 @@
# Log {#dev_guide_op_log}
## General
Log operation performs an element-wise natural logarithm on a given tensor. It
applies the following formula on every element of \src tensor (the variable
names follow the standard @ref dev_guide_conventions):
\f[ dst = \log(src) \f]
## Operation attributes
Log operation does not support any attribute.
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `dst` | Required
## Supported data types
Log operation supports the following data type combinations.
Src | Dst
-- | --
f32 | f32
f16 | f16
bf16 | bf16

View File

@ -0,0 +1,40 @@
# LogSoftmax {#dev_guide_op_logsoftmax}
## General
LogSoftmax operation applies the \f$ \log(softmax(src)) \f$ function to an
n-dimensional input tensor. The formulation can be simplified as:
\f[ dst_i = \log\Big( \frac{\exp(src_i)}{\sum_{j} \exp(src_j)} \Big) \f]
## Operation attributes
Attribute Name | Description | Value Type | Supported Values | Required or Optional
-- | -- | -- | -- | --
[axis](@ref dnnl::graph::op::attr::axis) | Represents the axis along which the LogSoftmax is calculated. Negative values mean counting dimensions from the back. | s64 | Arbitrary s64 value (`-1` by default) | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `dst` |Required
## Supported data types
LogSoftmax operation supports the following data type combinations.
Src | Dst
-- | --
f32 | f32
bf16 | bf16
f16 | f16

View File

@ -0,0 +1,39 @@
# LogSoftmaxBackward {#dev_guide_op_logsoftmaxbackward}
## General
LogSoftmaxBackward operation computes the gradient for LogSoftmax.
## Operation attributes
Attribute Name | Description | Value Type | Supported Values | Required or Optional
-- | -- | -- | -- | --
[axis](@ref dnnl::graph::op::attr::axis) | Represents the axis along which the LogSoftmax is calculated. Negative values mean counting dimensions from the back. | s64 | Arbitrary s64 value (`-1` by default) | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `diff_dst` | Required
1 | `src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `diff_src` | Required
## Supported data types
LogSoftmaxBackward operation supports the following data type combinations.
Src | Diff_dst | Diff_src
-- | -- | --
f32 | f32 | f32
bf16 | bf16 | bf16
f16 | f16 | f16

View File

@ -0,0 +1,67 @@
# MatMul {#dev_guide_op_matmul}
## General
MatMul operation computes the product of two tensors with an optional bias
addition. The variable names follow the standard @ref dev_guide_conventions.
Taking 2D input tensors as an example, the formula is:
\f[
\dst(m, n) =
\sum_{k=0}^{K - 1} \left(
\src(m, k) \cdot \weights(k, n)
\right) +
\bias(m, n)
\f]
In the shape of a tensor, the two right-most axes are interpreted as the row
and column dimensions of a matrix, while all the preceding axes (if present)
are interpreted as batch dimensions. The operation supports broadcasting
semantics for those batch dimensions. For example, \src can be broadcast to
\weights if the corresponding dimension in \src is `1` (and vice versa).
Additionally, if the ranks of \src and \weights are different, the tensor with
the smaller rank will be *unsqueezed* from the left side of its dimensions
(by inserting `1`) to make the two ranks match.
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[transpose_a](@ref dnnl::graph::op::attr::transpose_a) | Controls whether to transpose the last two dimensions of `src`. |bool | True, False (default) | Optional
[transpose_b](@ref dnnl::graph::op::attr::transpose_b) | Controls whether to transpose the last two dimensions of `weights`. |bool | True, False (default) | Optional
The transpose attributes above have no effect when the rank of an input tensor
is less than 2. In the library implementation, a 1D tensor is unsqueezed first
before compilation, and the rule is applied to each input independently (see
the sketch after the following rules):
- For \src tensor, the rule is defined like: `[d] -> [1, d]`.
- For \weights tensor, the rule is defined like: `[d] -> [d, 1]`.
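A minimal sketch (shapes and ids are illustrative assumptions) showing the rank-matching rule and `transpose_b` together:
```cpp
#include "oneapi/dnnl/dnnl_graph.hpp"

using namespace dnnl::graph;

int main() {
    using dt = logical_tensor::data_type;
    using lt = logical_tensor::layout_type;

    // weights has rank 2 while src has rank 3, so weights is unsqueezed
    // on the left to {1, 64, 32}; transpose_b then swaps its last two
    // dimensions, giving an effective {1, 32, 64} that broadcasts over
    // the batch of 8. The expected dst shape is therefore {8, 16, 64}.
    logical_tensor src {0, dt::f32, {8, 16, 32}, lt::strided};
    logical_tensor wei {1, dt::f32, {64, 32}, lt::strided};
    logical_tensor dst {2, dt::f32, {8, 16, 64}, lt::strided};

    op mm {3, op::kind::MatMul, "matmul"};
    mm.set_attr<bool>(op::attr::transpose_b, true);
    mm.add_inputs({src, wei});
    mm.add_output(dst);

    graph g {dnnl::engine::kind::cpu};
    g.add_op(mm);
    return 0;
}
```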
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`src` | Required
1|`weights` | Required
2|`bias` | Optional
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`dst` |Required
## Supported data types
MatMul operation supports the following data type combinations.
Src | Weights | Bias | Dst
--|--|-- | --
f32 | f32 | f32 |f32
bf16 | bf16 | bf16 |bf16
f16 | f16 | f16 |f16

View File

@ -0,0 +1,54 @@
# MaxPool {#dev_guide_op_maxpool}
## General
MaxPool operation performs the computation following the formula below.
Variable names follow the standard @ref dev_guide_conventions.
\f[
\dst(n, c, oh, ow) =
\max\limits_{kh, kw}
\left(
\src(n, c, oh \cdot SH + kh \cdot (DH + 1) - PH_L, ow \cdot SW + kw \cdot (DW + 1) - PW_L)
\right)
\f]
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[strides](@ref dnnl::graph::op::attr::strides) | Controls the strides with which the window is moved. |s64 |A s64 list containing positive values | Required
[pads_begin](@ref dnnl::graph::op::attr::pads_begin) | Controls the number of zeros to be added to the front/top/left of spatial dimensions. The attribute will be ignored when the `auto_pad` attribute is specified as `same_upper`, `same_lower`, or `valid`. |s64 | A s64 list containing non-negative values | Required
[pads_end](@ref dnnl::graph::op::attr::pads_end) | Controls the number of zeros to be added to the back/bottom/right of spatial dimensions. The attribute will be ignored when the `auto_pad` attribute is specified as `same_upper`, `same_lower`, or `valid`. |s64 |A s64 list containing non-negative values | Required
[kernel](@ref dnnl::graph::op::attr::kernel) | Size of pooling window. | s64| A s64 list containing positive values | Required
[rounding_type](@ref dnnl::graph::op::attr::rounding_type) | Controls how to do rounding. |string | `floor` (default), `ceil` | Optional
[auto_pad](@ref dnnl::graph::op::attr::auto_pad) |Controls how the paddings are calculated.| string | `none` (default), `same_upper`, `same_lower`, `valid` | Optional
[dilations](@ref dnnl::graph::op::attr::dilations) |Denotes the distance in width and height between elements in the window. |s64 | A s64 list containing positive values, a list of `1`s (default) means no dilation| Optional
[data_format](@ref dnnl::graph::op::attr::data_format) |Controls how to interpret the shape of `src` and `dst`.| string|`NCX`, `NXC` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`dst` | Required
## Supported data types
MaxPool operation supports the following data type combinations.
Src | Dst
-- | --
f32 |f32
bf16 |bf16
f16 |f16
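For illustration, here is a minimal sketch of a MaxPool op in the C++ graph API; the shapes and attribute values are assumptions chosen so the output spatial size works out to 56.
```cpp
#include <cstdint>
#include <string>
#include <vector>
#include "oneapi/dnnl/dnnl_graph.hpp"

using namespace dnnl::graph;

int main() {
    using dt = logical_tensor::data_type;
    using lt = logical_tensor::layout_type;

    // NXC layout: src is {N, H, W, C}. With a 3x3 window, stride 2, and
    // symmetric padding of 1, each spatial output size is
    // floor((112 + 1 + 1 - 3) / 2) + 1 = 56.
    logical_tensor src {0, dt::f32, {1, 112, 112, 64}, lt::strided};
    logical_tensor dst {1, dt::f32, {1, 56, 56, 64}, lt::strided};

    op pool {2, op::kind::MaxPool, "maxpool"};
    pool.set_attr<std::vector<int64_t>>(op::attr::kernel, {3, 3});
    pool.set_attr<std::vector<int64_t>>(op::attr::strides, {2, 2});
    pool.set_attr<std::vector<int64_t>>(op::attr::pads_begin, {1, 1});
    pool.set_attr<std::vector<int64_t>>(op::attr::pads_end, {1, 1});
    pool.set_attr<std::string>(op::attr::data_format, "NXC");
    pool.add_input(src);
    pool.add_output(dst);

    graph g {dnnl::engine::kind::cpu};
    g.add_op(pool);
    return 0;
}
```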

View File

@ -0,0 +1,46 @@
# MaxPoolBackward {#dev_guide_op_maxpoolbackward}
## General
MaxPoolBackward operation accepts \src tensor and \f$\diffdst\f$ tensor, and
calculates \f$\diffsrc\f$ tensor.
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[strides](@ref dnnl::graph::op::attr::strides) | Controls the strides with which the window is moved. |s64 |A s64 list containing positive values | Required
[pads_begin](@ref dnnl::graph::op::attr::pads_begin) | Controls the number of zeros to be added to the front/top/left of spatial dimensions. The attribute will be ignored when the `auto_pad` attribute is specified as `same_upper`, `same_lower`, or `valid`. |s64 | A s64 list containing non-negative values | Required
[pads_end](@ref dnnl::graph::op::attr::pads_end) | Controls the number of zeros to be added to the back/bottom/right of spatial dimensions. The attribute will be ignored when the `auto_pad` attribute is specified as `same_upper`, `same_lower`, or `valid`. |s64 |A s64 list containing non-negative values | Required
[kernel](@ref dnnl::graph::op::attr::kernel) | Size of pooling window. | s64| A s64 list containing positive values | Required
[auto_pad](@ref dnnl::graph::op::attr::auto_pad) |Controls how the paddings are calculated.| string | `none` (default), `same_upper`, `same_lower`, `valid` | Optional
[dilations](@ref dnnl::graph::op::attr::dilations) |Denotes the distance in width and height between elements in the window.|s64 | A s64 list containing positive values, a list of `1`s (default) means no dilation| Optional
[data_format](@ref dnnl::graph::op::attr::data_format) |Controls how to interpret the shape of `src` and `dst`.| string|`NCX`, `NXC` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`src` | Required
1|`diff_dst` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`diff_src` | Required
## Supported data types
MaxPoolBackward operation supports the following data type combinations.
Src | Diff_dst|Diff_src
-- | --|--
f32 |f32|f32
bf16 |bf16|bf16
f16 |f16|f16

View File

@ -0,0 +1,47 @@
# Maximum {#dev_guide_op_maximum}
## General
Maximum operation performs element-wise maximum operation between two given
tensors, applying multi-directional broadcast rules.
\f[ \dst(\overline{x}) = \max(\src\_0(\overline{x}), \src\_1(\overline{x})) \f]
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[auto_broadcast](@ref dnnl::graph::op::attr::auto_broadcast) | Specifies rules used for auto-broadcasting of src tensors.|string |`none`,`numpy` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `src_0` | Required
1 | `src_1` | Required
@note Both src shapes should match and no auto-broadcasting is allowed if the
`auto_broadcast` attribute is `none`. `src_0` and `src_1` shapes can be
different and auto-broadcasting is allowed if the `auto_broadcast` attribute is
`numpy`. Broadcasting is performed according to the `auto_broadcast` value.
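For example (shapes are illustrative), with `auto_broadcast` = `numpy`, a
`src_0` of shape {2, 3, 4} and a `src_1` of shape {3, 1} produce a `dst` of
shape {2, 3, 4}: `src_1` is treated as {1, 3, 1} and broadcast along the first
and last dimensions.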
### Outputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `dst` | Required
## Supported data types
Maximum operation supports the following data type combinations.
Source0/1 | Destination
---- | -------
f32 | f32
bf16 | bf16
f16 | f16

View File

@ -0,0 +1,47 @@
# Minimum {#dev_guide_op_minimum}
## General
Minimum operation performs element-wise minimum operation between two given
tensors, applying multi-directional broadcast rules.
\f[ \dst(\overline{x}) = \min(\src\_0(\overline{x}), \src\_1(\overline{x})) \f]
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[auto_broadcast](@ref dnnl::graph::op::attr::auto_broadcast) | Specifies rules used for auto-broadcasting of src tensors. |string |`none`,`numpy` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `src_0` | Required
1 | `src_1` | Required
@note Both src shapes should match and no auto-broadcasting is allowed if the
`auto_broadcast` attribute is `none`. `src_0` and `src_1` shapes can be
different and auto-broadcasting is allowed if the `auto_broadcast` attribute is
`numpy`. Broadcasting is performed according to the `auto_broadcast` value.
### Outputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `dst` | Required
## Supported data types
Minimum operation supports the following data type combinations.
Source0/1 | Destination
---- | -------
f32 | f32
bf16 | bf16
f16 | f16

View File

@ -0,0 +1,38 @@
# Mish {#dev_guide_op_mish}
## General
Mish performs an element-wise activation function on a given input tensor,
based on the following mathematical formula:
\f[ dst = src * \tanh(SoftPlus(src)) = src * \tanh(\ln(1 + e^{src})) \f]
## Operation attributes
Mish operation does not support any attribute.
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `dst` |Required
## Supported data types
Mish operation supports the following data type combinations.
Src | Dst
-- | --
f32 | f32
bf16 | bf16
f16 | f16

View File

@ -0,0 +1,41 @@
# MishBackward {#dev_guide_op_mishbackward}
## General
MishBackward operation computes the gradient for Mish.
\f[ \diffsrc = \diffdst \cdot \frac{e^{src} \cdot \omega}{\delta^{2}} \f]
where
\f[ \omega = e^{3src} + 4 \cdot e^{2src} + e^{src} \cdot (4 \cdot src + 6) + 4 \cdot (src + 1) \f]
\f[ \delta = e^{2src} + 2 \cdot e^{src} + 2 \f]
## Operation attributes
MishBackward operation does not support any attribute.
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
1 | `diff_dst` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `diff_src` | Required
## Supported data types
MishBackward operation supports the following data type combinations.
Src | Diff_dst | Diff_src
-- | -- | --
f32 | f32 | f32
f16 | f16 | f16
bf16 | bf16 | bf16

View File

@ -0,0 +1,52 @@
# Multiply {#dev_guide_op_multiply}
## General
Multiply operation performs element-wise multiplication of two given tensors,
applying multi-directional broadcast rules.
\f[
    \dst(\overline{x}) =
        \src_0(\overline{x}) \times \src_1(\overline{x})
\f]
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[auto_broadcast](@ref dnnl::graph::op::attr::auto_broadcast) | Specifies rules used for auto-broadcasting of src tensors. |string |`none`,`numpy` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `src_0` | Required
1 | `src_1` | Required
@note Both src shapes should match and no auto-broadcasting is allowed if the
`auto_broadcast` attribute is `none`. `src_0` and `src_1` shapes can be
different and auto-broadcasting is allowed if the `auto_broadcast` attribute is
`numpy`. Broadcasting is performed according to the `auto_broadcast` value.
### Outputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `dst` | Required
## Supported data types
Multiply operation supports the following data type combinations.
Source0/1 | Destination
---- | -------
f32 | f32
bf16 | bf16
f16 | f16

View File

@ -0,0 +1,60 @@
# PReLU {#dev_guide_op_prelu}
## General
PReLU operation performs element-wise parametric ReLU operation on a given
input tensor, based on the following mathematical formula:
\f[ dst = \begin{cases} src & \text{if}\ src \ge 0 \\
\alpha src & \text{if}\ src < 0 \end{cases} \f]
## Operation attributes
Attribute Name | Description | Value Type | Supported Values | Required or Optional
-- | -- | -- | -- | --
[data_format](@ref dnnl::graph::op::attr::data_format) | Denotes the data format of the input and output data. | string | `NCX`, `NXC`(default) | Optional
[per_channel_broadcast](@ref dnnl::graph::op::attr::per_channel_broadcast) | Denotes whether to apply per_channel broadcast when slope is 1D tensor. | bool | `false`, `true`(default) | Optional
### Broadcasting Rules
Only slope tensor supports broadcasting semantics. Slope tensor is
uni-directionally broadcasted to \src if one of the following rules is met:
- 1: slope is a 1D tensor and `per_channel_broadcast` is set to `true`; the
  length of the slope tensor is equal to the size of the channel dimension
  of \src.
- 2: slope is a 1D tensor and `per_channel_broadcast` is set to `false`; the
  length of the slope tensor is equal to the size of the rightmost dimension
  of \src.
- 3: slope is an nD tensor; starting from the rightmost dimension,
  \f$input.shape_i == slope.shape_i\f$, or \f$slope.shape_i == 1\f$, or
  slope dimension i is empty.
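For example (shapes are illustrative), given an `NCX` \src of shape
{2, 96, 14, 14}: rule 1 expects a slope of shape {96} (the channel dimension),
rule 2 expects a slope of shape {14} (the rightmost dimension), and rule 3
admits a slope such as {96, 1, 1}, which broadcasts against the last three
dimensions.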
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
1 | `slope` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `dst` |Required
## Supported data types
PReLU operation supports the following data type combinations.
Src | Dst | Slope
-- | -- | --
f32 | f32 | f32
bf16 | bf16 | bf16
f16 | f16 | f16

View File

@ -0,0 +1,56 @@
# PReLUBackward {#dev_guide_op_prelubackward}
## General
PReLUBackward operation computes the gradient for PReLU.
## Operation attributes
Attribute Name | Description | Value Type | Supported Values | Required or Optional
-- | -- | -- | -- | --
[data_format](@ref dnnl::graph::op::attr::data_format) | Denotes the data format of the input and output data. | string | `NCX`, `NXC`(default) | Optional
### Broadcasting Rules
Only slope tensor supports broadcasting semantics. Slope tensor is
uni-directionally broadcasted to \src if one of the following rules is met:
1. PyTorch case: slope is a 1D tensor and broadcast per channel; the length of
   slope is equal to the size of the channel dimension of \src.
2. PyTorch case: slope is a 1D tensor and broadcast per tensor; the length of
   slope is equal to 1.
3. TensorFlow case: slope is an nD tensor and its dimensions must be equal to
   the \src dimensions starting from the second element:
   \f$ slope\_shape = input\_forward\_shape[1:] \f$
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
1 | `slope` | Required
2 | `diff_dst` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `diff_src` | Required
1 | `diff_slope` | Required
## Supported data types
PReLUBackward operation supports the following data type combinations.
Src | Slope | Diff_dst | Diff_src | Diff_slope
-- | -- | -- | -- | --
f32 | f32 | f32 | f32 | f32
bf16 | bf16 | bf16 | bf16 | bf16
f16 | f16 | f16 | f16 | f16

View File

@ -0,0 +1,55 @@
# Quantize {#dev_guide_op_quantize}
## General
Quantize operation converts an f32 tensor to a quantized (u8/s8) tensor. It
supports both per-tensor and per-channel asymmetric linear quantization. The
output data type is specified by the data type of the output tensor. The
rounding mode is library-implementation defined.
For per-tensor quantization:
\f[ \dst_{i} = round(\src_{i} / scale + zp) \f]
For per-channel quantization, taking channel axis = 1 as an example:
\f[ dst_{\cdots,i,\cdots,\cdots} = round(\src_{\cdots,i,\cdots,\cdots} / scale_i + zp_i), i \in {[0, ic-1]} \f]
where \f$ic\f$ is the number of channels.
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[qtype](@ref dnnl::graph::op::attr::qtype) | Specifies which quantization type is used. |string | `per_tensor` (default), `per_channel` | Optional
[axis](@ref dnnl::graph::op::attr::axis) | Specifies dimension on which per-channel quantization is applied. |s64 | A s64 value in the range of [-r, r-1] where r = rank(src), `1` by default | Optional
[scales](@ref dnnl::graph::op::attr::scales) | Scales applied to the src data. |f32 | A f32 list (only contains one element if qtype is `per_tensor`) | Required
[zps](@ref dnnl::graph::op::attr::zps) | Offset values that map to float zero. |s64 | A s64 list (only contains one element if qtype is `per_tensor`) | Required
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`dst` |Required
## Supported data types
Quantize operation supports the following data type combinations.
Src | Dst
-- | --
f32 |s8, u8
@note This operation is to support
[int8 quantization](@ref dev_guide_graph_int8_quantization_model) model.
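A minimal sketch of a per-channel Quantize op in the C++ graph API; the ids, shapes, scales, and zero points are illustrative assumptions.
```cpp
#include <cstdint>
#include <string>
#include <vector>
#include "oneapi/dnnl/dnnl_graph.hpp"

using namespace dnnl::graph;

int main() {
    using dt = logical_tensor::data_type;
    using lt = logical_tensor::layout_type;

    // Per-channel quantization along axis 1 (two channels):
    // dst[:, i, ...] = round(src[:, i, ...] / scales[i] + zps[i]).
    logical_tensor src {0, dt::f32, {1, 2, 4, 4}, lt::strided};
    logical_tensor dst {1, dt::s8, {1, 2, 4, 4}, lt::strided};

    op quant {2, op::kind::Quantize, "quantize"};
    quant.set_attr<std::string>(op::attr::qtype, "per_channel");
    quant.set_attr<int64_t>(op::attr::axis, 1);
    quant.set_attr<std::vector<float>>(op::attr::scales, {0.5f, 0.25f});
    quant.set_attr<std::vector<int64_t>>(op::attr::zps, {0, 0});
    quant.add_input(src);
    quant.add_output(dst);

    graph g {dnnl::engine::kind::cpu};
    g.add_op(quant);
    return 0;
}
```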

View File

@ -0,0 +1,40 @@
# ReLU {#dev_guide_op_relu}
## General
ReLU operation applies the following formula on every element of \src tensor
(the variable names follow the standard @ref dev_guide_conventions):
\f[ dst = \begin{cases} src & \text{if}\ src > 0 \\
0 & \text{if}\ src \leq 0 \end{cases} \f]
## Operation attributes
ReLU operation does not support any attribute.
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0|`dst` |Required
## Supported data types
ReLU operation supports the following data type combinations.
Src | Dst
-- | --
f32 | f32
bf16 | bf16
f16 | f16

View File

@ -0,0 +1,39 @@
# ReLUBackward {#dev_guide_op_relubackward}
## General
ReLUBackward operation computes the gradient for ReLU.
## Operation attributes
Attribute Name | Description | Value Type | Supported Values | Required or Optional
-- | -- | -- | -- | --
[use_dst](@ref dnnl::graph::op::attr::use_dst) | If true, use `dst` of ReLU operation to calculate the gradient. Otherwise, use `src`. | bool | `true` (default), `false` | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` / `dst` | Required
1 | `diff_dst` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `diff_src` | Required
## Supported data types
ReLUBackward operation supports the following data type combinations.
Src | Diff_dst | Diff_src
-- | -- | --
f32 | f32 | f32
f16 | f16 | f16
bf16 | bf16 | bf16

View File

@ -0,0 +1,39 @@
# Reciprocal {#dev_guide_op_reciprocal}
## General
Reciprocal operation is an element-wise Power operation where the exponent
(power) equals -1. The reciprocal of 0 is infinity.
\f[ dst = \begin{cases} src^{-1} & \text{if}\ src \neq 0 \\
    inf & \text{if}\ src = 0 \end{cases} \f]
## Operation attributes
Reciprocal operation does not support any attribute.
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `src` | Required
### Outputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `dst` | Required
## Supported data types
Reciprocal operation supports the following data type combinations.
Source | Destination
---- | -------
f32 | f32
bf16 | bf16
f16 | f16

View File

@ -0,0 +1,56 @@
# ReduceL1 {#dev_guide_op_reducel1}
## General
ReduceL1 operation performs the reduction by finding the L1 norm (sum of
absolute values) of a given src tensor along the dimensions specified by axes.
Take channel axis = 0 and keep_dims = True as an example:
\f[ {dst}_{0,\cdots,\cdots} =
\sum\limits_{i}|src_{i,\cdots,\cdots}| ,i \in [0,channelNum-1] \f]
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[axes](@ref dnnl::graph::op::attr::axes) | Specifies the indices of the src tensor along which the reduction is performed. If axes is a list, reduce over all of them. If axes is empty, it corresponds to the identity operation. If axes contains all dimensions of the src tensor, a single reduction value is calculated for the entire src tensor. Exactly one of the attribute `axes` and the second input tensor `axes` should be available. |s64 |A list of s64 values in the range of [-r, r-1] where r = rank(src). Empty list (default) | Optional
[keep_dims](@ref dnnl::graph::op::attr::keep_dims) | If set to `true`, it holds the axes that are used for the reduction; for each such axis, the dst dimension is equal to 1. |bool |`true`, `false` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `src` | Required
1 | `axes` | Optional
@note `axes` is a 1D tensor specifying the axes along which the reduction is
performed. It is a 1D tensor of unique elements, and the range of elements is
[-r, r-1], where r is the rank of the src tensor. Exactly one of the attribute
`axes` and the second input tensor `axes` should be available.
### Outputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `dst` | Required
@note `dst` is the result of the ReduceL1 function applied to the src tensor.
`shape[i] = shapeOf(data)[i]` for all `i` that is not in the list of axes from
the second input. For dimensions from `axes`, `shape[i] == 1` if
`keep_dims == true`; otherwise, the i-th dimension is removed from dst.
## Supported data types
ReduceL1 operation supports the following data type combinations.
Source/Destination |Axes
---- | -------
f32 | s32
bf16 | s32
f16 | s32
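For example (the same shape rules apply to the other Reduce operations below),
reducing a src of shape {2, 3, 4} with `axes` = {0, 2} produces a dst of shape
{1, 3, 1} when `keep_dims` is `true`, and {3} when `keep_dims` is `false`.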

View File

@ -0,0 +1,56 @@
# ReduceL2 {#dev_guide_op_reducel2}
## General
ReduceL2 operation performs the reduction by finding the L2 norm (square root
of the sum of squares) of a given src tensor along the dimensions specified by
axes.
Take channel axis = 0 and keep_dims = True as an example:
\f[ {dst}_{0,\cdots,\cdots} =
\sqrt{\sum\limits_{i}{src_{i,\cdots,\cdots}^2}} ,i \in [0,channelNum-1] \f]
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[axes](@ref dnnl::graph::op::attr::axes) | Specifies the indices of the src tensor along which the reduction is performed. If axes is a list, reduce over all of them. If axes is empty, it corresponds to the identity operation. If axes contains all dimensions of the src tensor, a single reduction value is calculated for the entire src tensor. Exactly one of the attribute `axes` and the second input tensor `axes` should be available. |s64 |A list of s64 values in the range of [-r, r-1] where r = rank(src). Empty list (default) | Optional
[keep_dims](@ref dnnl::graph::op::attr::keep_dims) | If set to `true`, it holds the axes that are used for the reduction; for each such axis, the dst dimension is equal to 1. |bool |`true`, `false` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `src` | Required
1 | `axes` | Optional
@note `axes` is a 1D tensor specifying the axes along which the reduction is
performed. It is a 1D tensor of unique elements, and the range of elements is
[-r, r-1], where r is the rank of the src tensor. Exactly one of the attribute
`axes` and the second input tensor `axes` should be available.
### Outputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `dst` | Required
@note `dst` is the result of the ReduceL2 function applied to the src tensor.
`shape[i] = shapeOf(data)[i]` for all `i` that is not in the list of axes from
the second input. For dimensions from `axes`, `shape[i] == 1` if
`keep_dims == true`; otherwise, the i-th dimension is removed from dst.
## Supported data types
ReduceL2 operation supports the following data type combinations.
Source/Destination |Axes
---- | -------
f32 | s32
bf16 | s32
f16 | s32

View File

@ -0,0 +1,56 @@
# ReduceMax {#dev_guide_op_reducemax}
## General
ReduceMax operation performs the reduction by finding the maximum value of a
given src tensor along the dimensions specified by axes.
Take channel axis = 0 and keep_dims = True as an example:
\f[ {dst}_{0,\cdots,\cdots} =
\max\{src_{i,\cdots,\cdots}\} ,i \in [0,channelNum-1] \f]
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[axes](@ref dnnl::graph::op::attr::axes) | Specifies the indices of the src tensor along which the reduction is performed. If axes is a list, reduce over all of them. If axes is empty, it corresponds to the identity operation. If axes contains all dimensions of the src tensor, a single reduction value is calculated for the entire src tensor. Exactly one of the attribute `axes` and the second input tensor `axes` should be available. |s64 |A list of s64 values in the range of [-r, r-1] where r = rank(src). Empty list (default) | Optional
[keep_dims](@ref dnnl::graph::op::attr::keep_dims) | If set to `true`, it holds the axes that are used for the reduction; for each such axis, the dst dimension is equal to 1. |bool |`true`, `false` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `src` | Required
1 | `axes` | Optional
@note `axes` is a 1D tensor specifying the axes along which the reduction is
performed. It is a 1D tensor of unique elements, and the range of elements is
[-r, r-1], where r is the rank of the src tensor. Exactly one of the attribute
`axes` and the second input tensor `axes` should be available.
### Outputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `dst` | Required
@note `dst` is the result of the ReduceMax function applied to the src tensor.
`shape[i] = shapeOf(data)[i]` for all `i` that is not in the list of axes from
the second input. For dimensions from `axes`, `shape[i] == 1` if
`keep_dims == true`; otherwise, the i-th dimension is removed from dst.
## Supported data types
ReduceMax operation supports the following data type combinations.
Source/Destination |Axes
---- | -------
f32 | s32
bf16 | s32
f16 | s32

View File

@ -0,0 +1,56 @@
# ReduceMean {#dev_guide_op_reducemean}
## General
ReduceMean operation performs the reduction by finding the arithmetic mean of
a given src tensor along the dimensions specified by axes.
Take channel axis = 0 and keep_dims = True as an example:
\f[ {dst}_{0,\cdots,\cdots} =\frac{
{\sum\limits_{i}{src_{i,\cdots,\cdots}}}}{channelNum} ,i \in [0,channelNum-1] \f]
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[axes](@ref dnnl::graph::op::attr::axes) | Specifies the indices of the src tensor along which the reduction is performed. If axes is a list, reduce over all of them. If axes is empty, it corresponds to the identity operation. If axes contains all dimensions of the src tensor, a single reduction value is calculated for the entire src tensor. Exactly one of the attribute `axes` and the second input tensor `axes` should be available. |s64 |A list of s64 values in the range of [-r, r-1] where r = rank(src). Empty list (default) | Optional
[keep_dims](@ref dnnl::graph::op::attr::keep_dims) | If set to `true`, it holds the axes that are used for the reduction; for each such axis, the dst dimension is equal to 1. |bool |`true`, `false` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `src` | Required
1 | `axes` | Optional
@note `axes` is a 1D tensor specifying the axes along which the reduction is
performed. It is a 1D tensor of unique elements, and the range of elements is
[-r, r-1], where r is the rank of the src tensor. Exactly one of the attribute
`axes` and the second input tensor `axes` should be available.
### Outputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `dst` | Required
@note `dst` is the result of the ReduceMean function applied to the src
tensor. `shape[i] = shapeOf(data)[i]` for all `i` that is not in the list of
axes from the second input. For dimensions from `axes`, `shape[i] == 1` if
`keep_dims == true`; otherwise, the i-th dimension is removed from dst.
## Supported data types
ReduceMean operation supports the following data type combinations.
Source/Destination |Axes
---- | -------
f32 | s32
bf16 | s32
f16 | s32

View File

@ -0,0 +1,56 @@
# ReduceMin {#dev_guide_op_reducemin}
## General
ReduceMin operation performs the reduction by finding the minimum value of a
given src tensor along the dimensions specified by axes.
Take channel axis = 0 and keep_dims = True as an example:
\f[ {dst}_{0,\cdots,\cdots} =
\min\{src_{i,\cdots,\cdots}\} ,i \in [0,channelNum-1] \f]
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[axes](@ref dnnl::graph::op::attr::axes) | Specifies the indices of the src tensor along which the reduction is performed. If axes is a list, reduce over all of them. If axes is empty, it corresponds to the identity operation. If axes contains all dimensions of the src tensor, a single reduction value is calculated for the entire src tensor. Exactly one of the attribute `axes` and the second input tensor `axes` should be available. |s64 |A list of s64 values in the range of [-r, r-1] where r = rank(src). Empty list (default) | Optional
[keep_dims](@ref dnnl::graph::op::attr::keep_dims) | If set to `true`, it holds the axes that are used for the reduction; for each such axis, the dst dimension is equal to 1. |bool |`true`, `false` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `src` | Required
1 | `axes` | Optional
@note `axes` is a 1D tensor specifying the axes along which the reduction is
performed. It is a 1D tensor of unique elements, and the range of elements is
[-r, r-1], where r is the rank of the src tensor. Exactly one of the attribute
`axes` and the second input tensor `axes` should be available.
### Outputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `dst` | Required
@note `dst` is the result of the ReduceMin function applied to the src tensor.
`shape[i] = shapeOf(data)[i]` for all `i` that is not in the list of axes from
the second input. For dimensions from `axes`, `shape[i] == 1` if
`keep_dims == true`; otherwise, the i-th dimension is removed from dst.
## Supported data types
ReduceMin operation supports the following data type combinations.
Source/Destination |Axes
---- | -------
f32 | s32
bf16 | s32
f16 | s32

View File

@ -0,0 +1,56 @@
# ReduceProd {#dev_guide_op_reduceprod}
## General
ReduceProd operation performs the reduction by multiplication of a given src
tensor along the dimensions specified by axes.
Take channel axis = 0 and keep_dims = True as an example:
\f[ {dst}_{0,\cdots,\cdots} =
\prod\limits_{i}src_{i,\cdots,\cdots} ,i \in [0,channelNum-1] \f]
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[axes](@ref dnnl::graph::op::attr::axes) | Specifies the indices of the src tensor along which the reduction is performed. If axes is a list, reduce over all of them. If axes is empty, it corresponds to the identity operation. If axes contains all dimensions of the src tensor, a single reduction value is calculated for the entire src tensor. Exactly one of the attribute `axes` and the second input tensor `axes` should be available. |s64 |A list of s64 values in the range of [-r, r-1] where r = rank(src). Empty list (default) | Optional
[keep_dims](@ref dnnl::graph::op::attr::keep_dims) | If set to `true`, it holds the axes that are used for the reduction; for each such axis, the dst dimension is equal to 1. |bool |`true`, `false` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `src` | Required
1 | `axes` | Optional
@note `axes` is a 1D tensor specifying the axes along which the reduction is
performed. It is a 1D tensor of unique elements, and the range of elements is
[-r, r-1], where r is the rank of the src tensor. Exactly one of the attribute
`axes` and the second input tensor `axes` should be available.
### Outputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `dst` | Required
@note `dst` is the result of the ReduceProd function applied to the src
tensor. `shape[i] = shapeOf(data)[i]` for all `i` that is not in the list of
axes from the second input. For dimensions from `axes`, `shape[i] == 1` if
`keep_dims == true`; otherwise, the i-th dimension is removed from dst.
## Supported data types
ReduceProd operation supports the following data type combinations.
Source/Destination |Axes
---- | -------
f32 | s32
bf16 | s32
f16 | s32

View File

@ -0,0 +1,56 @@
# ReduceSum {#dev_guide_op_reducesum}
## General
ReduceSum operation performs the reduction by addition of a given src tensor
along the dimensions specified by axes.
Take channel axis = 0 and keep_dims = True as an example:
\f[ {dst}_{0,\cdots,\cdots} =
\sum\limits_{i}src_{i,\cdots,\cdots} ,i \in [0,channelNum-1] \f]
## Operation attributes
Attribute Name | Description | Value Type |Supported Values | Required or Optional
-- | -- | --| --|--
[axes](@ref dnnl::graph::op::attr::axes) | Specifies the indices of the src tensor along which the reduction is performed. If axes is a list, reduce over all of them. If axes is empty, it corresponds to the identity operation. If axes contains all dimensions of the src tensor, a single reduction value is calculated for the entire src tensor. Exactly one of the attribute `axes` and the second input tensor `axes` should be available. |s64 |A list of s64 values in the range of [-r, r-1] where r = rank(src). Empty list (default) | Optional
[keep_dims](@ref dnnl::graph::op::attr::keep_dims) | If set to `true`, it holds the axes that are used for the reduction; for each such axis, the dst dimension is equal to 1. |bool |`true`, `false` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `src` | Required
1 | `axes` | Optional
@note `axes` is a 1D tensor specifying the axes along which the reduction is
performed. It is a 1D tensor of unique elements, and the range of elements is
[-r, r-1], where r is the rank of the src tensor. Exactly one of the attribute
`axes` and the second input tensor `axes` should be available.
### Outputs
Index | Argument Name | Required or Optional
----- | ------------- | --------------------
0 | `dst` | Required
@note `dst` is the result of the ReduceSum function applied to the src tensor.
`shape[i] = shapeOf(data)[i]` for all `i` that is not in the list of axes from
the second input. For dimensions from `axes`, `shape[i] == 1` if
`keep_dims == true`; otherwise, the i-th dimension is removed from dst.
## Supported data types
ReduceSum operation supports the following data type combinations.
Source/Destination |Axes
---- | -------
f32 | s32
bf16 | s32
f16 | s32

View File

@ -0,0 +1,49 @@
# Reorder {#dev_guide_op_reorder}
## General
Reorder operation converts \src tensor to \dst tensor with a different layout.
It supports the conversion between:
- Two different opaque layouts.
- Two different strided layouts.
- One strided layout and another opaque layout.
Reorder operation does not support layout conversion across different backends
or different engines. Unlike the [reorder primitive](@ref dev_guide_reorder),
Reorder operation cannot be used to cast the data type from \src to \dst.
Please check the usage of the [TypeCast](@ref dev_guide_op_typecast) and
[Dequantize](@ref dev_guide_op_dequantize) operations.
## Operation attributes
Reorder operation does not support any attribute.
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `dst` | Required
## Supported data types
Reorder operation supports the following data type combinations.
Src | Dst
-- | --
f32 |f32
bf16 |bf16
f16 |f16

View File

@ -0,0 +1,37 @@
# Round {#dev_guide_op_round}
## General
Round operation rounds the values of a tensor to the nearest integer,
element-wise.
## Operation attributes
Round operation does not support any attribute.
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `dst` | Required
## Supported data types
Round operation supports the following data type combinations.
Src | Dst
-- | --
f32 | f32
f16 | f16
bf16 | bf16

View File

@ -0,0 +1,39 @@
# Sigmoid {#dev_guide_op_sigmoid}
## General
Sigmoid operation applies the following formula on every element of the \src tensor
(the variable names follow the standard @ref dev_guide_conventions):
\f[ dst = \frac{1}{1 + e^{-src}} \f]
## Operation attributes
Sigmoid operation does not support any attribute.
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `dst` | Required
## Supported data types
Sigmoid operation supports the following data type combinations.
Src | Dst
-- | --
f32 | f32
bf16 | bf16
f16 | f16

View File

@ -0,0 +1,39 @@
# SigmoidBackward {#dev_guide_op_sigmoidbackward}
## General
SigmoidBackward operation computes the gradient for Sigmoid.
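Given the standard derivative of the sigmoid function,
\f$ \sigma'(x) = \sigma(x)(1 - \sigma(x)) \f$, the computed gradient can be
expressed as:

\f[ diff\_src = diff\_dst \cdot dst \cdot (1 - dst) \f]

where \f$ dst = \frac{1}{1 + e^{-src}} \f$ is the result of the forward
Sigmoid.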
## Operation attributes
Attribute Name | Description | Value Type | Supported Values | Required or Optional
-- | -- | -- | -- | --
[use_dst](@ref dnnl::graph::op::attr::use_dst) | If true, use `dst` of Sigmoid operation to calculate the gradient. Otherwise, use `src`. | bool | `true` (default), `false` | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` / `dst` | Required
1 | `diff_dst` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `diff_src` | Required
## Supported data types
SigmoidBackward operation supports the following data type combinations.
Src | Diff_dst | Diff_src
-- | -- | --
f32 | f32 | f32
f16 | f16 | f16
bf16 | bf16 | bf16

View File

@ -0,0 +1,41 @@
# SoftPlus {#dev_guide_op_softplus}
## General
SoftPlus operation applies the following formula on every element of the \src
tensor (the variable names follow the standard @ref dev_guide_conventions):
\f[ dst = \frac{1}{\beta} \ln(e^{\beta \cdot src} + 1.0) \f]
## Operation attributes
Attribute Name | Description | Value Type | Supported Values | Required or Optional
-- | -- | -- | -- | --
[beta](@ref dnnl::graph::op::attr::beta) | Value for the SoftPlus formulation. | s64 | Arbitrary s64 value (`1` by default) | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `dst` | Required
## Supported data types
SoftPlus operation supports the following data type combinations.
Src | Dst
-- | --
f32 | f32
bf16 | bf16
f16 | f16

View File

@ -0,0 +1,39 @@
# SoftPlusBackward {#dev_guide_op_softplusbackward}
## General
SoftPlusBackward operation computes the gradient for SoftPlus.
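Since the derivative of the SoftPlus formula is
\f$ \frac{d}{d\,src}\left(\frac{1}{\beta}\ln(e^{\beta \cdot src} + 1)\right) =
\frac{e^{\beta \cdot src}}{e^{\beta \cdot src} + 1} \f$, the computed gradient
can be expressed as:

\f[ diff\_src = diff\_dst \cdot \frac{e^{\beta \cdot src}}{e^{\beta \cdot src} + 1} \f]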
## Operation attributes
Attribute Name | Description | Value Type | Supported Values | Required or Optional
-- | -- | -- | -- | --
[beta](@ref dnnl::graph::op::attr::beta) | Value for the SoftPlus formulation. | s64 | Arbitrary s64 value (`1` by default) | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
1 | `diff_dst` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `diff_src` | Required
## Supported data types
SoftPlusBackward operation supports the following data type combinations.
Src | Diff_dst | Diff_src
-- | -- | --
f32 | f32 | f32
bf16 | bf16 | bf16
f16 | f16 | f16

View File

@ -0,0 +1,42 @@
# Softmax {#dev_guide_op_softmax}
## General
Softmax operation applies the following formula on every element of the \src
tensor (the variable names follow the standard @ref dev_guide_conventions):
\f[ dst_i = \frac{exp(src_i)}{\sum_{j=1}^{C} exp(src_j)} \f]
where \f$ C \f$ is the size of the tensor along the `axis` dimension.
## Operation attributes
Attribute Name | Description | Value Type | Supported Values | Required or Optional
-- | -- | -- | -- | --
[axis](@ref dnnl::graph::op::attr::axis) | Represents the axis along which Softmax is calculated. | s64 | Arbitrary s64 value (`1` by default) | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `dst` | Required
## Supported data types
Softmax operation supports the following data type combinations.
Src | Dst
-- | --
f32 | f32
bf16 | bf16
f16 | f16
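The following hedged sketch shows how the `axis` attribute described above can
be set when building a Softmax operation through the C++ API; the tensor IDs
and shapes are assumptions made for this example.

```cpp
#include <cstdint>
#include "oneapi/dnnl/dnnl_graph.hpp"

using namespace dnnl::graph;

void build_softmax(graph &g) {
    logical_tensor src(0, logical_tensor::data_type::f32, {32, 1000},
            logical_tensor::layout_type::strided);
    logical_tensor dst(1, logical_tensor::data_type::f32, {32, 1000},
            logical_tensor::layout_type::strided);

    op softmax(2, op::kind::SoftMax, {src}, {dst}, "softmax");
    // Normalize over the last dimension of the [32, 1000] tensor.
    softmax.set_attr<int64_t>(op::attr::axis, 1);

    g.add_op(softmax);
}
```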

View File

@ -0,0 +1,39 @@
# SoftmaxBackward {#dev_guide_op_softmaxbackward}
## General
SoftmaxBackward operation computes the gradient for Softmax.
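With \f$ dst \f$ denoting the result of the forward Softmax along the given
axis, the gradient follows the standard Softmax Jacobian:

\f[ diff\_src_i = dst_i \cdot \left( diff\_dst_i -
\sum_{j=1}^{C} diff\_dst_j \cdot dst_j \right) \f]

where \f$ C \f$ is the size of the tensor along the `axis` dimension.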
## Operation attributes
Attribute Name | Description | Value Type | Supported Values | Required or Optional
-- | -- | -- | -- | --
[axis](@ref dnnl::graph::op::attr::axis) | Represents the axis along which Softmax is calculated. | s64 | Arbitrary s64 value (`1` by default) | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `diff_dst` | Required
1 | `src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `diff_src` | Required
## Supported data types
SoftmaxBackward operation supports the following data type combinations.
Src | Diff_dst | Diff_src
-- | -- | --
f32 | f32 | f32
bf16 | bf16 | bf16
f16 | f16 | f16

View File

@ -0,0 +1,36 @@
# Sqrt {#dev_guide_op_sqrt}
## General
Sqrt operation performs an element-wise square root operation on a given tensor.
## Operation attributes
Sqrt operation does not support any attribute.
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `dst` | Required
## Supported data types
Sqrt operation supports the following data type combinations.
Src | Dst
-- | --
f32 | f32
f16 | f16
bf16 | bf16

View File

@ -0,0 +1,39 @@
# SqrtBackward {#dev_guide_op_sqrtbackward}
## General
SqrtBackward operation computes the gradient for Sqrt.
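Since \f$ dst = \sqrt{src} \f$, the standard derivative gives:

\f[ diff\_src = \frac{diff\_dst}{2 \cdot dst} =
\frac{diff\_dst}{2\sqrt{src}} \f]

which can be evaluated from either `src` or `dst`, matching the `use_dst`
attribute below.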
## Operation attributes
Attribute Name | Description | Value Type | Supported Values | Required or Optional
-- | -- | -- | -- | --
[use_dst](@ref dnnl::graph::op::attr::use_dst) | If true, use `dst` of Sqrt operation to calculate the gradient. Otherwise, use `src`. | bool | `true` (default), `false` | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` / `dst` | Required
1 | `diff_dst` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `diff_src` | Required
## Supported data types
SqrtBackward operation supports the following data type combinations.
Src | Diff_dst | Diff_src
-- | -- | --
f32 | f32 | f32
f16 | f16 | f16
bf16 | bf16 | bf16

View File

@ -0,0 +1,36 @@
# Square {#dev_guide_op_square}
## General
Square operation performs an element-wise square operation on a given tensor.
## Operation attributes
Square operation does not support any attribute.
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `dst` | Required
## Supported data types
Square operation supports the following data type combinations.
Src | Dst
-- | --
f32 | f32
f16 | f16
bf16 | bf16

View File

@ -0,0 +1,48 @@
# SquaredDifference {#dev_guide_op_squareddifference}
## General
SquaredDifference operation performs element-wise subtraction with two given
tensors applying multi-directional broadcast rules; after that, each result of
the subtraction is squared.
Before performing the arithmetic operation, \f$src_1\f$ and \f$src_2\f$ are
broadcasted if their shapes are different and the `auto_broadcast` attribute is
not `none`. Broadcasting is performed according to the `auto_broadcast` value.
After broadcasting, SquaredDifference does the following with the input tensors:
\f[ dst_i = (src\_1_i - src\_2_i)^2 \f]
## Operation attributes
Attribute Name | Description | Value Type | Supported Values | Required or Optional
-- | -- | -- | -- | --
[auto_broadcast](@ref dnnl::graph::op::attr::auto_broadcast) | Specifies rules used for auto-broadcasting of input tensors. | string | `none`, `numpy`(default) | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src_1` | Required
1 | `src_2` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `dst` | Required
## Supported data types
SquaredDifference operation supports the following data type combinations.
Src_1 | Src_2 | Dst
-- | -- | --
f32 | f32 | f32
bf16 | bf16 | bf16
f16 | f16 | f16

View File

@ -0,0 +1,53 @@
# StaticReshape {#dev_guide_op_staticreshape}
## General
StaticReshape operation changes the dimensions of the \src tensor according to
the specified shape. The volume of \src is equal to that of \dst, where volume
is the product of dimensions. \dst may have a different memory layout from
\src. StaticReshape operation is not guaranteed to return a view or a copy of
\src when \dst is in-placed with \src. StaticReshape can be used when the shape
is stored in a constant node or is available during the graph building stage;
then the shape can be passed via the `shape` attribute.
## Operation attributes
Attribute Name | Description | Value Type | Supported Values | Required or Optional
-- | -- | -- | -- | --
[shape](@ref dnnl::graph::op::attr::shape) | Specifies the output shape. | s64 | A s64 list describing the output shape | Required
[special_zero](@ref dnnl::graph::op::attr::special_zero) | Controls how zero values in shape are interpreted. | bool | `true`, `false` | Required
@note `shape`: dimension `-1` means that this dimension is calculated to keep
the overall element count the same as in the src tensor. More than one `-1` in
the shape is not supported.
@note `special_zero`: if false, `0` in the shape is interpreted as-is (for
example, a zero-dimension tensor); if true, all `0`s in the shape imply copying
the corresponding dimensions from src into dst.
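As a hedged illustration of the attribute semantics described in the notes
above, the following sketch builds a StaticReshape operation through the C++
API; the tensor IDs and shapes are assumptions made for this example.

```cpp
#include <cstdint>
#include <vector>
#include "oneapi/dnnl/dnnl_graph.hpp"

using namespace dnnl::graph;

void build_reshape(graph &g) {
    // src has shape [2, 3, 4]. With special_zero = true, the leading 0
    // copies the corresponding src dimension (2), and -1 is inferred so
    // that the element count is preserved: dst shape is [2, 12].
    logical_tensor src(0, logical_tensor::data_type::f32, {2, 3, 4},
            logical_tensor::layout_type::strided);
    logical_tensor dst(1, logical_tensor::data_type::f32, {2, 12},
            logical_tensor::layout_type::strided);

    op reshape(2, op::kind::StaticReshape, {src}, {dst}, "reshape");
    reshape.set_attr<std::vector<int64_t>>(op::attr::shape, {0, -1});
    reshape.set_attr<bool>(op::attr::special_zero, true);

    g.add_op(reshape);
}
```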
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
| Index | Argument Name | Required or Optional |
| ----- | ------------- | -------------------- |
| 0 | `src` | Required |
### Outputs
| Index | Argument Name | Required or Optional |
| ----- | ------------- | -------------------- |
| 0 | `dst` | Required |
## Supported data types
StaticReshape operation supports the following data type combinations.
| Src | Dst |
| ---- | ------- |
| f32 | f32 |
| bf16 | bf16 |
| f16 | f16 |

View File

@ -0,0 +1,46 @@
# StaticTranspose {#dev_guide_op_statictranspose}
## General
StaticTranspose operation rearranges the dimensions of \src. \dst may have a
different memory layout from \src. StaticTranspose operation is not guaranteed
to return a view or a copy of \src when \dst is in-placed with \src.
\f[
dst[i_{order[0]}, i_{order[1]}, \cdots, i_{order[N-1]}] = src[i_0, i_1, \cdots, i_{N-1}]
\f]
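For example, given a \src tensor of shape \f$[2, 3, 4]\f$ and
`order` \f$= [2, 0, 1]\f$, the resulting \dst shape is \f$[4, 2, 3]\f$.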
## Operation attributes
Attribute Name | Description | Value Type | Supported Values | Required or Optional
-- | -- | -- | -- | --
[order](@ref dnnl::graph::op::attr::order) | Specifies permutation to be applied on `src`. | s64 | A s64 list containing elements in the range of [-N, N-1] where N = rank(src); a negative value means counting from the last axis | Required
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
| Index | Argument Name | Required or Optional |
| ----- | ------------- | -------------------- |
| 0 | `src` | Required |
### Outputs
| Index | Argument Name | Required or Optional |
| ----- | ------------- | -------------------- |
| 0 | `dst` | Required |
## Supported data types
StaticTranspose operation supports the following data type combinations.
| Src | Dst |
| ---- | ------- |
| f32 | f32 |
| bf16 | bf16 |
| f16 | f16 |

View File

@ -0,0 +1,50 @@
# Subtract {#dev_guide_op_subtract}
## General
Subtract operation performs element-wise subtraction with two given tensors
applying multi-directional broadcast rules.
\f[
\dst(\overline{x}) =
\src_0(\overline{x}) - \src_1(\overline{x}),
\f]
## Operation attributes
Attribute Name | Description | Value Type | Supported Values | Required or Optional
-- | -- | -- | -- | --
[auto_broadcast](@ref dnnl::graph::op::attr::auto_broadcast) | Specifies rules used for auto-broadcasting of src tensors. | string | `none`, `numpy` (default) | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
| Index | Argument Name | Required or Optional |
| ----- | ------------- | -------------------- |
| 0 | `src_0` | Required |
| 1 | `src_1` | Required |
@note Both src shapes should match and no auto-broadcasting is allowed if the
`auto_broadcast` attribute is `none`. `src_0` and `src_1` shapes can be
different and auto-broadcasting is allowed if the `auto_broadcast` attribute is
`numpy`. Broadcasting is performed according to the `auto_broadcast` value.
### Outputs
| Index | Argument Name | Required or Optional |
| ----- | ------------- | -------------------- |
| 0 | `dst` | Required |
## Supported data types
Subtract operation supports the following data type combinations.
| Source0/1 | Destination |
| ---- | ------- |
| f32 | f32 |
| bf16 | bf16 |
| f16 | f16 |

View File

@ -0,0 +1,39 @@
# Tanh {#dev_guide_op_tanh}
## General
Tanh operation applies the following formula on every element of the \src tensor (the
variable names follow the standard @ref dev_guide_conventions):
\f[ dst = tanh(src) \f]
## Operation attributes
Tanh operation does not support any attribute.
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `dst` | Required
## Supported data types
Tanh operation supports the following data type combinations.
Src | Dst
-- | --
f32 | f32
bf16 | bf16
f16 | f16

View File

@ -0,0 +1,39 @@
# TanhBackward {#dev_guide_op_tanhbackward}
## General
TanhBackward operation computes the gradient for Tanh.
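Using \f$ \tanh'(x) = 1 - \tanh^2(x) \f$, the computed gradient can be
expressed as:

\f[ diff\_src = diff\_dst \cdot (1 - dst^2) \f]

where \f$ dst = \tanh(src) \f$ is the result of the forward Tanh.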
## Operation attributes
Attribute Name | Description | Value Type | Supported Values | Required or Optional
-- | -- | -- | -- | --
[use_dst](@ref dnnl::graph::op::attr::use_dst) | If true, use `dst` of Tanh operation to calculate the gradient. Otherwise, use `src`. | bool | `true` (default), `false` | Optional
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` / `dst` | Required
1 | `diff_dst` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `diff_src` | Required
## Supported data types
TanhBackward operation supports the following data type combinations.
Src | Diff_dst | Diff_src
-- | -- | --
f32 | f32 | f32
f16 | f16 | f16
bf16 | bf16 | bf16

View File

@ -0,0 +1,40 @@
# TypeCast {#dev_guide_op_typecast}
## General
TypeCast operation performs an element-wise cast from the input data type to
the data type given by the output tensor. It requires that \src and \dst have
the same shape and layout. Rounding to the nearest even is used during the cast.
## Operation attributes
TypeCast operation does not support any attribute.
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `src` | Required
### Outputs
Index | Argument Name | Required or Optional
-- | -- | --
0 | `dst` | Required
## Supported data types
TypeCast operation supports the following data type combinations.
Src | Dst
-- | --
bf16, f16 | f32
f32 | bf16, f16
@note This operation is to support
[mixed precision](@ref dev_guide_graph_mixed_precision_model) computation.
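A hedged sketch of a down conversion expressed with TypeCast through the C++
API; the destination data type is taken from the output logical tensor, and
the tensor IDs and shapes are assumptions made for this example.

```cpp
#include "oneapi/dnnl/dnnl_graph.hpp"

using namespace dnnl::graph;

void build_typecast(graph &g) {
    // The cast target data type is given by the output logical tensor:
    // here an f32 tensor is down-converted to bf16.
    logical_tensor src(0, logical_tensor::data_type::f32, {8, 16},
            logical_tensor::layout_type::strided);
    logical_tensor dst(1, logical_tensor::data_type::bf16, {8, 16},
            logical_tensor::layout_type::strided);

    op cast(2, op::kind::TypeCast, {src}, {dst}, "typecast");
    g.add_op(cast);
}
```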

View File

@ -0,0 +1,35 @@
# Wildcard {#dev_guide_op_wildcard}
## General
Wildcard operation is used to represent any compute logic and help construct a
graph. Typically, this operation is used to map framework ops which are not
supported by the library implementation. It is useful to make the graph
complete or connected.
## Operation attributes
Wildcard operation does not support any attribute.
## Execution arguments
The inputs and outputs must be provided according to below index order when
constructing an operation.
### Inputs
| Index | Argument Name | Required or Optional |
| ----- | ------------- | -------------------- |
| 0 | `src` | Optional |
### Outputs
| Index | Argument Name | Required or Optional |
| ----- | ------------- | -------------------- |
| 0 | `dst` | Optional |
@note Wildcard operation can accept an arbitrary number of inputs or outputs.
## Supported data types
Wildcard operation supports arbitrary data type combinations.

View File

@ -1,12 +1,12 @@
# oneDNN Graph API Concepts {#dev_guide_graph_basic_concepts}
# Basic Concepts {#dev_guide_graph_basic_concepts}
## Introduction
oneDNN Graph API programming model allows users to express their computational
graph and generate optimized sub-graphs which are called `partitions` in the
library. `Partition` is decided by oneDNN Graph API implementation. It is the
key concept to satisfy the different needs of AI hardware classes by using a
unified API. Users can compile `partitions`, bind `tensor` data, and execute
In the oneDNN Graph API programming model, a computation graph is passed to the
library, and optimized sub-graphs, which are called `partitions`, are returned
by the library. `Partition` is decided by the oneDNN Graph API implementation.
It is the key concept to satisfy the different needs of AI hardware classes by
using a unified API. An application can then compile `partitions`, bind `tensor` data, and execute
`compiled partitions`.
The key concepts in oneDNN Graph API include `logical tensor`, `op`, `graph`,
@ -14,15 +14,15 @@ The key concepts in oneDNN Graph API include `logical tensor`, `op`, `graph`,
between these entities. Besides, oneDNN Graph API shares the common `engine` and
`stream` concepts of oneDNN primitive API.
@img{img_graph_programming_model.jpg,Figure 1: Overview of Graph API programming model. Blue rectangles denote oneDNN objects\, and red lines denote dependencies between objects.,80%,}
@img{img_graph_programming_model.png,Figure 1: Overview of Graph API programming model. Blue rectangles denote oneDNN objects\, and red lines denote dependencies between objects.,80%,}
## Logical Tensor
`Logical tensor` (@ref dnnl::graph::logical_tensor) describes the metadata of
the input and output tensors, like data type, number of dimensions, size for
each dimension, tensor layout and property. Each logical tensor has a unique ID
which is immutable during the lifetime of a logical tensor. Users cannot modify
the metadata of a logical tensor without creating a new one.
which is immutable during the lifetime of a logical tensor. The metadata of a
logical tensor cannot be modified without creating a new one.
## Op
@ -36,13 +36,13 @@ tensor as the edge between them.
## Graph
`Graph` (@ref dnnl::graph::graph) contains a set of operations. A graph object
is associated to a specific engine kind (@ref dnnl::engine::kind). Users can add
multiple operations (@ref dnnl::graph::graph::add_op) and their input and output
logical tensors to a graph. After finishing adding operations, users need to
call a finalization API (@ref dnnl::graph::graph::finalize) to indicate that the
graph is ready for partitioning. By calling partitioning API (@ref
dnnl::graph::get_partitions), users will get a group of partitions from the
graph.
is associated to a specific engine kind (@ref dnnl::engine::kind). Multiple
operations can be added (@ref dnnl::graph::graph::add_op) along with input and
output logical tensors to a graph. After finishing adding operations,
finalization API (@ref dnnl::graph::graph::finalize) can be called to indicate
that the graph is ready for partitioning. By calling partitioning API (@ref
dnnl::graph::graph::get_partitions), a group of partitions from the graph will
be returned.
## Partition
@ -68,34 +68,35 @@ logical tensors.
A partition may contain many logical tensors, some of which are internal
intermediate results connecting two operations inside the partition. The
required inputs and outputs of a partition are also called `ports` of a
partition. Users can call API `get_input_ports` (@ref
partition. Two APIs `get_input_ports` (@ref
dnnl::graph::partition::get_input_ports) and `get_output_ports` (@ref
dnnl::graph::partition::get_output_ports) to query the ports and understand
which input logical tensors and output logical tensors are needed to compile a
partition. The input logical tensors and output logical tensors must match IDs
with ports. These in ports and out ports can also be used to track the producer
and consumer of a partitions through logical tensor IDs and for framework
integration, connect the partition back to the framework graph as a custom node.
dnnl::graph::partition::get_output_ports) are provided to query the ports and
help understand which input and output logical tensors are needed to compile a
partition. The input and output logical tensors must match the IDs of the
ports. These input and output ports can also be used to track the producer and
consumer of a partition through logical tensor IDs and, for framework
integration, to connect the partition back to the framework graph as a custom
node.
## Compiled Partition
`Compiled partition` (@ref dnnl::graph::compiled_partition) represents the
generated code specialized for a target hardware and tensor metadata passed
through compilation API. To execute a compiled partition (@ref
dnnl::graph::compiled_partition::execute), users need to pass input and output
tensors and a stream (@ref dnnl::stream). Input and output tensors must bind
data buffers to the input and output logical tensors respectively.
dnnl::graph::compiled_partition::execute), both input and output tensors, as
well as a stream (@ref dnnl::stream), must be passed. Input and output tensors
must bind data buffers to the input and output logical tensors respectively.
Users can query output logical tensors (@ref
dnnl::graph::compiled_partition::query_logical_tensor) from a compiled partition
to know the output layout and memory size (@ref
dnnl::graph::logical_tensor::get_size) when they specify output logical tensor
with `any` layout type during compilation.
An API (@ref dnnl::graph::compiled_partition::query_logical_tensor) is provided
to query output logical tensors from a compiled partition. It allows users to
know the output layout and memory size (@ref
dnnl::graph::logical_tensor::get_mem_size) when the output logical tensor is
specified with the `any` layout type during compilation.
## Tensor
`Tensor` (@ref dnnl::graph::tensor) is an abstraction for multi-dimensional
input and output data which is needed in the execution of a compiled partition.
A tensor contains a logical tensor, an engine (@ref dnnl::engine), and a data
handle. Users are responsible for managing the data handle's lifecycle, e.g.
free the memory resource when it is not used anymore.
handle. The application is responsible for managing the data handle's
lifecycle, for example, freeing the memory resource when it is no longer used.
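To make the flow above concrete, below is a minimal, hedged end-to-end sketch
of the programming model; the single-op graph, the tensor IDs, and the shapes
are assumptions made for illustration.

```cpp
#include <vector>
#include "oneapi/dnnl/dnnl_graph.hpp"

using namespace dnnl::graph;

int main() {
    // 1. Describe tensor metadata with logical tensors.
    logical_tensor src(0, logical_tensor::data_type::f32, {8, 16},
            logical_tensor::layout_type::strided);
    logical_tensor dst(1, logical_tensor::data_type::f32, {8, 16},
            logical_tensor::layout_type::strided);

    // 2. Create an op, add it to a graph, then finalize and partition.
    op relu(2, op::kind::ReLU, {src}, {dst}, "relu");
    graph g(dnnl::engine::kind::cpu);
    g.add_op(relu);
    g.finalize();
    auto partitions = g.get_partitions();

    // 3. Compile the partition for a specific engine.
    dnnl::engine eng(dnnl::engine::kind::cpu, 0);
    auto cp = partitions[0].compile({src}, {dst}, eng);

    // 4. Bind data buffers via tensors and execute on a stream.
    std::vector<float> src_buf(8 * 16, 1.f), dst_buf(8 * 16);
    tensor src_t(src, eng, src_buf.data());
    tensor dst_t(dst, eng, dst_buf.data());
    dnnl::stream strm(eng);
    cp.execute(strm, {src_t}, {dst_t});
    strm.wait();
    return 0;
}
```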

Binary file not shown.

After

Width:  |  Height:  |  Size: 5.6 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 32 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 12 KiB

View File

@ -0,0 +1,57 @@
# Low Precision {#dev_guide_graph_low_precision}
oneDNN Graph provides low precision support with int8 (signed/unsigned 8-bit
integer), bf16, and f16 data types. oneDNN Graph API expects the computation
graph to be converted to a low precision representation in which the data's
precision and quantization parameters are specified explicitly. The oneDNN
Graph API implementation will strictly respect the numeric precision of the
computation.
@anchor dev_guide_graph_int8_quantization_model
## INT8
oneDNN Graph API provides below two operations to support quantized model with
static quantization:
- [Dequantize](@ref dev_guide_op_dequantize)
- [Quantize](@ref dev_guide_op_quantize)
Dequantize operation takes an integer tensor with its associated scale and zero
point and returns an f32 tensor. Quantize operation takes an f32 tensor, scale,
and zero point, and returns an integer tensor. The scale and zero point are
single-dimension tensors, which may contain one value for the per-tensor
quantization case or multiple values for the per-channel quantization case. The
integer tensor can be represented in the unsigned int8 or signed int8 data
type. The zero point can be zero for the symmetric quantization scheme and a
non-zero value for the asymmetric quantization scheme.
Dequantize and Quantize operations should be inserted manually into the graph
as part of the quantization process before passing it to oneDNN Graph. oneDNN
Graph honors the data type passed via logical tensors and faithfully follows
the numeric semantics. For example, if the graph has a Quantize operation
followed by a Dequantize operation with the exact same scale and zero point,
the oneDNN Graph implementation should not eliminate them since that would
implicitly change the numeric precision.
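As a hedged illustration, the sketch below builds a per-tensor Quantize
operation through the C++ API, where the scales and zero points are provided
as operation attributes (see [Quantize](@ref dev_guide_op_quantize)); the
tensor IDs, the scale, and the zero point are example values.

```cpp
#include <cstdint>
#include <string>
#include <vector>
#include "oneapi/dnnl/dnnl_graph.hpp"

using namespace dnnl::graph;

void build_quantize(graph &g) {
    logical_tensor src(0, logical_tensor::data_type::f32, {8, 16},
            logical_tensor::layout_type::strided);
    logical_tensor dst(1, logical_tensor::data_type::u8, {8, 16},
            logical_tensor::layout_type::strided);

    op quant(2, op::kind::Quantize, {src}, {dst}, "quantize");
    // Per-tensor, asymmetric: a single scale and a non-zero zero point.
    quant.set_attr<std::string>(op::attr::qtype, "per_tensor");
    quant.set_attr<std::vector<float>>(op::attr::scales, {0.05f});
    quant.set_attr<std::vector<int64_t>>(op::attr::zps, {128});
    g.add_op(quant);
}
```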
The oneDNN Graph partitioning API may return a partition containing Dequantize,
Quantize, and Convolution operations in-between. It is not necessary to
recognize the subgraph pattern explicitly and convert it to a fused operation.
Depending on the oneDNN Graph implementation capability, the partition may
include more or fewer operations.
@img{int8_programming.jpg,Figure 1: Overview of int8 programming model.,80%,}
@anchor dev_guide_graph_mixed_precision_model
## BF16/F16
oneDNN Graph provides the [TypeCast](@ref dev_guide_op_typecast) operation,
which can convert an f32 tensor to bf16 or f16, and vice versa. It is used to
support the auto mixed precision mechanism in popular deep learning frameworks. All oneDNN
Graph operations support bf16 and f16 data types.
A TypeCast operation performing down conversion should be inserted explicitly
to indicate the use of low numeric precision. The oneDNN Graph implementation
fully honors the API-specified numeric precision and only performs the
computation using the API-specified or higher numeric precision.
@img{bf16_programming.jpg,Figure 2: Overview of bf16 programming model.,80%,}

View File

@ -0,0 +1,2 @@
Programming Model
#################

View File

@ -0,0 +1,6 @@
Supported Operations
####################
The complete operation set is defined at
`specification <https://spec.oneapi.io/onednn-graph/latest/ops/index.html>`__.
Only a subset of the operation set is implemented here.

View File

@ -0,0 +1,104 @@
# Supported Fusion Patterns {#dev_guide_graph_fusion_patterns}
## Fusion Patterns
The following fusion patterns are subgraphs that the oneDNN Graph API recognizes
as candidates for fusion. The patterns are described using oneDNN Graph
operation (op) names with the following convention.
@note oneDNN Graph performs limited input validation to minimize the
performance overheads. The application is responsible for sanitizing inputs
passed to the library. Since large u8 or s8 inputs may lead to accumulator
overflow, you can use floating point patterns instead of quantized patterns.
`"+"` describes a chain of two ops. The preceding op produces an output tensor,
which is consumed by the following op as its first operand.
`"[]"` describes a component of the overall pattern description. For example,
it could include a subgraph or all the op choices within the bracket.
`"|"` describes choices of multiple operations, say A+[B|C] means the graph
partition contains A followed by B or C.
`","` describes a graph composed of multiple subgraphs, each subgraph marks its
output tensor explicitly, which is consumed by other subgraphs.
`Superscript` denotes the number of repetitions of a pattern. For example,
A+[B|C]\f$^{3}\f$ means the graph partition contains A followed by three ops,
each of them either B or C. The superscript can be a range of numbers, allowing
a range of repetitions. If the range is between 0 and 1, we use the superscript
`"?"`.
`Subscript` denotes the input and output tensors which need to explicitly mark
the producer and consumer relation within one graph partition. For example,
A\f$_{>t1}\f$+B+C\f$_{<t1}\f$ refers
to the pattern starting with A followed by B and C, where C takes an implicit input
tensor from B and an extra tensor t1 output from A. `">"` refers to the output
tensor, and `"<"` for input tensor. Input and output tensor between neighbor
ops are not explicitly marked, for example, B consumes t1 implicitly in the
example above.
Subscript `"out"` marks the output tensor of a certain op to be the output of
a graph partition. For example, in
A\f$_{>t1}\f$+B\f$_{>out}\f$+C\f$_{<t1,>out}\f$, B's output and C's output
are marked as output tensors.
Subscript `"in"` marks the input tensor of a certain op to be the input of a
graph partition. For example, in A\f$_{<in1}\f$+B\f$_{<in1}\f$ A's input and
B's second input are graph partition inputs, and they share the same input tensor
in1. Most input tensors of a graph partition are not explicitly marked.
For example, the input tensors of the first op are implicitly regarded as graph
partition inputs. Besides, for input tensors of other ops, if they are not
produced by any preceding ops, they are regarded as implicit graph partition
inputs. In the example A\f$_{>t1}\f$+B+C\f$_{<t1}\f$, A's inputs are
regarded as implicit graph partition inputs, and if B is a binary operation, the
second input tensor is an implicit graph partition input.
The following categories will be used in describing fusion patterns.
Unary = [Abs | Clamp | Elu | Exp | GELU | HardSwish | LeakyReLU |
Log | Sigmoid | SoftPlus | Pow | ReLU | Round | Sqrt | Square | Tanh]
Binary = [Add | Divide | Maximum | Minimum | Multiply | Subtract]
Reduction = [ReduceL1 | ReduceL2 | ReduceMax | ReduceMean | ReduceMin |
ReduceProd | ReduceSum]
### Inference
#### Floating Point Patterns
Pattern | Description
:-- | :--:
Convolution + BiasAdd\f$^?\f$ + BatchNormInference\f$^?\f$ + [Unary \| Binary]\f$^{0-3}\f$\f$_{>out}\f$ | This pattern is widely used in Convolutional Neural Networks, for example ResNet, ResNext, SSD, etc.
ConvTranspose + BiasAdd\f$^?\f$ + [Unary \| Binary]\f$^{0-3}\f$\f$_{>out}\f$ | This pattern is widely used in Generative Adversarial Networks.
Interpolate + [Unary \| Binary]\f$^{0-3}\f$\f$_{>out}\f$ | This pattern is widely used for image processing.
MatMul + BiasAdd\f$^?\f$ + [Unary \| Binary]\f$^{0-3}\f$\f$_{>out}\f$ | This pattern is widely used in language models and recommendation models, for example BERT, DLRM, etc.
Reduction + [Unary \| Binary]\f$^{0-3}\f$\f$_{>out}\f$ | This pattern is widely used for data processing, for example loss reduction.
Unary + Binary\f$^{0-3}\f$\f$_{>out}\f$ | This pattern is widely used in Convolutional Neural Networks.
Binary + [Unary \| Binary]\f$^{0-3}\f$\f$_{>out}\f$ | This pattern is widely used in Generative Adversarial Networks, for example ParallelWaveGAN.
[AvgPool \| MaxPool] + Binary\f$^{0-3}\f$\f$_{>out}\f$ | This pattern is widely used in Convolutional Neural Networks.
BatchNormInference + ReLU\f$_{>out}\f$ | This pattern is widely used in Convolutional Neural Networks, for example DenseNet.
Reciprocal + Multiply\f$_{>out}\f$ | N/A
Reorder + Add\f$_{>out}\f$ | N/A
#### Quantized Patterns
Pattern | Description
:-- | :--:
Quantize\f$^?\f$ + Dequantize\f$_{>t1}\f$, Dequantize\f$_{>t2}\f$\f$^{0-3}\f$, Dequantize + Convolution\f$_{<t1}\f$ + BiasAdd\f$^?\f$ + [Unary \| Binary\f$_{<t2}\f$]\f$^{0-3}\f$ + Quantize\f$^?\f$\f$_{>out}\f$ | N/A
Quantize\f$^?\f$ + Dequantize\f$_{>t1}\f$, Dequantize\f$_{>t2}\f$\f$^{0-3}\f$, Dequantize + ConvTranspose\f$_{<t1}\f$ + BiasAdd\f$^?\f$ + [Unary \| Binary\f$_{<t2}\f$]\f$^{0-3}\f$ + Quantize\f$^?\f$\f$_{>out}\f$ |N/A
Quantize\f$^?\f$ + Dequantize\f$_{>t1}\f$, Dequantize\f$_{>t2}\f$\f$^{0-3}\f$, Dequantize + MatMul\f$_{<t1}\f$ + BiasAdd\f$^?\f$ + [Unary \| Binary\f$_{<t2}\f$]\f$^{0-3}\f$ + Quantize\f$^?\f$\f$_{>out}\f$ |N/A
Dequantize + [AvgPool \| MaxPool] + Quantize\f$_{>out}\f$ |N/A
Dequantize\f$_{>t1}\f$, Dequantize + [AvgPool \| MaxPool] + Add\f$_{<t1}\f$ + Quantize\f$_{>out}\f$ |N/A
Dequantize + Reorder + Quantize\f$_{>out}\f$ |N/A
Dequantize\f$_{>t1}\f$, Dequantize + Reorder + Add\f$_{<t1}\f$ + Quantize\f$_{>out}\f$ |N/A
### Training
Pattern | Description
:-- | :--:
ConvolutionBackwardWeights + BiasAddBackward\f$_{>out}\f$ | N/A
ReLUBackward + BatchNormTrainingBackward\f$_{>out}\f$ |N/A
All the above fusion patterns are supported by default.

View File

@ -120,3 +120,18 @@ The sequence of actions to create a primitive is:
memory formats if the primitive supports it.
2. Create a primitive based on the primitive descriptor obtained in step 1.
## Graph Extension
Graph extension is a high level abstraction in oneDNN that allows you to work
with a computation graph instead of individual primitives. This approach makes
operation fusion:
* Transparent: the integration efforts are reduced by abstracting backend-aware
fusion logic.
* Scalable: no integration code change is necessary to benefit from new fusion
patterns enabled in oneDNN.
The programming model for the graph extension is detailed in the
[graph basic concepts section](@ref dev_guide_graph_basic_concepts).

Binary file not shown.

Before

Width:  |  Height:  |  Size: 126 KiB

View File

@ -0,0 +1,10 @@
Graph Extension
###############
.. toctree::
:maxdepth: 1
graph_programming_model
graph_supported_operations
dev_guide_graph_fusion_patterns
dev_guide_graph_dump

View File

@ -1,5 +1,5 @@
oneDNN Documentation
========================
Intel® oneAPI Deep Neural Network Library Developer Guide and Reference
=======================================================================
.. toctree::
:maxdepth: 1
@ -7,6 +7,7 @@ oneDNN Documentation
build_and_link
programming_model
supported_primitives
graph_extension
dev_guide_examples
performance_profiling_and_inspection
advanced_topics

View File

@ -164,7 +164,9 @@ mathjax3_config = {
'diffdstiterc': '\\operatorname{diff\\_dst\\_iter\\_c}',
'diffgamma': '\\operatorname{diff\\_\\gamma}',
'diffbeta': '\\operatorname{diff\\_\\beta}',
'workspace': '\\operatorname{workspace}'
'workspace': '\\operatorname{workspace}',
'srcshape': '\\operatorname{src\\_shape}',
'dstshape': '\\operatorname{dst\\_shape}'
}
}
}
@ -200,7 +202,88 @@ def addTocTrees(app, env, docnames):
trees2Add = {'rst/dev_guide_inference_and_training_aspects.rst':['dev_guide_inference.rst','dev_guide_inference_int8.rst','dev_guide_training_bf16.rst'],
'rst/dev_guide_attributes.rst':['dev_guide_attributes_fpmath_mode.rst','dev_guide_attributes_quantization.rst','dev_guide_attributes_post_ops.rst','dev_guide_attributes_scratchpad.rst'],
'rst/dev_guide_basic_concepts.rst':['dev_guide_graph_basic_concepts.rst']}
'rst/graph_supported_operations.rst':[
'dev_guide_op_abs.rst',
'dev_guide_op_absbackward.rst',
'dev_guide_op_add.rst',
'dev_guide_op_avgpool.rst',
'dev_guide_op_avgpoolbackward.rst',
'dev_guide_op_batchnormforwardtraining.rst',
'dev_guide_op_batchnorminference.rst',
'dev_guide_op_batchnormtrainingbackward.rst',
'dev_guide_op_biasadd.rst',
'dev_guide_op_biasaddbackward.rst',
'dev_guide_op_clamp.rst',
'dev_guide_op_clampbackward.rst',
'dev_guide_op_concat.rst',
'dev_guide_op_convolution.rst',
'dev_guide_op_convolutionbackwarddata.rst',
'dev_guide_op_convolutionbackwardweights.rst',
'dev_guide_op_convtranspose.rst',
'dev_guide_op_convtransposebackwarddata.rst',
'dev_guide_op_convtransposebackwardweights.rst',
'dev_guide_op_dequantize.rst',
'dev_guide_op_divide.rst',
'dev_guide_op_dynamicdequantize.rst',
'dev_guide_op_dynamicquantize.rst',
'dev_guide_op_elu.rst',
'dev_guide_op_elubackward.rst',
'dev_guide_op_end.rst',
'dev_guide_op_exp.rst',
'dev_guide_op_gelu.rst',
'dev_guide_op_gelubackward.rst',
'dev_guide_op_hardswish.rst',
'dev_guide_op_hardswishbackward.rst',
'dev_guide_op_interpolate.rst',
'dev_guide_op_interpolatebackward.rst',
'dev_guide_op_layernorm.rst',
'dev_guide_op_layernormbackward.rst',
'dev_guide_op_leakyrelu.rst',
'dev_guide_op_log.rst',
'dev_guide_op_logsoftmax.rst',
'dev_guide_op_logsoftmaxbackward.rst',
'dev_guide_op_matmul.rst',
'dev_guide_op_maximum.rst',
'dev_guide_op_maxpool.rst',
'dev_guide_op_maxpoolbackward.rst',
'dev_guide_op_minimum.rst',
'dev_guide_op_mish.rst',
'dev_guide_op_mishbackward.rst',
'dev_guide_op_multiply.rst',
'dev_guide_op_prelu.rst',
'dev_guide_op_prelubackward.rst',
'dev_guide_op_quantize.rst',
'dev_guide_op_reciprocal.rst',
'dev_guide_op_reducel1.rst',
'dev_guide_op_reducel2.rst',
'dev_guide_op_reducemax.rst',
'dev_guide_op_reducemean.rst',
'dev_guide_op_reducemin.rst',
'dev_guide_op_reduceprod.rst',
'dev_guide_op_reducesum.rst',
'dev_guide_op_relu.rst',
'dev_guide_op_relubackward.rst',
'dev_guide_op_reorder.rst',
'dev_guide_op_round.rst',
'dev_guide_op_sigmoid.rst',
'dev_guide_op_sigmoidbackward.rst',
'dev_guide_op_softmax.rst',
'dev_guide_op_softmaxbackward.rst',
'dev_guide_op_softplus.rst',
'dev_guide_op_softplusbackward.rst',
'dev_guide_op_sqrt.rst',
'dev_guide_op_sqrtbackward.rst',
'dev_guide_op_square.rst',
'dev_guide_op_squareddifference.rst',
'dev_guide_op_staticreshape.rst',
'dev_guide_op_statictranspose.rst',
'dev_guide_op_subtract.rst',
'dev_guide_op_tanh.rst',
'dev_guide_op_tanhbackward.rst',
'dev_guide_op_typecast.rst',
'dev_guide_op_wildcard.rst'
],
'rst/graph_programming_model.rst':['dev_guide_graph_basic_concepts.rst', 'dev_guide_graph_low_precision.rst']}
for rstFile in trees2Add: