oneDNN/tests/benchdnn/doc/knobs_attr.md
2025-03-12 15:08:59 -07:00

Attributes

Usage

    --attr-scratchpad=MODE
    --attr-fpmath=MATHMODE[:APPLY_TO_INT]
    --attr-acc-mode=ACCMODE
    --attr-rounding-mode=ARG:MODE[+...]
    --attr-deterministic=BOOL
    --attr-dropout=PROBABILITY[:SEED[:TAG]]
    --attr-scales=ARG:POLICY[:SCALE[:DATA_TYPE[:GROUPS]]][+...]
    --attr-zero-points=ARG:POLICY[:ZEROPOINT[:DATA_TYPE[:GROUPS]]][+...]
    --attr-post-ops=SUM[:SCALE[:ZERO_POINT[:DATA_TYPE]]]
                    ELTWISE[:ALPHA[:BETA[:SCALE]]]
                    DW:KkSsPp[:DST_DT]
                    BINARY:DT[:MASK_INPUT[:TAG]]
                    PRELU[:POLICY]

--attr-scratchpad

--attr-scratchpad specifies the scratchpad mode to be used for benchmarking. MODE values can be library (the default) or user. Refer to scratchpad primitive attribute for details.

--attr-fpmath

--attr-fpmath specifies the floating-point math mode to be used for benchmarking. MATHMODE values can be any of strict (the default), bf16, f16, tf32, or any. APPLY_TO_INT values can be either true or false (the default). Refer to fpmath primitive attribute for details.

--attr-acc-mode

--attr-acc-mode specifies the accumulation mode to be used for benchmarking. ACCMODE values can be any of strict (the default), relaxed, any, f32, s32 or f16. Refer to accumulation mode primitive attribute for details.

--attr-rounding-mode

--attr-rounding-mode specifies the rounding mode to be used for benchmarking. ARG specifies which memory argument will be modified. Supported values are:

  • dst corresponds to DNNL_ARG_DST.
  • diff_src corresponds to DNNL_ARG_DIFF_SRC.
  • diff_weights corresponds to DNNL_ARG_DIFF_WEIGHTS.

MODE specifies which rounding mode to apply to the corresponding memory argument. Supported values are environment (the default) and stochastic. Refer to rounding mode primitive attribute for details.
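For intuition, the stochastic mode can be sketched in Python. This is a conceptual illustration only, not oneDNN's Philox-based implementation; `stochastic_round` and its `step`/`rng` parameters are hypothetical:

```python
import random

def stochastic_round(value, step=1.0, rng=random.random):
    # Round `value` to a multiple of `step`, rounding up with a
    # probability equal to the fractional distance to the next step
    # (conceptual sketch, not oneDNN's implementation).
    lower = (value // step) * step
    frac = (value - lower) / step
    return lower + step if rng() < frac else lower

# With a fixed "random" draw, the rounding direction is deterministic:
print(stochastic_round(2.3, step=1.0, rng=lambda: 0.9))  # 2.0 (0.9 >= 0.3)
print(stochastic_round(2.3, step=1.0, rng=lambda: 0.1))  # 3.0 (0.1 < 0.3)
```

Averaged over many draws, the rounded value is unbiased, which is the motivation for stochastic rounding in low-precision training.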

--attr-deterministic

--attr-deterministic specifies the deterministic mode to be used for benchmarking. BOOL values can be true, which enables the deterministic mode, or false (the default), which disables it. Refer to deterministic primitive attribute for details.

--attr-dropout

--attr-dropout defines the dropout attribute. Right before the post-ops are applied, dropout fills part of the output buffer with zeroes at random offsets and divides the remaining values by (1 - p), where p is the dropout probability.

PROBABILITY is a floating-point value between 0 and 1 that specifies how likely any given output value is to be zeroed (i.e. 'dropped out'): when 0 is specified the output buffer remains intact; when 1 is specified all output values are dropped out.

SEED is the 32-bit integer seed of the PRNG (a modified Philox algorithm adapted for GPUs), 0 by default.

TAG specifies the memory format of the output buffer where the dropout mask will be stored. TAG values use the same notation as in drivers. The default value of TAG is any. Refer to tags for details.
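A minimal Python sketch of these semantics, using Python's stdlib PRNG rather than the Philox-based generator benchdnn uses (`dropout` is a hypothetical helper):

```python
import random

def dropout(values, p, seed=0):
    # Conceptual sketch of the dropout attribute: zero each value with
    # probability p and rescale the survivors by 1 / (1 - p).
    rng = random.Random(seed)
    out, mask = [], []
    for v in values:
        keep = rng.random() >= p
        mask.append(int(keep))
        out.append(v / (1.0 - p) if keep else 0.0)
    return out, mask

out, mask = dropout([1.0, 2.0, 3.0, 4.0], p=0.5, seed=42)
# Survivors are scaled by 1 / (1 - 0.5) = 2; dropped values are zero.
```

The 1 / (1 - p) rescaling keeps the expected value of each output element unchanged.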

--attr-scales

--attr-scales defines per memory argument primitive scales attribute. ARG specifies which memory argument will be modified. Supported values are:

  • src or src0 corresponds to DNNL_ARG_SRC.
  • src1 corresponds to DNNL_ARG_SRC_1.
  • wei corresponds to DNNL_ARG_WEIGHTS.
  • dst corresponds to DNNL_ARG_DST.
  • msrci corresponds to DNNL_ARG_MULTIPLE_SRC + i.

POLICY specifies the way scale values will be applied to an ARG tensor. By default no scales are applied, which corresponds to the common policy and a 1.f scale value. Supported values are:

  • common corresponds to mask = 0 and means a whole tensor will be multiplied by a single SCALE value.
  • per_dim_0 corresponds to mask = 1 << 0 and means elements of dim0 will be multiplied by scale factors different for each point. Number of scale factors is equal to dims[0].
  • per_dim_1 corresponds to mask = 1 << 1 and means elements of dim1 will be multiplied by scale factors different for each point. Number of scale factors is equal to dims[1].
  • per_dim_01 corresponds to mask = (1 << 0) + (1 << 1) and means elements of dim0 and dim1 will be multiplied by scale factors different for a pair of {dim0, dim1} points. Number of scale factors is equal to dims[0] * dims[1].
  • per_oc same as per_dim_0 for non-grouped case of WEI argument, same as per_dim_01 for grouped case of WEI argument, same as per_dim_1 for arguments other than WEI.
  • per_ocic same as per_dim_01 for the non-grouped case of the WEI argument and for the non-batched matmul case of the WEI argument; corresponds to mask = (1 << 1) + (1 << 2) for the grouped case of the WEI argument, and to mask = (1 << batch_ndims) + (1 << (batch_ndims + 1)) for the batched matmul case of the WEI argument.
  • per_dim_2 corresponds to mask = 1 << 2 and means elements of dim2 will be multiplied by scale factors different for each point. Number of scale factors is equal to dims[2]. Currently supported only in matmul primitive for 3D tensors.
  • per_dim_3 corresponds to mask = 1 << 3 and means elements of dim3 will be multiplied by scale factors different for each point. Number of scale factors is equal to dims[3]. Currently supported only in matmul primitive for 4D tensors.
  • per_tensor means each element of the original tensor will be multiplied by a unique value. The number of scale factors is equal to nelems. As of now supported only by binary post-ops.
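The mask and scale-count rules above can be sketched in Python; `policy_mask` and `num_scales` are hypothetical helpers for illustration, using the non-grouped, non-WEI conventions:

```python
def policy_mask(policy, ndims=4):
    # Map the POLICY names above to their mask values
    # (non-grouped, non-WEI conventions).
    masks = {
        "common": 0,
        "per_dim_0": 1 << 0,
        "per_dim_1": 1 << 1,
        "per_dim_01": (1 << 0) + (1 << 1),
        "per_dim_2": 1 << 2,
        "per_dim_3": 1 << 3,
        "per_tensor": (1 << ndims) - 1,  # every dimension bit set
    }
    return masks[policy]

def num_scales(dims, mask):
    # One scale factor per point of every dimension whose bit is set.
    n = 1
    for d, size in enumerate(dims):
        if mask & (1 << d):
            n *= size
    return n

dims = [2, 16, 3, 3]  # e.g. an N x C x H x W tensor
print(num_scales(dims, policy_mask("common")))      # 1
print(num_scales(dims, policy_mask("per_dim_1")))   # 16
print(num_scales(dims, policy_mask("per_dim_01")))  # 32
print(num_scales(dims, policy_mask("per_tensor")))  # 288
```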

SCALE is required for the common policy only, and specifies a floating-point value which is passed for execution at runtime. Specifying a value for any other policy will trigger an error.

DATA_TYPE specifies data type for scale factors. Supported values are f32 (the default), f16, bf16.

GROUPS specifies how scales are grouped along dimensions where multiple scale factors are used.

To specify more than one memory argument for this attribute, + delimiter is used.

--attr-zero-points

--attr-zero-points defines zero points per memory argument primitive attribute. This attribute is supported only for integer data types as of now.

ARG specifies which memory argument will be modified. Supported values are:

  • src corresponds to DNNL_ARG_SRC
  • wei corresponds to DNNL_ARG_WEIGHTS
  • dst corresponds to DNNL_ARG_DST

POLICY has the same semantics and meaning as for --attr-scales. Supported values are:

  • common
  • per_dim_1 (for src and dst)

ZEROPOINT is required for the common policy only, and specifies an integer value which is passed for execution at runtime. Specifying a value for any other policy will trigger an error.

DATA_TYPE specifies data type for zero points. Supported values are s32 (the default), s8, u8, s4, u4.

GROUPS specifies how zero points are grouped along dimensions where multiple zero-point values are used.

To specify more than one memory argument for this attribute, + delimiter is used.
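For intuition, scales and zero points combine per oneDNN's quantization convention, real_value = scale * (quantized_value - zero_point). A small Python sketch with hypothetical `quantize`/`dequantize` helpers:

```python
def dequantize(q, scale, zero_point):
    # oneDNN quantization convention: real = scale * (int - zero_point).
    return scale * (q - zero_point)

def quantize(real, scale, zero_point, lo=-128, hi=127):
    # Inverse mapping with clamping to the s8 range (sketch only).
    q = round(real / scale) + zero_point
    return max(lo, min(hi, q))

q = quantize(2.5, scale=0.5, zero_point=3)   # round(5.0) + 3 = 8
r = dequantize(q, scale=0.5, zero_point=3)   # 0.5 * (8 - 3) = 2.5
```

A non-zero zero point shifts the integer range so that asymmetric real-valued ranges can still be represented.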

--attr-post-ops

--attr-post-ops defines post operations primitive attribute. Depending on post operations kind, the syntax differs. To specify more than one post operation, plus delimiter + is used.

Operations may be specified in any order, e.g. apply SUM first and then ELTWISE, or vice versa: apply ELTWISE and then SUM the result with the destination.

Sum

SUM post operation kind appends the operation result to the output. It supports optional arguments: SCALE, parsed as a real number, which scales the output before appending the result; ZERO_POINT, parsed as an integer, which shifts the output before the scale is applied; and DATA_TYPE, which defines the sum data type parameter. If an invalid or undef DATA_TYPE value is specified, an error will be returned.
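These semantics amount to dst = result + SCALE * (dst_prev - ZERO_POINT). A minimal Python illustration (data-type conversion omitted; `sum_post_op` is a hypothetical helper):

```python
def sum_post_op(op_result, dst_prev, scale=1.0, zero_point=0):
    # Sketch of the sum post-op: the previous destination value is
    # shifted by ZERO_POINT, scaled by SCALE, and added to the
    # primitive's result.
    return op_result + scale * (dst_prev - zero_point)

print(sum_post_op(10.0, 4.0))                           # 14.0
print(sum_post_op(10.0, 4.0, scale=0.5))                # 12.0
print(sum_post_op(10.0, 4.0, scale=0.5, zero_point=2))  # 11.0
```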

Eltwise

ELTWISE post operation kind applies one of the supported element-wise algorithms to the operation result and then stores it. It supports optional arguments ALPHA and BETA, parsed as real numbers. To specify BETA, ALPHA must be specified. SCALE has the same notation and semantics as for the SUM kind but requires both ALPHA and BETA to be specified. SCALE is applicable only when the output tensor has an integer data type.

ELTWISE supported values are:

  • Eltwise operations that support no alpha or beta:
    • abs
    • exp
    • exp_dst
    • gelu_erf
    • gelu_tanh
    • log
    • logistic
    • logistic_dst
    • mish
    • round
    • sqrt
    • sqrt_dst
    • square
    • tanh
    • tanh_dst
  • Eltwise operations that support only alpha:
    • elu
    • elu_dst
    • relu
    • relu_dst
    • soft_relu
    • swish
  • Eltwise operations that support both alpha and beta:
    • clip
    • clip_v2
    • clip_v2_dst
    • hardsigmoid
    • hardswish
    • linear
    • pow

Depthwise convolution

DW:KkSsPp post operation kind appends a depthwise convolution with kernel size k, stride s, and left padding p. This kind is applicable only when the main operation is a 1x1 2D-spatial convolution. It supports the optional argument DST_DT, which defines the destination tensor data type. Refer to data types for details.

Binary

BINARY post operation kind applies one of the supported binary algorithms to the operation result and then stores it. It requires the mandatory argument DT specifying the data type of the second memory operand. It supports the optional argument MASK_INPUT, which hints at the dimensions of the second memory operand. MASK_INPUT may be provided in two ways:

  • As a plain integer value, which is processed directly.
  • As a POLICY value (see above). When the MASK_INPUT value affects more than one dimension (e.g. the per_tensor POLICY or the integer 15), an additional optional argument TAG, positioned after MASK_INPUT, is supported to specify the physical memory format. TAG values use the same notation as in drivers. The default value of TAG is any. Refer to tags for details.
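The relationship between the mask and the second operand's dimensions can be sketched in Python (`operand_dims` is a hypothetical helper): the operand keeps the destination size along dimensions whose bit is set and is broadcast (size 1) along the rest:

```python
def operand_dims(dst_dims, mask):
    # Dimensions of the second binary operand implied by `mask`:
    # full size where the bit is set, broadcast (1) elsewhere.
    return [d if mask & (1 << i) else 1 for i, d in enumerate(dst_dims)]

dst = [2, 16, 3, 3]
print(operand_dims(dst, 0))   # [1, 1, 1, 1]  common: a single value
print(operand_dims(dst, 2))   # [1, 16, 1, 1] per_dim_1 / per_oc
print(operand_dims(dst, 15))  # [2, 16, 3, 3] per_tensor: full tensor
```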

BINARY supported values are:

  • add
  • div
  • eq
  • ge
  • gt
  • le
  • lt
  • max
  • min
  • mul
  • ne
  • sub

Prelu

PRELU post operation kind applies the PReLU forward algorithm to the operation result and then stores it. The weights data type is always implicitly f32. It supports an optional argument POLICY specifying the broadcast policy.
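Per element, the PReLU forward computation can be sketched as follows, where the weight comes from the f32 weights tensor broadcast according to POLICY (`prelu` is a hypothetical helper):

```python
def prelu(x, weight):
    # PReLU forward: identity for non-negative inputs,
    # weight-scaled otherwise.
    return x if x >= 0 else weight * x

print(prelu(3.0, 0.25))   # 3.0
print(prelu(-4.0, 0.25))  # -1.0
```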

Examples:

Run a set of f32 forward convolutions without bias appending accumulation into destination and perform relu on the output with scale set to 0.5:

    ./benchdnn --conv --cfg=f32 --dir=FWD_D \
               --attr-post-ops=sum+relu:0.5 --batch=shapes_tails

Run a 1D-spatial reorder problem with s8 input data and u8 output data in four different physical memory layout combinations {ncw, ncw}, {ncw, nwc}, {nwc, ncw} and {nwc, nwc} applying source scale 2.5 for each source point:

    ./benchdnn --reorder --sdt=s8 --ddt=u8 \
               --stag=ncw,nwc --dtag=ncw,nwc \
               --attr-scales=src:common:2.5 2x8x8

Run a binary problem with s8 input data and u8 output data in nc layout applying scales to both inputs without any post operations:

    ./benchdnn --binary --sdt=u8:s8 --ddt=u8 --stag=nc:nc \
               --attr-scales=src:common:1.5+src1:common:2.5 \
               100x100:100x100

Run a 1x1 convolution fused with depthwise convolution where destination scales are set to 0.5 for 1x1 convolution and 1.5 for depthwise post-op followed by a relu post-op. The final dst datatype after the fusion in the example below is s8. The weights datatype is inferred as s8, f32 and bf16 for int8, f32 and bf16 convolutions respectively.

    ./benchdnn --conv --cfg=u8s8u8 --attr-scales=dst:per_oc \
               --attr-post-ops=relu+dw:k3s1p1:s8+relu \
               ic16oc16ih4oh4kh1ph0

Run a convolution problem with binary post operation, first one adds single int value to the destination memory, second one adds a tensor of same size with nhwc-like physical memory layout with float values to the destination memory:

    ./benchdnn --conv --attr-post-ops=add:s32:common+add:f32:per_tensor:axb \
               ic16oc16ih4oh4kh1ph0

Run a matmul problem with weights decompression, where the weights use bf16 scales and s8 zero points along both the ic and oc dimensions, with groups along the ic dimension:

    ./benchdnn --matmul --dt=f32:s8:f32 --attr-fpmath=bf16:true \
               --attr-scales=wei:per_ocic:bf16:2x1 \
               --attr-zero-points=wei:per_ocic:s8:2x1 6x4:4x5