# Attributes

## Usage

```
    --attr-scratchpad=MODE
    --attr-fpmath=MATHMODE[:APPLY_TO_INT]
    --attr-acc-mode=ACCMODE
    --attr-rounding-mode=ARG:MODE[+...]
    --attr-deterministic=BOOL
    --attr-dropout=PROBABILITY[:SEED[:TAG]]
    --attr-scales=ARG:POLICY[:SCALE[:DATA_TYPE[:GROUPS]]][+...]
    --attr-zero-points=ARG:POLICY[:ZEROPOINT[:DATA_TYPE[:GROUPS]]][+...]
    --attr-post-ops=SUM[:SCALE[:ZERO_POINT[:DATA_TYPE]]]
                    ELTWISE[:ALPHA[:BETA[:SCALE]]]
                    DW:KkSsPp[:DST_DT]
                    BINARY:DT[:MASK_INPUT[:TAG]]
                    PRELU[:POLICY]
```
## --attr-scratchpad

`--attr-scratchpad` specifies the scratchpad mode to be used for benchmarking.
`MODE` values can be `library` (the default) or `user`. Refer to the scratchpad
primitive attribute documentation for details.
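As a minimal sketch of how this flag is passed (the convolution shape is borrowed from the examples at the end of this page):

```sh
# Benchmark an f32 forward convolution with the user-managed scratchpad mode.
./benchdnn --conv --cfg=f32 --attr-scratchpad=user ic16oc16ih4oh4kh1ph0
```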
## --attr-fpmath

`--attr-fpmath` specifies the fpmath mode to be used for benchmarking.
`MATHMODE` values can be any of `strict` (the default), `bf16`, `f16`, `tf32`,
or `any`. `APPLY_TO_INT` values can be either `true` or `false` (the default).
Refer to the fpmath primitive attribute documentation for details.
## --attr-acc-mode

`--attr-acc-mode` specifies the accumulation mode to be used for benchmarking.
`ACCMODE` values can be any of `strict` (the default), `relaxed`, `any`, `f32`,
`s32`, or `f16`. Refer to the accumulation mode primitive attribute
documentation for details.
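For illustration, a hypothetical invocation (the matmul descriptor is taken from the examples below):

```sh
# Allow relaxed accumulation, e.g. lower-precision accumulators, for an f16 matmul.
./benchdnn --matmul --dt=f16 --attr-acc-mode=relaxed 6x4:4x5
```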
## --attr-rounding-mode

`--attr-rounding-mode` specifies the rounding mode to be used for benchmarking.

`ARG` specifies which memory argument will be modified. Supported values are:
- `dst` corresponds to `DNNL_ARG_DST`.
- `diff_src` corresponds to `DNNL_ARG_DIFF_SRC`.
- `diff_weights` corresponds to `DNNL_ARG_DIFF_WEIGHTS`.

`MODE` specifies which rounding mode to apply to the corresponding memory
argument. Supported values are: `environment` (the default) and `stochastic`.
Refer to the rounding mode primitive attribute documentation for details.
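A sketch of the syntax, assuming a backward-by-weights convolution (the shape is hypothetical, reused from the examples below):

```sh
# Request stochastic rounding for the diff_weights argument.
./benchdnn --conv --dir=BWD_W \
           --attr-rounding-mode=diff_weights:stochastic ic16oc16ih4oh4kh1ph0
```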
## --attr-deterministic

`--attr-deterministic` specifies the deterministic mode to be used for
benchmarking. `BOOL` values can be `true`, which enables the deterministic
mode, and `false` (the default), which disables it. Refer to the deterministic
primitive attribute documentation for details.
## --attr-dropout

`--attr-dropout` defines the dropout attribute: right before the post-ops are
applied, dropout fills a part of the output buffer with zeroes at random offsets
(using a variation of the Philox algorithm as the PRNG) and divides the
remaining values by (1 - `PROBABILITY`).

`PROBABILITY` is a floating-point value between `0` and `1` that specifies how
likely any given output value is to be zeroed (i.e. "dropped out"): when `0` is
specified, the output buffer remains intact; when `1` is specified, all output
values are dropped out.

`SEED` is the 32-bit integer seed of the PRNG (a modified Philox algorithm
adapted for GPUs); `0` by default.

`TAG` specifies the memory format of the output buffer where the dropout mask
will be stored. `TAG` values use the same notation as in drivers. The default
value of `TAG` is `any`. Refer to tags for details.
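Putting the three fields together, a hypothetical invocation might look as follows (assuming the driver at hand supports the dropout attribute; the matmul descriptor is reused from the examples below):

```sh
# Drop each output value with probability 0.25, using PRNG seed 12345;
# store the dropout mask in the axb (plain, channels-last) format.
./benchdnn --matmul --attr-dropout=0.25:12345:axb 6x4:4x5
```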
## --attr-scales

`--attr-scales` defines the per memory argument primitive scales attribute.

`ARG` specifies which memory argument will be modified. Supported values are:
- `src` or `src0` corresponds to `DNNL_ARG_SRC`.
- `src1` corresponds to `DNNL_ARG_SRC_1`.
- `wei` corresponds to `DNNL_ARG_WEIGHTS`.
- `dst` corresponds to `DNNL_ARG_DST`.
- `msrci` corresponds to `DNNL_ARG_MULTIPLE_SRC + i`.

`POLICY` specifies the way scale values will be applied to an `ARG` tensor. By
default no scales are applied, which corresponds to the `common` policy with a
`1.f` scale value. Supported values are:
- `common` corresponds to `mask = 0` and means the whole tensor will be
  multiplied by a single `SCALE` value.
- `per_dim_0` corresponds to `mask = 1 << 0` and means elements of dim0 will be
  multiplied by scale factors different for each point. The number of scale
  factors is equal to dims[0].
- `per_dim_1` corresponds to `mask = 1 << 1` and means elements of dim1 will be
  multiplied by scale factors different for each point. The number of scale
  factors is equal to dims[1].
- `per_dim_01` corresponds to `mask = (1 << 0) + (1 << 1)` and means elements
  of dim0 and dim1 will be multiplied by scale factors different for each pair
  of {dim0, dim1} points. The number of scale factors is equal to
  dims[0] * dims[1].
- `per_oc` is the same as `per_dim_0` for the non-grouped case of the `WEI`
  argument, the same as `per_dim_01` for the grouped case of the `WEI`
  argument, and the same as `per_dim_1` for arguments other than `WEI`.
- `per_ocic` is the same as `per_dim_01` for the non-grouped case of the `WEI`
  argument or for the non-batched matmul case of the `WEI` argument;
  corresponds to `mask = (1 << 1) + (1 << 2)` for the grouped case of the `WEI`
  argument; corresponds to `mask = (1 << batch_ndims) + (1 << (batch_ndims + 1))`
  for the batched matmul case of the `WEI` argument.
- `per_dim_2` corresponds to `mask = 1 << 2` and means elements of dim2 will be
  multiplied by scale factors different for each point. The number of scale
  factors is equal to dims[2]. Currently supported only in the matmul primitive
  for 3D tensors.
- `per_dim_3` corresponds to `mask = 1 << 3` and means elements of dim3 will be
  multiplied by scale factors different for each point. The number of scale
  factors is equal to dims[3]. Currently supported only in the matmul primitive
  for 4D tensors.
- `per_tensor` means each element of the original tensor will be multiplied by
  a unique number. The number of scale factors is equal to `nelems`. As of now
  supported only by binary post-ops.

`SCALE` is required for the `common` policy only, and specifies a
floating-point value which is passed for execution at runtime. Specifying a
value for any other policy will trigger an error.

`DATA_TYPE` specifies the data type for scale factors. Supported values are
`f32` (the default), `f16`, and `bf16`.

`GROUPS` specifies how scales are grouped along dimensions where multiple scale
factors are used.

To specify more than one memory argument for this attribute, the `+` delimiter
is used.
## --attr-zero-points

`--attr-zero-points` defines the per memory argument zero points primitive
attribute. This attribute is supported only for integer data types as of now.

`ARG` specifies which memory argument will be modified. Supported values are:
- `src` corresponds to `DNNL_ARG_SRC`.
- `wei` corresponds to `DNNL_ARG_WEIGHTS`.
- `dst` corresponds to `DNNL_ARG_DST`.

`POLICY` has the same semantics and meaning as for `--attr-scales`. Supported
values are:
- `common`
- `per_dim_1` (for `src` and `dst`)

`ZEROPOINT` is required for the `common` policy only, and specifies an integer
value which is passed for execution at runtime. Specifying a value for any
other policy will trigger an error.

`DATA_TYPE` specifies the data type for zero points. Supported values are
`s32` (the default), `s8`, `u8`, `s4`, and `u4`.

`GROUPS` specifies how zero points are grouped along dimensions where multiple
zero points are used.

To specify more than one memory argument for this attribute, the `+` delimiter
is used.
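As a sketch of the `common` policy (a hypothetical asymmetric-int8 setup; the shape is reused from the examples below):

```sh
# Shift the u8 source tensor by a single zero point of 128 before computation.
./benchdnn --conv --cfg=u8s8u8 \
           --attr-zero-points=src:common:128 ic16oc16ih4oh4kh1ph0
```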
## --attr-post-ops

`--attr-post-ops` defines the post operations primitive attribute. The syntax
differs depending on the post operation kind. To specify more than one post
operation, the plus delimiter `+` is used.

Operations may be called in any order, e.g. apply `SUM` first and then apply
`ELTWISE`, or vice versa: apply `ELTWISE` and then `SUM` it with the
destination.
### Sum

The `SUM` post operation kind appends the operation result to the output. It
supports optional arguments: `SCALE`, parsed as a real number, which scales the
output before appending the result; `ZERO_POINT`, parsed as an integer, which
shifts the output before applying the scale; and `DATA_TYPE`, which defines the
sum data type parameter. If an invalid or `undef` value of `DATA_TYPE` is
specified, an error will be returned.
### Eltwise

The `ELTWISE` post operation kind applies one of the supported element-wise
algorithms to the operation result and then stores it. It supports optional
arguments `ALPHA` and `BETA`, parsed as real numbers. To specify `BETA`,
`ALPHA` must be specified. `SCALE` has the same notation and semantics as for
the `SUM` kind, but requires both `ALPHA` and `BETA` to be specified. `SCALE`
is applicable only when the output tensor has an integer data type.

`ELTWISE` supported values are:
- Eltwise operations that support no alpha or beta:
  `abs`, `exp`, `exp_dst`, `gelu_erf`, `gelu_tanh`, `log`, `logistic`,
  `logistic_dst`, `mish`, `round`, `sqrt`, `sqrt_dst`, `square`, `tanh`,
  `tanh_dst`
- Eltwise operations that support only alpha:
  `elu`, `elu_dst`, `relu`, `relu_dst`, `soft_relu`, `swish`
- Eltwise operations that support both alpha and beta:
  `clip`, `clip_v2`, `clip_v2_dst`, `hardsigmoid`, `hardswish`, `linear`, `pow`
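For instance, an eltwise post-op taking both alpha and beta could be sketched as follows (hypothetical invocation, shape reused from the examples below):

```sh
# linear:2:1 computes 2*x + 1 on each element of the convolution result.
./benchdnn --conv --cfg=f32 --attr-post-ops=linear:2:1 ic16oc16ih4oh4kh1ph0
```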
### Depthwise convolution

The `DW:KkSsPp` post operation kind appends a depthwise convolution with a
kernel size of `k`, a stride size of `s`, and a left padding size of `p`.
This kind is only applicable to a 1x1 2D-spatial main convolution operation.
It supports an optional argument `DST_DT`, which defines the destination
tensor data type. Refer to data types for details.
### Binary

The `BINARY` post operation kind applies one of the supported binary algorithms
to the operation result and then stores it. It requires a mandatory argument
`DT` specifying the data type of the second memory operand. It supports an
optional argument `MASK_INPUT` giving a hint about the dimensions of the second
memory operand. `MASK_INPUT` may be provided in two ways:
- As a plain integer value, which will be processed directly.
- As a `POLICY` value (see above).

In case the `MASK_INPUT` value affects more than one dimension (e.g. a
`per_tensor` `POLICY` input or an integer `15` input), an additional optional
argument `TAG` is supported, positioned after `MASK_INPUT`, to specify the
physical memory format. `TAG` values use the same notation as in drivers. The
default value of `TAG` is `any`. Refer to tags for details.

`BINARY` supported values are:
`add`, `div`, `eq`, `ge`, `gt`, `le`, `lt`, `max`, `min`, `mul`, `ne`, `sub`
### Prelu

The `PRELU` post operation kind applies the forward prelu algorithm to the
operation result and then stores it. The weights data type is always implicitly
f32. It supports an optional argument `POLICY` specifying the broadcast policy.
## Examples

Run a set of f32 forward convolutions without bias, appending accumulation into
the destination and performing relu on the output with scale set to 0.5:

```sh
./benchdnn --conv --cfg=f32 --dir=FWD_D \
           --attr-post-ops=sum+relu:0.5 --batch=shapes_tails
```
Run a 1D-spatial reorder problem with s8 input data and u8 output data in four
different physical memory layout combinations {ncw, ncw}, {ncw, nwc},
{nwc, ncw} and {nwc, nwc}, applying a source scale of 2.5 for each source
point:

```sh
./benchdnn --reorder --sdt=s8 --ddt=u8 \
           --stag=ncw,nwc --dtag=ncw,nwc \
           --attr-scales=src:common:2.5 2x8x8
```
Run a binary problem with s8 input data and u8 output data in the nc layout,
applying scales to both inputs without any post operations:

```sh
./benchdnn --binary --sdt=u8:s8 --ddt=u8 --stag=nc:nc \
           --attr-scales=src:common:1.5+src1:common:2.5 \
           100x100:100x100
```
Run a 1x1 convolution fused with a depthwise convolution, where destination
scales are set to 0.5 for the 1x1 convolution and 1.5 for the depthwise
post-op, followed by a relu post-op. The final dst data type after the fusion
in the example below is `s8`. The weights data type is inferred as `s8`, `f32`,
and `bf16` for int8, f32, and bf16 convolutions respectively.

```sh
./benchdnn --conv --cfg=u8s8u8 --attr-scales=dst:per_oc \
           --attr-post-ops=relu+dw_k3s1p1:s8+relu \
           ic16oc16ih4oh4kh1ph0
```
Run a convolution problem with two binary post operations: the first adds a
single int value to the destination memory, and the second adds a tensor of the
same size with an `nhwc`-like physical memory layout and float values to the
destination memory:

```sh
./benchdnn --conv --attr-post-ops=add:s32:common+add:f32:per_tensor:axb \
           ic16oc16ih4oh4kh1ph0
```
Run a matmul problem with weights decompression, where the `bf16` weights
scales and `s8` zero points span the `ic` and `oc` dimensions and have groups
along the `ic` dimension:

```sh
./benchdnn --matmul --dt=f32:s8:f32 --attr-fpmath=bf16:true \
           --attr-scales=wei:per_ocic:bf16:2x1 \
           --attr-zero-points=wei:per_ocic:s8:2x1 6x4:4x5
```