# Convolution/Deconvolution Driver

## Usage

``` sh
./benchdnn --conv [benchdnn-knobs] [conv-knobs] [conv-desc] ...
./benchdnn --deconv [benchdnn-knobs] [conv-knobs] [conv-desc] ...
```
where *conv-knobs* are:

- `--dir={FWD_B [default], FWD_D, FWD_I, BWD_D, BWD_W, BWD_WB}` -- dnnl_prop_kind_t. Refer to direction for details.
- `--dt={f32:f32:f32 [default], ...}` -- source, weights, and destination data types. The interface supports broadcasting: when a single input is provided, e.g., `--dt=f32`, the value is applied to all tensors. Refer to data types for details.
- `--cfg={f32 [default], ...}` -- deprecated setting. Refer to Configurations below.
- `--stag={any [default], ...}` -- physical src memory layout. Refer to tags for details.
- `--wtag={any [default], ...}` -- physical wei memory layout. Refer to tags for details.
- `--dtag={any [default], ...}` -- physical dst memory layout. Refer to tags for details.
- `--strides=SRC_STRIDES:WEI_STRIDES:DST_STRIDES` -- physical memory layout specification for `src`, `weights`, and `dst` tensors through stride values. Refer to option documentation for details.
- `--bia-dt={undef [default], f32, bf16, f16, ...}` -- bias data type. To run convolution without bias, use the `undef` data type (default). `--dir=FWD_B|BWD_WB` will set `--bia-dt` to `f32` to preserve compatibility with the former behavior. Refer to data types for details.
- `--alg={DIRECT [default], WINO, AUTO}` -- convolution algorithm. `WINO` is Winograd-based convolution. `AUTO` will pick one of `DIRECT` or `WINO` automatically (a library-based decision).
- `--mb=INT` -- override the minibatch size specified in the problem description. When set to `0`, use the minibatch size as defined by the individual problem descriptor. The default is `0`.
- `--match=REGEX` -- skip problems not matching the regular expression in `REGEX`. By default, no pattern is applied (run everything). Note: Windows may interpret only string arguments surrounded by double quotation marks.
- Any attributes options. Refer to attributes for details.
and *conv-desc* is a problem descriptor. The canonical form is:

``` sh
gXmbX_icXidXihXiwX_ocXodXohXowX_kdXkhXkwX_sdXshXswX_pdXphXpwX_ddXdhXdwX_nS
```

Refer to descriptor for details. The input shape and kernel size are mandatory inputs. The output shape and padding may be deduced based on the values provided.
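The deduction of the output spatial size follows the standard convolution shape arithmetic. A minimal Python sketch, not part of benchdnn: `conv_out_dim` is a hypothetical helper that assumes equal left and right padding, with dilation following the benchdnn convention where 0 means no dilation:

```python
def conv_out_dim(i, k, s, p, d):
    """Deduce one output spatial dimension of a convolution.

    i: input size, k: kernel size, s: stride, p: padding (assumed the
    same on both sides), d: dilation (0 = dense kernel).
    """
    k_eff = (k - 1) * (d + 1) + 1  # effective (dilated) kernel size
    return (i - k_eff + 2 * p) // s + 1

# alexnet:conv1 from the examples: ih=227, kh=11, sh=4, ph=0
print(conv_out_dim(227, 11, 4, 0, 0))  # 55
```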
## Precision Configurations

The `--cfg` option specifies which data types are used for a problem. It also defines the data filling strategy and implies saturation for integer types. This option also defines the threshold for computation errors.

The table below shows the configurations supported by this driver. For data type support, refer to data types and the convolution primitive documentation.
| src  | wei  | dst  | acc | cfg          |
|:----:|:----:|:----:|:---:|:------------:|
| f32  | f32  | f32  | f32 | f32          |
| f64  | f64  | f64  | f64 | f64          |
| f32  | f32  | s8   | f32 | f32f32s8     |
| u8   | s8   | f32  | s32 | u8s8f32      |
| u8   | s8   | s32  | s32 | u8s8s32      |
| u8   | s8   | s8   | s32 | u8s8s8       |
| u8   | s8   | u8   | s32 | u8s8u8       |
| s8   | s8   | f32  | s32 | s8s8f32      |
| s8   | s8   | s32  | s32 | s8s8s32      |
| s8   | s8   | s8   | s32 | s8s8s8       |
| s8   | s8   | u8   | s32 | s8s8u8       |
| f32  | f32  | f32  | f32 | f32_wino     |
| f16  | f16  | f16  | f32 | f16          |
| f16  | f16  | s8   | f32 | f16f16s8     |
| bf16 | bf16 | bf16 | f32 | bf16bf16bf16 |
| bf16 | bf16 | f32  | f32 | bf16bf16f32  |
| bf16 | f32  | bf16 | f32 | bf16f32bf16  |
| f32  | bf16 | bf16 | f32 | f32bf16bf16  |
## Essence of Testing

Since convolution problems require a significant number of accumulations for a single output point, it is easy to hit overflow or loss-of-precision issues. To deal with that, the convolution driver applies two mitigation techniques:
1. It uses integer values for activations and weights so that integers can be compared to integers without dealing with floating-point precision loss.
2. It utilizes data density to control the output range of values so that final values remain within the range representable by the float data type, and no saturation happens for a lower-precision integer output.
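The first technique relies on a basic floating-point property: products and sums of small integers are represented exactly as long as every partial sum stays within the exactly-representable integer range (up to 2^24 for f32, 2^53 for f64). A standalone sketch, not benchdnn code (Python floats are f64, but the principle is the same):

```python
import random

random.seed(0)
n_acc = 100_000  # accumulations feeding a single output point

acts = [random.randint(-4, 4) for _ in range(n_acc)]
wts = [random.randint(-2, 2) for _ in range(n_acc)]

# Exact integer reference.
exact = sum(a * w for a, w in zip(acts, wts))

# Floating-point accumulation of the same products.
fp = 0.0
for a, w in zip(acts, wts):
    fp += float(a) * float(w)

# Every partial sum is a small integer, so the floating-point result
# matches the exact integer result with no rounding error.
print(exact == fp)  # True
```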
## Examples

Run the set of f32 forward convolutions from the inputs/conv/set_conv_all file with bias and the default minibatch:

``` sh
./benchdnn --conv --dt=f32 --dir=FWD_B --batch=inputs/conv/set_conv_all
```
Run the same set, but with a ReLU post-op:

``` sh
./benchdnn --conv --dt=f32 --dir=FWD_B \
           --attr-post-ops=relu --batch=inputs/conv/set_conv_all
```
Run the same as above, but measure performance instead of checking correctness:

``` sh
./benchdnn --conv --mode=p --dt=f32 --dir=FWD_B \
           --attr-post-ops=relu --batch=inputs/conv/set_conv_all
```
Run a set of f32 backward convolutions w.r.t. weights with kh=3 and the verbose level set to 2:

``` sh
./benchdnn --conv -v2 --dt=f32 --dir=BWD_W \
           --match='.*kh3[^0-9].*' --batch=inputs/conv/set_conv_all
```
Run a set of u8s8u8 backward convolutions w.r.t. data, but skip all convolutions that would use a reference or gemm-based implementation:

``` sh
./benchdnn --conv --dt=u8:s8:u8 --dir=BWD_D \
           --skip-impl=ref,x64:gemm --batch=inputs/conv/set_conv_all
```
Run the explicitly specified first forward convolution (including bias) from AlexNet with the minibatch set to 4 and the verbose level set to 1 for two given configurations (`u8:s8:u8` and `f32`):

``` sh
./benchdnn --conv -v1 --mb=4 --dir=FWD_B --dt=f32,u8:s8:u8 \
           ic3ih227iw227_oc96oh55ow55_kh11kw11_sh4sw4ph0pw0_n"alexnet:conv1"
```
Run the batch file for different algorithms (assuming the file specifies only convolutions and does not include driver options that would override any passed on the command line). Also ignore dnnl_unimplemented errors in the case of Winograd:

``` sh
./benchdnn --conv --alg=DIRECT,WINO,AUTO --batch=inputs/conv/set_conv_all
```
Run a set of u8s8u8 forward convolutions without bias, skipping reference implementations, with one common output scale set to 0.5:

``` sh
./benchdnn --conv --dt=u8:s8:u8 --dir=FWD_D --skip-impl=ref \
           --attr-scales=dst:common:0.5 --batch=inputs/conv/set_conv_all
```
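Here `dst:common:0.5` applies one common scale of 0.5 to the accumulated result before it is converted to the destination data type. As a rough illustration of that post-accumulation step (a hypothetical Python helper, not benchdnn code; it assumes round-to-nearest-even followed by saturation to the u8 range):

```python
def apply_dst_scale_u8(acc, scale=0.5):
    """Scale an s32 accumulator and convert it to u8.

    Python's built-in round() is round-half-to-even, which matches
    the usual default for float-to-integer conversion.
    """
    v = round(acc * scale)
    return max(0, min(255, v))  # saturate to the u8 range

print(apply_dst_scale_u8(100))   # 50
print(apply_dst_scale_u8(-7))    # 0   (negative result saturates)
print(apply_dst_scale_u8(1000))  # 255 (overflow saturates)
```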
More examples with different driver options can be found at `inputs/conv/test_*` or `inputs/conv/harness_*`. Examples with different problem descriptors can be found at `inputs/conv/shapes_*`.