# Convolution/Deconvolution Driver

## Usage

``` sh
./benchdnn --conv [benchdnn-knobs] [conv-knobs] [conv-desc] ...
./benchdnn --deconv [benchdnn-knobs] [conv-knobs] [conv-desc] ...
```
where *conv-knobs* are:

- `--dir={FWD_B [default], FWD_D, FWD_I, BWD_D, BWD_W, BWD_WB}` -- dnnl_prop_kind_t. Refer to direction for details.
- `--dt={f32:f32:f32 [default], ...}` -- source, weights, and destination data types. The interface supports broadcasting: when a single input is provided, e.g., `--dt=f32`, the value is applied to all tensors. Refer to data types for details.
- `--cfg={f32 [default], ...}` -- deprecated setting. Refer to Configurations below.
- `--stag={any [default], ...}` -- physical src memory layout. Refer to tags for details.
- `--wtag={any [default], ...}` -- physical wei memory layout. Refer to tags for details.
- `--dtag={any [default], ...}` -- physical dst memory layout. Refer to tags for details.
- `--strides=SRC_STRIDES:WEI_STRIDES:DST_STRIDES` -- physical memory layout specification for `src`, `weights`, and `dst` tensors through stride values. Refer to option documentation for details.
- `--bia-dt={undef [default], f32, bf16, f16, ...}` -- bias data type. To run convolution without bias, use the `undef` data type (default). `--dir=FWD_B|BWD_WB` will set `--bia-dt` to `f32` to preserve compatibility with the former behavior. Refer to data types for details.
- `--alg={DIRECT [default], WINO, AUTO}` -- convolution algorithm. `WINO` is Winograd-based convolution. `AUTO` will pick one of `DIRECT` or `WINO` automatically (a library-based decision).
- `--mb=INT` -- override the minibatch size specified in the problem description. When set to `0`, use the minibatch size as defined by the individual problem descriptor. The default is `0`.
- `--match=REGEX` -- skip problems not matching the regular expression in `REGEX`. By default, no pattern is applied (run everything). Note: Windows may interpret only string arguments surrounded by double quotation marks.
- Any attributes options. Refer to attributes for details.
and *conv-desc* is a problem descriptor. The canonical form is:

``` sh
gXmbX_icXidXihXiwX_ocXodXohXowX_kdXkhXkwX_sdXshXswX_pdXphXpwX_ddXdhXdwX_nS
```

Refer to descriptor for details. The input shape and kernel size are mandatory inputs. The output shape and padding may be deduced based on the values provided.
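The deduction of the output spatial size follows the standard convolution shape arithmetic. A minimal Python sketch, not part of benchdnn: `conv_out_dim` is a hypothetical helper that assumes equal left and right padding, with dilation following the benchdnn convention where 0 means no dilation:

```python
def conv_out_dim(i, k, s, p, d):
    """Deduce one output spatial dimension of a convolution.

    i: input size, k: kernel size, s: stride, p: padding (assumed the
    same on both sides), d: dilation (0 = dense kernel).
    """
    k_eff = (k - 1) * (d + 1) + 1  # effective (dilated) kernel size
    return (i - k_eff + 2 * p) // s + 1

# alexnet:conv1 from the examples: ih=227, kh=11, sh=4, ph=0
print(conv_out_dim(227, 11, 4, 0, 0))  # 55
```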
## Precision Configurations

The `--cfg` option specifies which data types are used for a problem. It also defines the data filling strategy and implies saturation for integer types. This option also defines the threshold for computation errors.

The table below shows the configurations supported by this driver. For data type support, refer to data types and the convolution primitive documentation.
| src  | wei  | dst  | acc | cfg          |
|:----:|:----:|:----:|:---:|:------------:|
| f32  | f32  | f32  | f32 | f32          |
| f64  | f64  | f64  | f64 | f64          |
| f32  | f32  | s8   | f32 | f32f32s8     |
| u8   | s8   | f32  | s32 | u8s8f32      |
| u8   | s8   | s32  | s32 | u8s8s32      |
| u8   | s8   | s8   | s32 | u8s8s8       |
| u8   | s8   | u8   | s32 | u8s8u8       |
| s8   | s8   | f32  | s32 | s8s8f32      |
| s8   | s8   | s32  | s32 | s8s8s32      |
| s8   | s8   | s8   | s32 | s8s8s8       |
| s8   | s8   | u8   | s32 | s8s8u8       |
| f32  | f32  | f32  | f32 | f32_wino     |
| f16  | f16  | f16  | f32 | f16          |
| f16  | f16  | s8   | f32 | f16f16s8     |
| bf16 | bf16 | bf16 | f32 | bf16bf16bf16 |
| bf16 | bf16 | f32  | f32 | bf16bf16f32  |
| bf16 | f32  | bf16 | f32 | bf16f32bf16  |
| f32  | bf16 | bf16 | f32 | f32bf16bf16  |
## Essence of Testing

Since convolution problems require a significant number of accumulations for a single output point, it is easy to hit overflow or loss-of-precision issues. To deal with that, the convolution driver applies two mitigation techniques:
1. It uses integer values for activations and weights so that integers can be compared to integers without dealing with floating-point precision loss.
2. It utilizes data density to control the output range of values so that final values remain within the range representable by the float data type, and no saturation happens for a lower-precision integer output.
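The first technique relies on a basic floating-point property: products and sums of small integers are represented exactly as long as every partial sum stays within the exactly-representable integer range (up to 2^24 for f32, 2^53 for f64). A standalone sketch, not benchdnn code (Python floats are f64, but the principle is the same):

```python
import random

random.seed(0)
n_acc = 100_000  # accumulations feeding a single output point

acts = [random.randint(-4, 4) for _ in range(n_acc)]
wts = [random.randint(-2, 2) for _ in range(n_acc)]

# Exact integer reference.
exact = sum(a * w for a, w in zip(acts, wts))

# Floating-point accumulation of the same products.
fp = 0.0
for a, w in zip(acts, wts):
    fp += float(a) * float(w)

# Every partial sum is a small integer, so the floating-point result
# matches the exact integer result with no rounding error.
print(exact == fp)  # True
```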
## Examples

Run the set of f32 forward convolutions from the inputs/conv/set_conv_all file with bias and the default minibatch:

``` sh
./benchdnn --conv --dt=f32 --dir=FWD_B --batch=inputs/conv/set_conv_all
```
Run the same set, but with a ReLU post-op:

``` sh
./benchdnn --conv --dt=f32 --dir=FWD_B \
           --attr-post-ops=relu --batch=inputs/conv/set_conv_all
```
Run the same as above, but measure performance instead of checking correctness:

``` sh
./benchdnn --conv --mode=p --dt=f32 --dir=FWD_B \
           --attr-post-ops=relu --batch=inputs/conv/set_conv_all
```
Run a set of f32 backward convolutions w.r.t. weights with kh=3 and the verbose level set to 2:

``` sh
./benchdnn --conv -v2 --dt=f32 --dir=BWD_W \
           --match='.*kh3[^0-9].*' --batch=inputs/conv/set_conv_all
```
Run a set of u8s8u8 backward convolutions w.r.t. data, but skip all convolutions that would use a reference or gemm-based implementation:

``` sh
./benchdnn --conv --dt=u8:s8:u8 --dir=BWD_D \
           --skip-impl=ref,x64:gemm --batch=inputs/conv/set_conv_all
```
Run the explicitly specified first forward convolution (including bias) from AlexNet with the minibatch set to 4 and the verbose level set to 1 for two given configurations (`u8:s8:u8` and `f32`):

``` sh
./benchdnn --conv -v1 --mb=4 --dir=FWD_B --dt=f32,u8:s8:u8 \
           ic3ih227iw227_oc96oh55ow55_kh11kw11_sh4sw4ph0pw0_n"alexnet:conv1"
```
Run the batch file for different algorithms (assuming the file specifies only convolutions and does not include driver options that would override any passed on the command line). Also ignore dnnl_unimplemented errors in the case of Winograd:

``` sh
./benchdnn --conv --alg=DIRECT,WINO,AUTO --batch=inputs/conv/set_conv_all
```
Run a set of u8s8u8 forward convolutions without bias, skipping reference implementations, with one common output scale set to 0.5:

``` sh
./benchdnn --conv --dt=u8:s8:u8 --dir=FWD_D --skip-impl=ref \
           --attr-scales=dst:common:0.5 --batch=inputs/conv/set_conv_all
```
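Here `dst:common:0.5` applies one common scale of 0.5 to the accumulated result before it is converted to the destination data type. As a rough illustration of that post-accumulation step (a hypothetical Python helper, not benchdnn code; it assumes round-to-nearest-even followed by saturation to the u8 range):

```python
def apply_dst_scale_u8(acc, scale=0.5):
    """Scale an s32 accumulator and convert it to u8.

    Python's built-in round() is round-half-to-even, which matches
    the usual default for float-to-integer conversion.
    """
    v = round(acc * scale)
    return max(0, min(255, v))  # saturate to the u8 range

print(apply_dst_scale_u8(100))   # 50
print(apply_dst_scale_u8(-7))    # 0   (negative result saturates)
print(apply_dst_scale_u8(1000))  # 255 (overflow saturates)
```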
More examples with different driver options can be found at `inputs/conv/test_*` or `inputs/conv/harness_*`. Examples with different problem descriptors can be found at `inputs/conv/shapes_*`.