Quantization API Reference
--------------------------

torch.quantization
~~~~~~~~~~~~~~~~~~

This module contains Eager mode quantization APIs.

.. currentmodule:: torch.quantization

Top level APIs
^^^^^^^^^^^^^^

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    quantize
    quantize_dynamic
    quantize_qat
    prepare
    prepare_qat
    convert

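For example, :func:`quantize_dynamic` is the simplest of these entry points; a
minimal sketch (the toy model below is purely illustrative)::

    import torch

    # Any model containing torch.nn.Linear layers works similarly.
    model_fp32 = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU())

    # Weights are quantized ahead of time; activations are quantized
    # dynamically during inference.
    model_int8 = torch.quantization.quantize_dynamic(
        model_fp32, {torch.nn.Linear}, dtype=torch.qint8
    )
    out = model_int8(torch.randn(2, 4))
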
Preparing model for quantization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    fuse_modules
    QuantStub
    DeQuantStub
    QuantWrapper
    add_quant_dequant

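A minimal sketch of how :class:`QuantStub` and :class:`DeQuantStub` mark the
boundaries of the region to be quantized in an eager mode model (the module
below is illustrative)::

    import torch

    class M(torch.nn.Module):
        def __init__(self):
            super().__init__()
            # QuantStub/DeQuantStub delimit the quantized region of the model.
            self.quant = torch.quantization.QuantStub()
            self.conv = torch.nn.Conv2d(1, 1, 1)
            self.dequant = torch.quantization.DeQuantStub()

        def forward(self, x):
            x = self.quant(x)       # fp32 -> quantized (after convert)
            x = self.conv(x)
            return self.dequant(x)  # quantized -> fp32 (after convert)
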
Utility functions
^^^^^^^^^^^^^^^^^

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    add_observer_
    swap_module
    propagate_qconfig_
    default_eval_fn
    get_observer_dict

torch.quantization.quantize_fx
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This module contains FX graph mode quantization APIs (prototype).

.. currentmodule:: torch.quantization.quantize_fx

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    prepare_fx
    prepare_qat_fx
    convert_fx
    fuse_fx

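As these are prototype APIs, their exact signatures have varied across
releases; the sketch below follows the ``qconfig_dict`` style and uses an
illustrative toy model::

    import torch
    from torch.quantization import get_default_qconfig
    from torch.quantization.quantize_fx import convert_fx, prepare_fx

    model_fp32 = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU()).eval()
    qconfig_dict = {"": get_default_qconfig("fbgemm")}

    prepared = prepare_fx(model_fp32, qconfig_dict)  # insert observers
    prepared(torch.randn(2, 4))                      # calibrate on sample data
    model_int8 = convert_fx(prepared)                # lower to quantized ops
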
torch (quantization related functions)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This describes the quantization related functions of the `torch` namespace.

.. currentmodule:: torch

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    quantize_per_tensor
    quantize_per_channel
    dequantize

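A short example of the round trip through these functions; the scale and zero
point below are chosen by hand for illustration (in practice they come from an
observer)::

    import torch

    x = torch.tensor([-1.0, 0.0, 1.0, 2.0])
    q = torch.quantize_per_tensor(x, scale=0.1, zero_point=10, dtype=torch.quint8)
    q.int_repr()              # the underlying uint8 representation
    x2 = torch.dequantize(q)  # approximately recovers x
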
torch.Tensor (quantization related methods)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Quantized Tensors support a limited subset of data manipulation methods of the
regular full-precision tensor.

.. currentmodule:: torch.Tensor

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    view
    as_strided
    expand
    flatten
    select
    ne
    eq
    ge
    le
    gt
    lt
    copy_
    clone
    dequantize
    equal
    int_repr
    max
    mean
    min
    q_scale
    q_zero_point
    q_per_channel_scales
    q_per_channel_zero_points
    q_per_channel_axis
    resize_
    sort
    topk

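For instance, the ``q_per_channel_*`` accessors can be used to inspect a
per-channel quantized weight (the values below are illustrative)::

    import torch

    w = torch.randn(2, 3)
    scales = torch.tensor([0.1, 0.2])
    zero_points = torch.tensor([0, 0])
    qw = torch.quantize_per_channel(w, scales, zero_points, axis=0,
                                    dtype=torch.qint8)
    qw.q_per_channel_scales()  # one scale per output channel
    qw.q_per_channel_axis()    # 0, the channel axis used above
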
torch.quantization.observer
~~~~~~~~~~~~~~~~~~~~~~~~~~~

This module contains observers which are used to collect statistics about
the values observed during calibration (PTQ) or training (QAT).

.. currentmodule:: torch.quantization.observer

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    ObserverBase
    MinMaxObserver
    MovingAverageMinMaxObserver
    PerChannelMinMaxObserver
    MovingAveragePerChannelMinMaxObserver
    HistogramObserver
    PlaceholderObserver
    RecordingObserver
    NoopObserver
    get_observer_state_dict
    load_observer_state_dict
    default_observer
    default_placeholder_observer
    default_debug_observer
    default_weight_observer
    default_histogram_observer
    default_per_channel_weight_observer
    default_dynamic_quant_observer
    default_float_qparams_observer

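A minimal sketch of the observer workflow: feed calibration data through the
observer, then read back the quantization parameters it derived::

    import torch
    from torch.quantization.observer import MinMaxObserver

    obs = MinMaxObserver(dtype=torch.quint8, qscheme=torch.per_tensor_affine)
    obs(torch.randn(16))  # record running min/max during calibration
    scale, zero_point = obs.calculate_qparams()
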
torch.quantization.fake_quantize
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This module implements the modules used to perform fake quantization
during QAT.

.. currentmodule:: torch.quantization.fake_quantize

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    FakeQuantizeBase
    FakeQuantize
    FixedQParamsFakeQuantize
    FusedMovingAvgObsFakeQuantize
    default_fake_quant
    default_weight_fake_quant
    default_per_channel_weight_fake_quant
    default_histogram_fake_quant
    default_fused_act_fake_quant
    default_fused_wt_fake_quant
    default_fused_per_channel_wt_fake_quant
    disable_fake_quant
    enable_fake_quant
    disable_observer
    enable_observer

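The ``enable_``/``disable_`` helpers are typically applied with
:meth:`~torch.nn.Module.apply`; a sketch of a common late-QAT step (the model
below is illustrative)::

    import torch
    from torch.quantization import get_default_qat_qconfig, prepare_qat

    model = torch.nn.Sequential(torch.nn.Linear(4, 4)).train()
    model.qconfig = get_default_qat_qconfig("fbgemm")
    prepared = prepare_qat(model)

    # Late in QAT, freeze observer statistics while keeping fake
    # quantization active for the remaining training steps.
    prepared.apply(torch.quantization.disable_observer)
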
torch.quantization.qconfig
~~~~~~~~~~~~~~~~~~~~~~~~~~

This module defines `QConfig` objects which are used
to configure quantization settings for individual ops.

.. currentmodule:: torch.quantization.qconfig

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    QConfig
    default_qconfig
    default_debug_qconfig
    default_per_channel_qconfig
    default_dynamic_qconfig
    float16_dynamic_qconfig
    float16_static_qconfig
    per_channel_dynamic_qconfig
    float_qparams_weight_only_qconfig
    default_qat_qconfig
    default_weight_only_qconfig
    default_activation_only_qconfig
    default_qat_qconfig_v2

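Custom settings can be expressed by constructing a :class:`QConfig` directly;
the pairing below is illustrative, not a recommended configuration::

    import torch
    from torch.quantization import QConfig
    from torch.quantization.observer import MinMaxObserver, default_weight_observer

    my_qconfig = QConfig(
        activation=MinMaxObserver.with_args(dtype=torch.quint8),
        weight=default_weight_observer,
    )
    # Assigning it to a module opts that module into these settings:
    # module.qconfig = my_qconfig
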
torch.nn.intrinsic
~~~~~~~~~~~~~~~~~~

.. automodule:: torch.nn.intrinsic
.. automodule:: torch.nn.intrinsic.modules

This module implements the combined (fused) modules, such as conv + relu,
which can then be quantized.

.. currentmodule:: torch.nn.intrinsic

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    ConvReLU1d
    ConvReLU2d
    ConvReLU3d
    LinearReLU
    ConvBn1d
    ConvBn2d
    ConvBn3d
    ConvBnReLU1d
    ConvBnReLU2d
    ConvBnReLU3d
    BNReLU2d
    BNReLU3d

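These modules are normally produced by :func:`torch.quantization.fuse_modules`
rather than instantiated directly; a small sketch::

    import torch

    m = torch.nn.Sequential(
        torch.nn.Conv2d(3, 3, 1),
        torch.nn.BatchNorm2d(3),
        torch.nn.ReLU(),
    ).eval()
    # In eval mode the batch norm is folded into the convolution,
    # leaving a torch.nn.intrinsic.ConvReLU2d in its place.
    fused = torch.quantization.fuse_modules(m, [["0", "1", "2"]])
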
torch.nn.intrinsic.qat
~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: torch.nn.intrinsic.qat
.. automodule:: torch.nn.intrinsic.qat.modules

This module implements the versions of those fused operations needed for
quantization aware training.

.. currentmodule:: torch.nn.intrinsic.qat

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    LinearReLU
    ConvBn1d
    ConvBnReLU1d
    ConvBn2d
    ConvBnReLU2d
    ConvReLU2d
    ConvBn3d
    ConvBnReLU3d
    ConvReLU3d
    update_bn_stats
    freeze_bn_stats

torch.nn.intrinsic.quantized
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: torch.nn.intrinsic.quantized
.. automodule:: torch.nn.intrinsic.quantized.modules

This module implements the quantized implementations of fused operations
like conv + relu. There are no BatchNorm variants as BatchNorm is usually
folded into convolution for inference.

.. currentmodule:: torch.nn.intrinsic.quantized

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    BNReLU2d
    BNReLU3d
    ConvReLU1d
    ConvReLU2d
    ConvReLU3d
    LinearReLU

torch.nn.intrinsic.quantized.dynamic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: torch.nn.intrinsic.quantized.dynamic
.. automodule:: torch.nn.intrinsic.quantized.dynamic.modules

This module implements the quantized dynamic implementations of fused operations
like linear + relu.

.. currentmodule:: torch.nn.intrinsic.quantized.dynamic

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    LinearReLU

torch.nn.qat
~~~~~~~~~~~~

.. automodule:: torch.nn.qat
.. automodule:: torch.nn.qat.modules

This module implements versions of the key nn modules **Conv2d()** and
**Linear()** which run in FP32 but with rounding applied to simulate the
effect of INT8 quantization.

.. currentmodule:: torch.nn.qat

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    Conv2d
    Conv3d
    Linear

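These modules are usually swapped in for their float counterparts by
:func:`torch.quantization.prepare_qat`; ``from_float`` performs a single swap
by hand (the qconfig choice below is illustrative)::

    import torch

    fp32_linear = torch.nn.Linear(4, 4)
    fp32_linear.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")

    # Runs in fp32 but fake-quantizes its weight on every forward pass.
    qat_linear = torch.nn.qat.Linear.from_float(fp32_linear)
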
torch.nn.qat.dynamic
~~~~~~~~~~~~~~~~~~~~

.. automodule:: torch.nn.qat.dynamic
.. automodule:: torch.nn.qat.dynamic.modules

This module implements versions of the key nn modules such as **Linear()**
which run in FP32 but with rounding applied to simulate the effect of INT8
quantization and will be dynamically quantized during inference.

.. currentmodule:: torch.nn.qat.dynamic

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    Linear

torch.nn.quantized
~~~~~~~~~~~~~~~~~~

.. automodule:: torch.nn.quantized
.. automodule:: torch.nn.quantized.modules

This module implements the quantized versions of the nn layers such as
:class:`~torch.nn.Conv2d` and :class:`~torch.nn.ReLU`.

.. currentmodule:: torch.nn.quantized

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    ReLU6
    Hardswish
    ELU
    LeakyReLU
    Sigmoid
    BatchNorm2d
    BatchNorm3d
    Conv1d
    Conv2d
    Conv3d
    ConvTranspose1d
    ConvTranspose2d
    ConvTranspose3d
    Embedding
    EmbeddingBag
    FloatFunctional
    FXFloatFunctional
    QFunctional
    Linear
    LayerNorm
    GroupNorm
    InstanceNorm1d
    InstanceNorm2d
    InstanceNorm3d

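Since arithmetic operators such as ``+`` are not defined on quantized tensors,
ops like addition go through :class:`FloatFunctional` (before conversion) or
:class:`QFunctional` (after conversion); a small sketch with the latter::

    import torch

    qf = torch.nn.quantized.QFunctional()  # carries its own scale/zero_point
    a = torch.quantize_per_tensor(torch.randn(2), 0.1, 0, torch.quint8)
    b = torch.quantize_per_tensor(torch.randn(2), 0.1, 0, torch.quint8)
    c = qf.add(a, b)  # quantized addition; plain `a + b` is unsupported
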
torch.nn.quantized.functional
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: torch.nn.quantized.functional

This module implements the quantized versions of the functional layers such as
:func:`~torch.nn.functional.conv2d` and :func:`~torch.nn.functional.relu`. Note:
:meth:`~torch.nn.functional.relu` supports quantized inputs.

.. currentmodule:: torch.nn.quantized.functional

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    avg_pool2d
    avg_pool3d
    adaptive_avg_pool2d
    adaptive_avg_pool3d
    conv1d
    conv2d
    conv3d
    interpolate
    linear
    max_pool1d
    max_pool2d
    celu
    leaky_relu
    hardtanh
    hardswish
    threshold
    elu
    hardsigmoid
    clamp
    upsample
    upsample_bilinear
    upsample_nearest

torch.nn.quantized.dynamic
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: torch.nn.quantized.dynamic
.. automodule:: torch.nn.quantized.dynamic.modules

Dynamically quantized :class:`~torch.nn.Linear`, :class:`~torch.nn.LSTM`,
:class:`~torch.nn.LSTMCell`, :class:`~torch.nn.GRUCell`, and
:class:`~torch.nn.RNNCell`.

.. currentmodule:: torch.nn.quantized.dynamic

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    Linear
    LSTM
    GRU
    RNNCell
    LSTMCell
    GRUCell

Quantized dtypes and quantization schemes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Note that operator implementations currently only
support per channel quantization for weights of the **conv** and **linear**
operators. Furthermore, the input data is
mapped linearly to the quantized data and vice versa
as follows:

.. math::

    \begin{aligned}
        \text{Quantization:}&\\
        &Q_\text{out} = \text{clamp}(x_\text{input}/s+z, Q_\text{min}, Q_\text{max})\\
        \text{Dequantization:}&\\
        &x_\text{out} = (Q_\text{input}-z)*s
    \end{aligned}

where :math:`\text{clamp}(.)` is the same as :func:`~torch.clamp` while the
scale :math:`s` and zero point :math:`z` are then computed
as described in :class:`~torch.ao.quantization.observer.MinMaxObserver`, specifically:

.. math::

    \begin{aligned}
        \text{if Symmetric:}&\\
        &s = 2 \max(|x_\text{min}|, x_\text{max}) /
            \left( Q_\text{max} - Q_\text{min} \right) \\
        &z = \begin{cases}
            0 & \text{if dtype is qint8} \\
            128 & \text{otherwise}
        \end{cases}\\
        \text{Otherwise:}&\\
        &s = \left( x_\text{max} - x_\text{min} \right) /
            \left( Q_\text{max} - Q_\text{min} \right) \\
        &z = Q_\text{min} - \text{round}(x_\text{min} / s)
    \end{aligned}

where :math:`[x_\text{min}, x_\text{max}]` denotes the range of the input data while
:math:`Q_\text{min}` and :math:`Q_\text{max}` are respectively the minimum and maximum values of the quantized dtype.

Note that the choice of :math:`s` and :math:`z` implies that zero is represented with no quantization error whenever zero is within
the range of the input data or symmetric quantization is being used.

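The mapping can be checked by hand; with :math:`s = 0.5` and :math:`z = 3`,
the input :math:`1.0` quantizes to :math:`1.0/0.5 + 3 = 5`::

    import torch

    q = torch.quantize_per_tensor(torch.tensor([1.0]), scale=0.5, zero_point=3,
                                  dtype=torch.quint8)
    assert q.int_repr().item() == 5                # 1.0 / 0.5 + 3
    assert q.dequantize().item() == (5 - 3) * 0.5  # exactly recovers 1.0
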
Additional data types and quantization schemes can be implemented through
the `custom operator mechanism <https://pytorch.org/tutorials/advanced/torch_script_custom_ops.html>`_.

* :attr:`torch.qscheme` — Type to describe the quantization scheme of a tensor.
  Supported types:

  * :attr:`torch.per_tensor_affine` — per tensor, asymmetric
  * :attr:`torch.per_channel_affine` — per channel, asymmetric
  * :attr:`torch.per_tensor_symmetric` — per tensor, symmetric
  * :attr:`torch.per_channel_symmetric` — per channel, symmetric

* ``torch.dtype`` — Type to describe the data. Supported types:

  * :attr:`torch.quint8` — 8-bit unsigned integer
  * :attr:`torch.qint8` — 8-bit signed integer
  * :attr:`torch.qint32` — 32-bit signed integer

.. These modules are missing docs. Adding them here only for tracking
.. automodule:: torch.nn.quantizable
.. automodule:: torch.nn.quantizable.modules