Remove pytorch quant docs since we are moving to torchao (#157766)

Summary:
As titled: remove the PyTorch quantization docs since we are moving to torchao.

Test Plan:
doc page generated from CI


Pull Request resolved: https://github.com/pytorch/pytorch/pull/157766
Approved by: https://github.com/Skylion007
Jerry Zhang
2025-07-10 10:07:39 -07:00
committed by PyTorch MergeBot
parent dd93883231
commit 11a86ad2fa
9 changed files with 21 additions and 1600 deletions


@@ -15,7 +15,6 @@ help:
figures:
@$(PYCMD) source/scripts/build_activation_images.py
@$(PYCMD) source/scripts/build_quantization_configs.py
@$(PYCMD) source/scripts/build_lr_scheduler_images.py
onnx:


@@ -1,96 +0,0 @@
# Quantization Accuracy Debugging
This document provides high-level strategies for improving quantization
accuracy. If a quantized model has error compared to the original model,
we can categorize the error into:
1. **data insensitive error** - caused by intrinsic model quantization error;
   a large portion of the input data has large error
2. **data sensitive error** - caused by outlier input data; a small
   portion of the input data has large error
3. **implementation error** - the quantized kernel does not match the reference implementation
## Data insensitive error
### General tips
1. For PTQ, ensure that the data you are calibrating with is representative
of your dataset. For example, for a classification problem a general
guideline is to have multiple samples in every category, and the overall
number of samples should be at least 100. There is no penalty for
calibrating with more data other than calibration time.
2. If your model has Conv-BN or Linear-BN patterns, consider fusing them.
   If you are using FX graph mode quantization, this is done automatically
   by the workflow. If you are using Eager mode quantization, you can do
   this manually with the ``torch.ao.quantization.fuse_modules`` API
   (see the sketch after this list).
3. Increase the dtype precision of the problematic ops. Usually, fp32
will have the highest accuracy, followed by fp16, followed by dynamically
quantized int8, followed by statically quantized int8.
1. Note: this is trading off performance for accuracy.
2. Note: availability of kernels per dtype per op can vary by backend.
3. Note: dtype conversions add an additional performance cost. For example,
``fp32_op -> quant -> int8_op -> dequant -> fp32_op -> quant -> int8_op -> dequant``
will have a performance penalty compared to
``fp32_op -> fp32_op -> quant -> int8_op -> int8_op -> dequant``
because of a higher number of required dtype conversions.
4. If you are using PTQ, consider using QAT to recover some of the accuracy loss
from quantization.
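As a concrete illustration of tip 2, below is a minimal Eager mode sketch of fusing a
Conv-BN-ReLU pattern with ``fuse_modules``; the toy model and the submodule names
(``conv``, ``bn``, ``relu``) are hypothetical and used only for illustration.
```python
import torch
import torch.nn as nn
from torch.ao.quantization import fuse_modules


class SmallModel(nn.Module):
    """Hypothetical Conv-BN-ReLU model, used only for this example."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))


model = SmallModel().eval()
# Fuse the Conv-BN-ReLU sequence into a single module before quantization.
fused = fuse_modules(model, [["conv", "bn", "relu"]])
```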
### Int8 quantization tips
1. If you are using per-tensor weight quantization, consider using per-channel
   weight quantization (a minimal observer sketch follows this list).
2. If you are doing inference on `fbgemm`, ensure that you set the `reduce_range`
   argument to `False` if your CPU is Cooper Lake or newer, and to `True` otherwise.
3. Audit the input activation distribution variation across different samples.
If this variation is high, the layer may be suitable for dynamic quantization
but not static quantization.
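A minimal sketch of tips 1 and 2, assuming Eager mode static quantization; the exact
observer choices here are illustrative, not prescriptive.
```python
import torch
from torch.ao.quantization import QConfig
from torch.ao.quantization.observer import MinMaxObserver, PerChannelMinMaxObserver

# Per-channel symmetric int8 weights; reduce_range applies to the activation
# observer and is only needed with fbgemm on CPUs older than Cooper Lake.
qconfig = QConfig(
    activation=MinMaxObserver.with_args(reduce_range=True),
    weight=PerChannelMinMaxObserver.with_args(
        dtype=torch.qint8, qscheme=torch.per_channel_symmetric
    ),
)
```
The resulting ``qconfig`` can then be assigned to the model (or to individual modules)
before preparing it for calibration.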
## Data sensitive error
If you are using static quantization and a small portion of your input data is
resulting in high quantization error, you can try:
1. Adjust your calibration dataset to make it more representative of your
inference dataset.
2. Manually inspect (using Numeric Suite, sketched after this list) which layers have
   high quantization error. For these layers, consider leaving them in floating point
   or adjusting the observer settings to choose a better scale and zero_point.
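A minimal sketch of step 2 using the Eager mode Numeric Suite; ``float_model`` and
``quantized_model`` are placeholders for your fp32 model and its statically quantized
counterpart.
```python
import torch
import torch.ao.ns._numeric_suite as ns


def print_per_layer_weight_sqnr(float_model, quantized_model):
    """Print a per-layer SQNR (in dB) between fp32 and quantized weights."""
    wt_compare_dict = ns.compare_weights(
        float_model.state_dict(), quantized_model.state_dict()
    )
    for key in wt_compare_dict:
        float_w = wt_compare_dict[key]["float"]
        quant_w = wt_compare_dict[key]["quantized"].dequantize()
        # Lower SQNR points at layers with higher quantization error.
        sqnr = 20 * torch.log10(float_w.norm() / (float_w - quant_w).norm())
        print(key, sqnr.item())
```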
## Implementation error
If you are using PyTorch quantization with your own backend,
you may see differences between the reference implementation of an
operation (such as ``dequant -> op_fp32 -> quant``) and the quantized implementation
(such as `op_int8`) of the op on the target hardware (a minimal comparison sketch
follows this list). This could mean one of two things:
1. the differences (usually small) come from specific behavior of
   the target kernel on the target hardware compared to fp32/cpu, for example
   accumulating in an integer dtype. Unless the kernel guarantees bitwise
   equivalency with the reference implementation, such differences are expected.
2. the kernel on the target hardware has an accuracy issue. In this case, reach
out to the kernel developer.
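To make the comparison concrete, here is a minimal sketch of the two paths using ``relu``
as the op; the scale and zero_point are arbitrary illustrative values.
```python
import torch

scale, zero_point = 0.1, 128
x = torch.randn(4, 4)
qx = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8)

# Reference path: dequant -> op_fp32 -> quant
reference = torch.quantize_per_tensor(
    torch.relu(qx.dequantize()), scale, zero_point, torch.quint8
)

# Quantized kernel path: op_int8 directly on the quantized tensor
kernel = torch.relu(qx)

# For a simple op like relu the two usually match exactly; kernels that
# accumulate in integer dtypes may differ by a small amount.
print(torch.equal(reference.int_repr(), kernel.int_repr()))
```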
## Numerical Debugging Tooling (prototype)
```{eval-rst}
.. toctree::
:hidden:
torch.ao.ns._numeric_suite
torch.ao.ns._numeric_suite_fx
```
```{warning}
Numerical debugging tooling is an early prototype and is subject to change.
```
```{eval-rst}
* :ref:`torch_ao_ns_numeric_suite`
Eager mode numeric suite
* :ref:`torch_ao_ns_numeric_suite_fx`
FX numeric suite
```


@@ -1,19 +0,0 @@
# Quantization Backend Configuration
FX Graph Mode Quantization allows the user to configure various
quantization behaviors of an op in order to match the expectation
of their backend.
In the future, this document will contain a detailed spec of
these configurations.
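For example, a minimal FX Graph Mode sketch that passes the native backend config
explicitly; the toy model and the choice of the ``"x86"`` qconfig mapping are
illustrative only.
```python
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.backend_config import get_native_backend_config
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

# Toy fp32 model and example input, used only for illustration.
model = torch.nn.Sequential(torch.nn.Linear(16, 16)).eval()
example_inputs = (torch.randn(1, 16),)

qconfig_mapping = get_default_qconfig_mapping("x86")
backend_config = get_native_backend_config()

prepared = prepare_fx(
    model, qconfig_mapping, example_inputs, backend_config=backend_config
)
# ... run calibration data through `prepared` here ...
quantized = convert_fx(prepared, backend_config=backend_config)
```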
## Default values for native configurations
Below is the output of the quantization configuration for ops
on the x86 and qnnpack backends (PyTorch's default quantized backends).
Results:
```{eval-rst}
.. literalinclude:: scripts/quantization_backend_configs/default_backend_config.txt
```

File diff suppressed because it is too large


@@ -1,64 +0,0 @@
"""
This script will generate default values of quantization configs.
These are for use in the documentation.
"""
import os.path
import torch
from torch.ao.quantization.backend_config import get_native_backend_config_dict
from torch.ao.quantization.backend_config.utils import (
entry_to_pretty_str,
remove_boolean_dispatch_from_name,
)
# Create a directory for the generated config file, if it doesn't exist
QUANTIZATION_BACKEND_CONFIG_IMAGE_PATH = os.path.join(
os.path.realpath(os.path.dirname(__file__)), "quantization_backend_configs"
)
if not os.path.exists(QUANTIZATION_BACKEND_CONFIG_IMAGE_PATH):
os.mkdir(QUANTIZATION_BACKEND_CONFIG_IMAGE_PATH)
output_path = os.path.join(
QUANTIZATION_BACKEND_CONFIG_IMAGE_PATH, "default_backend_config.txt"
)
with open(output_path, "w") as f:
native_backend_config_dict = get_native_backend_config_dict()
configs = native_backend_config_dict["configs"]
def _sort_key_func(entry):
pattern = entry["pattern"]
while isinstance(pattern, tuple):
pattern = pattern[-1]
pattern = remove_boolean_dispatch_from_name(pattern)
if not isinstance(pattern, str):
# methods are already strings
pattern = torch.typename(pattern)
# we want
#
# torch.nn.modules.pooling.AdaptiveAvgPool1d
#
# and
#
# torch._VariableFunctionsClass.adaptive_avg_pool1d
#
# to be next to each other, so convert to all lower case
# and remove the underscores, and compare the last part
# of the string
pattern_str_normalized = pattern.lower().replace("_", "")
key = pattern_str_normalized.split(".")[-1]
return key
configs.sort(key=_sort_key_func)
entries = []
for entry in configs:
entries.append(entry_to_pretty_str(entry))
entries = ",\n".join(entries)
f.write(entries)


@@ -1,16 +0,0 @@
(torch_ao_ns_numeric_suite)=
# torch.ao.ns._numeric_suite
```{warning}
This module is an early prototype and is subject to change.
```
```{eval-rst}
.. currentmodule:: torch.ao.ns._numeric_suite
```
```{eval-rst}
.. automodule:: torch.ao.ns._numeric_suite
:members:
:member-order: bysource
```


@@ -1,39 +0,0 @@
(torch_ao_ns_numeric_suite_fx)=
# torch.ao.ns._numeric_suite_fx
```{warning}
This module is an early prototype and is subject to change.
```
```{eval-rst}
.. automodule:: torch.ao.ns._numeric_suite_fx
:members:
:member-order: bysource
```
---
# torch.ao.ns.fx.utils
```{warning}
This module is an early prototype and is subject to change.
```
```{eval-rst}
.. currentmodule:: torch.ao.ns.fx.utils
```
```{eval-rst}
.. function:: compute_sqnr(x, y)
```
```{eval-rst}
.. function:: compute_normalized_l2_error(x, y)
```
```{eval-rst}
.. function:: compute_cosine_similarity(x, y)
```
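For reference, ``compute_sqnr`` is a signal-to-quantization-noise ratio expressed in
decibels; assuming the usual definition (stated here for orientation, not taken verbatim
from the module), it corresponds to:
```{eval-rst}
.. math::
   \mathrm{SQNR}(x, y) = 20 \log_{10} \frac{\lVert x \rVert_2}{\lVert x - y \rVert_2}
```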


@@ -1,146 +0,0 @@
# Owner(s): ["oncall: quantization"]
import re
import contextlib
from pathlib import Path
import torch
from torch.testing._internal.common_quantization import (
QuantizationTestCase,
SingleLayerLinearModel,
)
from torch.testing._internal.common_quantized import override_quantized_engine
from torch.testing._internal.common_utils import raise_on_run_directly, IS_ARM64, IS_FBCODE
import unittest
@unittest.skipIf(IS_FBCODE, "some path issues in fbcode")
class TestQuantizationDocs(QuantizationTestCase):
r"""
The tests in this section import code from the quantization docs and check that
they actually run without errors. In cases where objects are undefined in the code snippet,
they must be provided in the test. The imports seem to behave a bit inconsistently,
they can be imported either in the test file or passed as a global input
"""
def run(self, result=None):
with override_quantized_engine("qnnpack") if IS_ARM64 else contextlib.nullcontext():
super().run(result)
def _get_code(
self, path_from_pytorch, unique_identifier, offset=2, short_snippet=False
):
r"""
This function reads in the code from the docs given a unique identifier.
Most code snippets have a 2 space indentation; for other indentation levels,
change the `offset` arg. The `short_snippet` arg can be set to allow for testing
of smaller snippets; the check that this arg controls is used to make sure that
we are not accidentally importing only a blank line or something.
"""
def get_correct_path(path_from_pytorch):
r"""
The current working directory when CI runs the test seems to vary; this function
looks for the docs relative to this test file.
"""
core_dir = Path(__file__).parent
assert core_dir.match("test/quantization/core/"), (
"test_docs.py is in an unexpected location. If you've been "
"moving files around, ensure that the test and build files have "
"been updated to have the correct relative path between "
"test_docs.py and the docs."
)
pytorch_root = core_dir.parents[2]
return pytorch_root / path_from_pytorch
path_to_file = get_correct_path(path_from_pytorch)
if path_to_file:
with open(path_to_file) as file:
content = file.readlines()
# lines read from the file keep their trailing newline in Python
if "\n" not in unique_identifier:
unique_identifier += "\n"
assert unique_identifier in content, f"could not find {unique_identifier} in {path_to_file}"
# get index of first line of code
line_num_start = content.index(unique_identifier) + 1
# next find where the code chunk ends.
# this regex will match lines that don't start
# with a \n or " " with number of spaces=offset
r = re.compile("^[^\n," + " " * offset + "]")
# this will return the line of first line that matches regex
line_after_code = next(filter(r.match, content[line_num_start:]))
last_line_num = content.index(line_after_code)
# remove the first `offset` chars of each line and gather it all together
code = "".join(
[x[offset:] for x in content[line_num_start + 1 : last_line_num]]
)
# want to make sure we are actually getting some code,
assert last_line_num - line_num_start > 3 or short_snippet, (
f"The code in {path_to_file} identified by {unique_identifier} seems suspiciously short:"
f"\n\n###code-start####\n{code}###code-end####"
)
return code
return None
def _test_code(self, code, global_inputs=None):
r"""
This function runs `code` using any vars in `global_inputs`
"""
# if the code snippet could not be found, `code` is None and there is nothing to run
if code is not None:
expr = compile(code, "test", "exec")
exec(expr, global_inputs)
def test_quantization_doc_ptdq(self):
path_from_pytorch = "docs/source/quantization.rst"
unique_identifier = "PTDQ API Example::"
code = self._get_code(path_from_pytorch, unique_identifier)
self._test_code(code)
def test_quantization_doc_ptsq(self):
path_from_pytorch = "docs/source/quantization.rst"
unique_identifier = "PTSQ API Example::"
code = self._get_code(path_from_pytorch, unique_identifier)
self._test_code(code)
def test_quantization_doc_qat(self):
path_from_pytorch = "docs/source/quantization.rst"
unique_identifier = "QAT API Example::"
def _dummy_func(*args, **kwargs):
return None
input_fp32 = torch.randn(1, 1, 1, 1)
global_inputs = {"training_loop": _dummy_func, "input_fp32": input_fp32}
code = self._get_code(path_from_pytorch, unique_identifier)
self._test_code(code, global_inputs)
def test_quantization_doc_fx(self):
path_from_pytorch = "docs/source/quantization.rst"
unique_identifier = "FXPTQ API Example::"
input_fp32 = SingleLayerLinearModel().get_example_inputs()
global_inputs = {"UserModel": SingleLayerLinearModel, "input_fp32": input_fp32}
code = self._get_code(path_from_pytorch, unique_identifier)
self._test_code(code, global_inputs)
def test_quantization_doc_custom(self):
path_from_pytorch = "docs/source/quantization.rst"
unique_identifier = "Custom API Example::"
global_inputs = {"nnq": torch.ao.nn.quantized}
code = self._get_code(path_from_pytorch, unique_identifier)
self._test_code(code, global_inputs)
if __name__ == "__main__":
raise_on_run_directly("test/test_quantization.py")


@@ -38,13 +38,6 @@ from quantization.core.test_workflow_module import TestDistributed # noqa: F401
from quantization.core.test_workflow_module import TestFusedObsFakeQuantModule # noqa: F401
from quantization.core.test_backend_config import TestBackendConfig # noqa: F401
from quantization.core.test_utils import TestUtils # noqa: F401
log = logging.getLogger(__name__)
try:
# This test has extra data dependencies, so in some environments, e.g. Meta internal
# Buck, it has its own test runner.
from quantization.core.test_docs import TestQuantizationDocs # noqa: F401
except ImportError as e:
log.warning(e)
# Eager Mode Workflow. Tests for the functionality of APIs and different features implemented
# using eager mode.
@@ -67,6 +60,7 @@ from quantization.eager.test_equalize_eager import TestEqualizeEager # noqa: F4
from quantization.eager.test_bias_correction_eager import TestBiasCorrectionEager # noqa: F401
log = logging.getLogger(__name__)
# FX GraphModule Graph Mode Quantization. Tests for the functionality of APIs and different features implemented
# using fx quantization.
try: