Remove pytorch quant docs since we are moving to torchao (#157766)

Summary:
As titled: remove the PyTorch quantization docs since we are moving to torchao.

Test Plan:
doc page generated from CI


Pull Request resolved: https://github.com/pytorch/pytorch/pull/157766
Approved by: https://github.com/Skylion007
Jerry Zhang
2025-07-10 10:07:39 -07:00
committed by PyTorch MergeBot
parent dd93883231
commit 11a86ad2fa
9 changed files with 21 additions and 1600 deletions


@@ -15,7 +15,6 @@ help:
figures:
@$(PYCMD) source/scripts/build_activation_images.py
@$(PYCMD) source/scripts/build_quantization_configs.py
@$(PYCMD) source/scripts/build_lr_scheduler_images.py
onnx:


@@ -1,96 +0,0 @@
# Quantization Accuracy Debugging
This document provides high-level strategies for improving quantization
accuracy. If a quantized model has error compared to the original model,
we can categorize the error into:
1. **data insensitive error** - caused by intrinsic model quantization error;
   a large portion of the input data has large error
2. **data sensitive error** - caused by outlier input data; a small
   portion of the input data has large error
3. **implementation error** - the quantized kernel does not match the reference implementation
## Data insensitive error
### General tips
1. For PTQ, ensure that the data you are calibrating with is representative
of your dataset. For example, for a classification problem a general
guideline is to have multiple samples in every category, and the overall
number of samples should be at least 100. There is no penalty for
calibrating with more data other than calibration time.
2. If your model has Conv-BN or Linear-BN patterns, consider fusing them.
   If you are using FX graph mode quantization, this is done automatically
   by the workflow. If you are using Eager mode quantization, you can do
   this manually with the ``torch.ao.quantization.fuse_modules`` API
   (see the sketch after this list).
3. Increase the dtype precision of the problematic ops. Usually, fp32
will have the highest accuracy, followed by fp16, followed by dynamically
quantized int8, followed by statically quantized int8.
1. Note: this is trading off performance for accuracy.
2. Note: availability of kernels per dtype per op can vary by backend.
3. Note: dtype conversions add an additional performance cost. For example,
``fp32_op -> quant -> int8_op -> dequant -> fp32_op -> quant -> int8_op -> dequant``
will have a performance penalty compared to
``fp32_op -> fp32_op -> quant -> int8_op -> int8_op -> dequant``
because of a higher number of required dtype conversions.
4. If you are using PTQ, consider using QAT to recover some of the accuracy loss
from quantization.
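As a concrete illustration of tip 2, below is a minimal Eager mode sketch of fusing a
Conv-BN-ReLU pattern with ``fuse_modules``; the toy model and the submodule names
(``conv``, ``bn``, ``relu``) are hypothetical and used only for illustration.
```python
import torch
import torch.nn as nn
from torch.ao.quantization import fuse_modules


class SmallModel(nn.Module):
    """Hypothetical Conv-BN-ReLU model, used only for this example."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))


model = SmallModel().eval()
# Fuse the Conv-BN-ReLU sequence into a single module before quantization.
fused = fuse_modules(model, [["conv", "bn", "relu"]])
```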
### Int8 quantization tips
1. If you are using per-tensor weight quantization, consider using per-channel
   weight quantization (a minimal observer sketch follows this list).
2. If you are doing inference on `fbgemm`, ensure that you set the `reduce_range`
   argument to `False` if your CPU is Cooper Lake or newer, and to `True` otherwise.
3. Audit the input activation distribution variation across different samples.
If this variation is high, the layer may be suitable for dynamic quantization
but not static quantization.
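A minimal sketch of tips 1 and 2, assuming Eager mode static quantization; the exact
observer choices here are illustrative, not prescriptive.
```python
import torch
from torch.ao.quantization import QConfig
from torch.ao.quantization.observer import MinMaxObserver, PerChannelMinMaxObserver

# Per-channel symmetric int8 weights; reduce_range applies to the activation
# observer and is only needed with fbgemm on CPUs older than Cooper Lake.
qconfig = QConfig(
    activation=MinMaxObserver.with_args(reduce_range=True),
    weight=PerChannelMinMaxObserver.with_args(
        dtype=torch.qint8, qscheme=torch.per_channel_symmetric
    ),
)
```
The resulting ``qconfig`` can then be assigned to the model (or to individual modules)
before preparing it for calibration.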
## Data sensitive error
If you are using static quantization and a small portion of your input data is
resulting in high quantization error, you can try:
1. Adjust your calibration dataset to make it more representative of your
inference dataset.
2. Manually inspect (using Numeric Suite, sketched after this list) which layers have
   high quantization error. For these layers, consider leaving them in floating point
   or adjusting the observer settings to choose a better scale and zero_point.
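A minimal sketch of step 2 using the Eager mode Numeric Suite; ``float_model`` and
``quantized_model`` are placeholders for your fp32 model and its statically quantized
counterpart.
```python
import torch
import torch.ao.ns._numeric_suite as ns


def print_per_layer_weight_sqnr(float_model, quantized_model):
    """Print a per-layer SQNR (in dB) between fp32 and quantized weights."""
    wt_compare_dict = ns.compare_weights(
        float_model.state_dict(), quantized_model.state_dict()
    )
    for key in wt_compare_dict:
        float_w = wt_compare_dict[key]["float"]
        quant_w = wt_compare_dict[key]["quantized"].dequantize()
        # Lower SQNR points at layers with higher quantization error.
        sqnr = 20 * torch.log10(float_w.norm() / (float_w - quant_w).norm())
        print(key, sqnr.item())
```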
## Implementation error
If you are using PyTorch quantization with your own backend,
you may see differences between the reference implementation of an
operation (such as ``dequant -> op_fp32 -> quant``) and the quantized implementation
(such as `op_int8`) of the op on the target hardware (a minimal comparison sketch
follows this list). This could mean one of two things:
1. the differences (usually small) come from specific behavior of
   the target kernel on the target hardware compared to fp32/cpu, for example
   accumulating in an integer dtype. Unless the kernel guarantees bitwise
   equivalency with the reference implementation, such differences are expected.
2. the kernel on the target hardware has an accuracy issue. In this case, reach
out to the kernel developer.
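To make the comparison concrete, here is a minimal sketch of the two paths using ``relu``
as the op; the scale and zero_point are arbitrary illustrative values.
```python
import torch

scale, zero_point = 0.1, 128
x = torch.randn(4, 4)
qx = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8)

# Reference path: dequant -> op_fp32 -> quant
reference = torch.quantize_per_tensor(
    torch.relu(qx.dequantize()), scale, zero_point, torch.quint8
)

# Quantized kernel path: op_int8 directly on the quantized tensor
kernel = torch.relu(qx)

# For a simple op like relu the two usually match exactly; kernels that
# accumulate in integer dtypes may differ by a small amount.
print(torch.equal(reference.int_repr(), kernel.int_repr()))
```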
## Numerical Debugging Tooling (prototype)
```{eval-rst}
.. toctree::
:hidden:
torch.ao.ns._numeric_suite
torch.ao.ns._numeric_suite_fx
```
```{warning}
Numerical debugging tooling is an early prototype and is subject to change.
```
```{eval-rst}
* :ref:`torch_ao_ns_numeric_suite`
Eager mode numeric suite
* :ref:`torch_ao_ns_numeric_suite_fx`
FX numeric suite
```


@@ -1,19 +0,0 @@
# Quantization Backend Configuration
FX Graph Mode Quantization allows the user to configure various
quantization behaviors of an op in order to match the expectation
of their backend.
In the future, this document will contain a detailed spec of
these configurations.
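For example, a minimal FX Graph Mode sketch that passes the native backend config
explicitly; the toy model and the choice of the ``"x86"`` qconfig mapping are
illustrative only.
```python
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.backend_config import get_native_backend_config
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

# Toy fp32 model and example input, used only for illustration.
model = torch.nn.Sequential(torch.nn.Linear(16, 16)).eval()
example_inputs = (torch.randn(1, 16),)

qconfig_mapping = get_default_qconfig_mapping("x86")
backend_config = get_native_backend_config()

prepared = prepare_fx(
    model, qconfig_mapping, example_inputs, backend_config=backend_config
)
# ... run calibration data through `prepared` here ...
quantized = convert_fx(prepared, backend_config=backend_config)
```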
## Default values for native configurations
Below is the output of the quantization configuration for ops
on the x86 and qnnpack backends (PyTorch's default quantized backends).
Results:
```{eval-rst}
.. literalinclude:: scripts/quantization_backend_configs/default_backend_config.txt
```

File diff suppressed because it is too large


@@ -1,64 +0,0 @@
"""
This script will generate default values of quantization configs.
These are for use in the documentation.
"""
import os.path
import torch
from torch.ao.quantization.backend_config import get_native_backend_config_dict
from torch.ao.quantization.backend_config.utils import (
entry_to_pretty_str,
remove_boolean_dispatch_from_name,
)
# Create a directory for the generated config file, if it doesn't exist
QUANTIZATION_BACKEND_CONFIG_IMAGE_PATH = os.path.join(
os.path.realpath(os.path.dirname(__file__)), "quantization_backend_configs"
)
if not os.path.exists(QUANTIZATION_BACKEND_CONFIG_IMAGE_PATH):
os.mkdir(QUANTIZATION_BACKEND_CONFIG_IMAGE_PATH)
output_path = os.path.join(
QUANTIZATION_BACKEND_CONFIG_IMAGE_PATH, "default_backend_config.txt"
)
with open(output_path, "w") as f:
native_backend_config_dict = get_native_backend_config_dict()
configs = native_backend_config_dict["configs"]
def _sort_key_func(entry):
pattern = entry["pattern"]
while isinstance(pattern, tuple):
pattern = pattern[-1]
pattern = remove_boolean_dispatch_from_name(pattern)
if not isinstance(pattern, str):
# methods are already strings
pattern = torch.typename(pattern)
# we want
#
# torch.nn.modules.pooling.AdaptiveAvgPool1d
#
# and
#
# torch._VariableFunctionsClass.adaptive_avg_pool1d
#
# to be next to each other, so convert to all lower case
# and remove the underscores, and compare the last part
# of the string
pattern_str_normalized = pattern.lower().replace("_", "")
key = pattern_str_normalized.split(".")[-1]
return key
configs.sort(key=_sort_key_func)
entries = []
for entry in configs:
entries.append(entry_to_pretty_str(entry))
entries = ",\n".join(entries)
f.write(entries)


@@ -1,16 +0,0 @@
(torch_ao_ns_numeric_suite)=
# torch.ao.ns._numeric_suite
```{warning}
This module is an early prototype and is subject to change.
```
```{eval-rst}
.. currentmodule:: torch.ao.ns._numeric_suite
```
```{eval-rst}
.. automodule:: torch.ao.ns._numeric_suite
:members:
:member-order: bysource
```


@@ -1,39 +0,0 @@
(torch_ao_ns_numeric_suite_fx)=
# torch.ao.ns._numeric_suite_fx
```{warning}
This module is an early prototype and is subject to change.
```
```{eval-rst}
.. automodule:: torch.ao.ns._numeric_suite_fx
:members:
:member-order: bysource
```
---
# torch.ao.ns.fx.utils
```{warning}
This module is an early prototype and is subject to change.
```
```{eval-rst}
.. currentmodule:: torch.ao.ns.fx.utils
```
```{eval-rst}
.. function:: compute_sqnr(x, y)
```
```{eval-rst}
.. function:: compute_normalized_l2_error(x, y)
```
```{eval-rst}
.. function:: compute_cosine_similarity(x, y)
```
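For reference, ``compute_sqnr`` is a signal-to-quantization-noise ratio expressed in
decibels; assuming the usual definition (stated here for orientation, not taken verbatim
from the module), it corresponds to:
```{eval-rst}
.. math::
   \mathrm{SQNR}(x, y) = 20 \log_{10} \frac{\lVert x \rVert_2}{\lVert x - y \rVert_2}
```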


@@ -1,146 +0,0 @@
# Owner(s): ["oncall: quantization"]
import re
import contextlib
from pathlib import Path
import torch
from torch.testing._internal.common_quantization import (
QuantizationTestCase,
SingleLayerLinearModel,
)
from torch.testing._internal.common_quantized import override_quantized_engine
from torch.testing._internal.common_utils import raise_on_run_directly, IS_ARM64, IS_FBCODE
import unittest
@unittest.skipIf(IS_FBCODE, "some path issues in fbcode")
class TestQuantizationDocs(QuantizationTestCase):
r"""
The tests in this section import code from the quantization docs and check that
they actually run without errors. In cases where objects are undefined in the code snippet,
they must be provided in the test. The imports seem to behave a bit inconsistently,
they can be imported either in the test file or passed as a global input
"""
def run(self, result=None):
with override_quantized_engine("qnnpack") if IS_ARM64 else contextlib.nullcontext():
super().run(result)
def _get_code(
self, path_from_pytorch, unique_identifier, offset=2, short_snippet=False
):
r"""
This function reads in the code from the docs given a unique identifier.
Most code snippets have a 2 space indentation; for other indentation levels,
change the `offset` arg. The `short_snippet` arg can be set to allow for testing
of smaller snippets; the check that this arg controls is used to make sure that
we are not accidentally importing only a blank line or something.
"""
def get_correct_path(path_from_pytorch):
r"""
The current working directory when CI runs the test seems to vary; this function
looks for the docs relative to this test file.
"""
core_dir = Path(__file__).parent
assert core_dir.match("test/quantization/core/"), (
"test_docs.py is in an unexpected location. If you've been "
"moving files around, ensure that the test and build files have "
"been updated to have the correct relative path between "
"test_docs.py and the docs."
)
pytorch_root = core_dir.parents[2]
return pytorch_root / path_from_pytorch
path_to_file = get_correct_path(path_from_pytorch)
if path_to_file:
with open(path_to_file) as file:
content = file.readlines()
# lines read from the file keep their trailing newline in Python
if "\n" not in unique_identifier:
unique_identifier += "\n"
assert unique_identifier in content, f"could not find {unique_identifier} in {path_to_file}"
# get index of first line of code
line_num_start = content.index(unique_identifier) + 1
# next find where the code chunk ends.
# this regex will match lines that don't start
# with a \n or " " with number of spaces=offset
r = re.compile("^[^\n," + " " * offset + "]")
# this will return the line of first line that matches regex
line_after_code = next(filter(r.match, content[line_num_start:]))
last_line_num = content.index(line_after_code)
# remove the first `offset` chars of each line and gather it all together
code = "".join(
[x[offset:] for x in content[line_num_start + 1 : last_line_num]]
)
# want to make sure we are actually getting some code,
assert last_line_num - line_num_start > 3 or short_snippet, (
f"The code in {path_to_file} identified by {unique_identifier} seems suspiciously short:"
f"\n\n###code-start####\n{code}###code-end####"
)
return code
return None
def _test_code(self, code, global_inputs=None):
r"""
This function runs `code` using any vars in `global_inputs`
"""
# if the code snippet could not be found, `code` is None and there is nothing to run
if code is not None:
expr = compile(code, "test", "exec")
exec(expr, global_inputs)
def test_quantization_doc_ptdq(self):
path_from_pytorch = "docs/source/quantization.rst"
unique_identifier = "PTDQ API Example::"
code = self._get_code(path_from_pytorch, unique_identifier)
self._test_code(code)
def test_quantization_doc_ptsq(self):
path_from_pytorch = "docs/source/quantization.rst"
unique_identifier = "PTSQ API Example::"
code = self._get_code(path_from_pytorch, unique_identifier)
self._test_code(code)
def test_quantization_doc_qat(self):
path_from_pytorch = "docs/source/quantization.rst"
unique_identifier = "QAT API Example::"
def _dummy_func(*args, **kwargs):
return None
input_fp32 = torch.randn(1, 1, 1, 1)
global_inputs = {"training_loop": _dummy_func, "input_fp32": input_fp32}
code = self._get_code(path_from_pytorch, unique_identifier)
self._test_code(code, global_inputs)
def test_quantization_doc_fx(self):
path_from_pytorch = "docs/source/quantization.rst"
unique_identifier = "FXPTQ API Example::"
input_fp32 = SingleLayerLinearModel().get_example_inputs()
global_inputs = {"UserModel": SingleLayerLinearModel, "input_fp32": input_fp32}
code = self._get_code(path_from_pytorch, unique_identifier)
self._test_code(code, global_inputs)
def test_quantization_doc_custom(self):
path_from_pytorch = "docs/source/quantization.rst"
unique_identifier = "Custom API Example::"
global_inputs = {"nnq": torch.ao.nn.quantized}
code = self._get_code(path_from_pytorch, unique_identifier)
self._test_code(code, global_inputs)
if __name__ == "__main__":
raise_on_run_directly("test/test_quantization.py")


@@ -38,13 +38,6 @@ from quantization.core.test_workflow_module import TestDistributed # noqa: F401
from quantization.core.test_workflow_module import TestFusedObsFakeQuantModule # noqa: F401
from quantization.core.test_backend_config import TestBackendConfig # noqa: F401
from quantization.core.test_utils import TestUtils # noqa: F401
log = logging.getLogger(__name__)
try:
# This test has extra data dependencies, so in some environments, e.g. Meta internal
# Buck, it has its own test runner.
from quantization.core.test_docs import TestQuantizationDocs # noqa: F401
except ImportError as e:
log.warning(e)
# Eager Mode Workflow. Tests for the functionality of APIs and different features implemented
# using eager mode.
@@ -67,6 +60,7 @@ from quantization.eager.test_equalize_eager import TestEqualizeEager # noqa: F4
from quantization.eager.test_bias_correction_eager import TestBiasCorrectionEager # noqa: F401
log = logging.getLogger(__name__)
# FX GraphModule Graph Mode Quantization. Tests for the functionality of APIs and different features implemented
# using fx quantization.
try: