Mirror of https://github.com/pytorch/pytorch.git, synced 2025-10-20 21:14:14 +08:00
Remove pytorch quant docs since we are moving to torchao (#157766)
Summary: att
Test Plan: doc page generated from CI
Reviewers:
Subscribers:
Tasks:
Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157766
Approved by: https://github.com/Skylion007
Committed by: PyTorch MergeBot
Parent: dd93883231
Commit: 11a86ad2fa
@@ -15,7 +15,6 @@ help:

 figures:
 	@$(PYCMD) source/scripts/build_activation_images.py
-	@$(PYCMD) source/scripts/build_quantization_configs.py
 	@$(PYCMD) source/scripts/build_lr_scheduler_images.py

 onnx:
@@ -1,96 +0,0 @@
# Quantization Accuracy Debugging

This document provides high-level strategies for improving quantization
accuracy. If a quantized model has error compared to the original model,
we can categorize the error into:

1. **data insensitive error** - caused by intrinsic model quantization error;
   a large portion of the input data has large error
2. **data sensitive error** - caused by outlier input data; a small portion
   of the input data has large error
3. **implementation error** - the quantized kernel does not match the
   reference implementation

## Data insensitive error

### General tips

1. For PTQ, ensure that the data you are calibrating with is representative
   of your dataset. For example, for a classification problem a general
   guideline is to have multiple samples in every category, and the overall
   number of samples should be at least 100. There is no penalty for
   calibrating with more data other than calibration time.
2. If your model has Conv-BN or Linear-BN patterns, consider fusing them.
   If you are using FX graph mode quantization, this is done automatically
   by the workflow. If you are using Eager mode quantization, you can do
   this manually with the ``torch.ao.quantization.fuse_modules`` API
   (see the sketch after this list).
3. Increase the precision of the dtype of the problematic ops. Usually, fp32
   will have the highest accuracy, followed by fp16, followed by dynamically
   quantized int8, followed by statically quantized int8.

   1. Note: this is trading off performance for accuracy.
   2. Note: availability of kernels per dtype per op can vary by backend.
   3. Note: dtype conversions add an additional performance cost. For example,
      ``fp32_op -> quant -> int8_op -> dequant -> fp32_op -> quant -> int8_op -> dequant``
      will have a performance penalty compared to
      ``fp32_op -> fp32_op -> quant -> int8_op -> int8_op -> dequant``
      because of the higher number of required dtype conversions.

4. If you are using PTQ, consider using QAT to recover some of the accuracy
   loss from quantization.
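To make tip 2 concrete, below is a minimal Eager mode sketch of Conv-BN fusion; the toy module and its submodule names (`conv`, `bn`) are illustrative placeholders, not part of any real workflow.

```python
import torch
from torch.ao.quantization import fuse_modules

# toy model with a Conv-BN pattern; any eval-mode model with
# named conv/bn submodules would work the same way
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 3, 1)
        self.bn = torch.nn.BatchNorm2d(3)

    def forward(self, x):
        return self.bn(self.conv(x))

# fusion for PTQ requires eval mode
m = M().eval()
# fuse the Conv-BN pair, referencing the submodules by name
m_fused = fuse_modules(m, [["conv", "bn"]])
```
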
### Int8 quantization tips

1. If you are using per-tensor weight quantization, consider using per-channel
   weight quantization.
2. If you are doing inference on `fbgemm`, ensure that you set the `reduce_range`
   argument to `False` if your CPU is Cooperlake or newer, and to `True` otherwise
   (see the sketch below).
3. Audit the input activation distribution variation across different samples.
   If this variation is high, the layer may be suitable for dynamic quantization
   but not static quantization.
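A minimal sketch combining tips 1 and 2: a custom `QConfig` with per-channel weight observation and `reduce_range` enabled. The specific observer choices here are illustrative, not prescriptive.

```python
import torch
from torch.ao.quantization import MinMaxObserver, PerChannelMinMaxObserver, QConfig

# per-channel weights (tip 1) plus reduce_range for fbgemm on CPUs
# older than Cooperlake (tip 2); set reduce_range=False on newer CPUs
qconfig = QConfig(
    activation=MinMaxObserver.with_args(dtype=torch.quint8, reduce_range=True),
    weight=PerChannelMinMaxObserver.with_args(
        dtype=torch.qint8, qscheme=torch.per_channel_symmetric
    ),
)
```
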
## Data sensitive error

If you are using static quantization and a small portion of your input data is
resulting in high quantization error, you can try:

1. Adjust your calibration dataset to make it more representative of your
   inference dataset.
2. Manually inspect (using Numeric Suite) which layers have high quantization
   error. For these layers, consider leaving them in floating point or adjusting
   the observer settings to choose a better scale and zero_point. A sketch of
   the Numeric Suite inspection follows this list.
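A minimal sketch of step 2, using the prototype eager-mode Numeric Suite to rank layers by weight SQNR; `float_model` and `quantized_model` stand in for your own models, and the `sqnr` helper is defined here rather than taken from the module.

```python
import torch
import torch.ao.ns._numeric_suite as ns

def sqnr(x, y):
    # signal-to-quantization-noise ratio in dB; higher is better
    return 20 * torch.log10(torch.norm(x) / torch.norm(x - y))

# float_model / quantized_model are assumed to be your fp32 and
# quantized versions of the same model
wt_compare_dict = ns.compare_weights(
    float_model.state_dict(), quantized_model.state_dict()
)
for key, d in wt_compare_dict.items():
    # for conv/linear layers the "quantized" entry is a quantized tensor
    print(key, sqnr(d["float"], d["quantized"].dequantize()).item())
```
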
## Implementation error

If you are using PyTorch quantization with your own backend,
you may see differences between the reference implementation of an
operation (such as ``dequant -> op_fp32 -> quant``) and the quantized
implementation (such as `op_int8`) of the op on the target hardware.
This could mean one of two things:

1. the differences (usually small) are expected due to specific behavior of
   the target kernel on the target hardware compared to fp32/cpu. An example
   of this is accumulating in an integer dtype. Unless the kernel guarantees
   bitwise equivalence with the reference implementation, this is expected.
2. the kernel on the target hardware has an accuracy issue. In this case,
   reach out to the kernel developer.
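As a minimal self-contained sketch of the comparison being described, the snippet below runs the same data through a quantized kernel (`torch.relu` on a quantized tensor) and through the reference ``dequant -> op_fp32 -> quant`` path; the choice of op and the scale/zero_point values are arbitrary.

```python
import torch

x_fp32 = torch.randn(4, 8)
scale, zero_point = 0.1, 0
x_q = torch.quantize_per_tensor(x_fp32, scale, zero_point, torch.quint8)

# reference path: dequant -> op_fp32 -> quant
ref = torch.quantize_per_tensor(
    torch.relu(x_q.dequantize()), scale, zero_point, torch.quint8
)

# quantized kernel path: op_int8
out = torch.relu(x_q)

# small differences may be expected; large ones suggest a kernel issue
print((ref.dequantize() - out.dequantize()).abs().max())
```
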
## Numerical Debugging Tooling (prototype)

```{eval-rst}
.. toctree::
   :hidden:

   torch.ao.ns._numeric_suite
   torch.ao.ns._numeric_suite_fx
```

```{warning}
Numerical debugging tooling is an early prototype and subject to change.
```

```{eval-rst}
* :ref:`torch_ao_ns_numeric_suite`
  Eager mode numeric suite
* :ref:`torch_ao_ns_numeric_suite_fx`
  FX numeric suite
```
@@ -1,19 +0,0 @@
# Quantization Backend Configuration

FX Graph Mode Quantization allows the user to configure various
quantization behaviors of an op in order to match the expectations
of their backend.

In the future, this document will contain a detailed spec of
these configurations.

## Default values for native configurations

Below is the output of the configuration for quantization of ops
in x86 and qnnpack (PyTorch's default quantized backends).

Results:

```{eval-rst}
.. literalinclude:: scripts/quantization_backend_configs/default_backend_config.txt
```
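For reference, a minimal sketch of inspecting this configuration programmatically, using the same ``get_native_backend_config_dict`` API that the build script below relies on:

```python
from torch.ao.quantization.backend_config import get_native_backend_config_dict

config_dict = get_native_backend_config_dict()
# each entry describes the quantization behavior of one op pattern
for entry in config_dict["configs"][:5]:
    print(entry["pattern"])
```
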
File diff suppressed because it is too large
@@ -1,64 +0,0 @@
"""
This script will generate default values of quantization configs.
These are for use in the documentation.
"""

import os.path

import torch
from torch.ao.quantization.backend_config import get_native_backend_config_dict
from torch.ao.quantization.backend_config.utils import (
    entry_to_pretty_str,
    remove_boolean_dispatch_from_name,
)


# Create a directory for the output file, if it doesn't exist
QUANTIZATION_BACKEND_CONFIG_IMAGE_PATH = os.path.join(
    os.path.realpath(os.path.dirname(__file__)), "quantization_backend_configs"
)

if not os.path.exists(QUANTIZATION_BACKEND_CONFIG_IMAGE_PATH):
    os.mkdir(QUANTIZATION_BACKEND_CONFIG_IMAGE_PATH)

output_path = os.path.join(
    QUANTIZATION_BACKEND_CONFIG_IMAGE_PATH, "default_backend_config.txt"
)

with open(output_path, "w") as f:
    native_backend_config_dict = get_native_backend_config_dict()

    configs = native_backend_config_dict["configs"]

    def _sort_key_func(entry):
        pattern = entry["pattern"]
        while isinstance(pattern, tuple):
            pattern = pattern[-1]

        pattern = remove_boolean_dispatch_from_name(pattern)
        if not isinstance(pattern, str):
            # methods are already strings
            pattern = torch.typename(pattern)

        # we want
        #
        #   torch.nn.modules.pooling.AdaptiveAvgPool1d
        #
        # and
        #
        #   torch._VariableFunctionsClass.adaptive_avg_pool1d
        #
        # to be next to each other, so convert to all lower case,
        # remove the underscores, and compare the last part
        # of the string
        pattern_str_normalized = pattern.lower().replace("_", "")
        key = pattern_str_normalized.split(".")[-1]
        return key

    configs.sort(key=_sort_key_func)

    entries = []
    for entry in configs:
        entries.append(entry_to_pretty_str(entry))
    entries = ",\n".join(entries)
    f.write(entries)
@@ -1,16 +0,0 @@
(torch_ao_ns_numeric_suite)=

# torch.ao.ns._numeric_suite

```{warning}
This module is an early prototype and is subject to change.
```

```{eval-rst}
.. currentmodule:: torch.ao.ns._numeric_suite
```

```{eval-rst}
.. automodule:: torch.ao.ns._numeric_suite
    :members:
    :member-order: bysource
```
@@ -1,39 +0,0 @@
(torch_ao_ns_numeric_suite_fx)=

# torch.ao.ns._numeric_suite_fx

```{warning}
This module is an early prototype and is subject to change.
```

```{eval-rst}
.. automodule:: torch.ao.ns._numeric_suite_fx
    :members:
    :member-order: bysource
```

---

# torch.ao.ns.fx.utils

```{warning}
This module is an early prototype and is subject to change.
```

```{eval-rst}
.. currentmodule:: torch.ao.ns.fx.utils
```

```{eval-rst}
.. function:: compute_sqnr(x, y)
```

```{eval-rst}
.. function:: compute_normalized_l2_error(x, y)
```

```{eval-rst}
.. function:: compute_cosine_similarity(x, y)
```
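A minimal usage sketch of the three comparison utilities above; the data and quantization parameters are arbitrary.

```python
import torch
from torch.ao.ns.fx.utils import (
    compute_cosine_similarity,
    compute_normalized_l2_error,
    compute_sqnr,
)

# compare an fp32 tensor against a quantize-dequantized copy of itself
x = torch.randn(128)
y = torch.quantize_per_tensor(x, 0.1, 0, torch.quint8).dequantize()

print("sqnr:", compute_sqnr(x, y))
print("normalized l2 error:", compute_normalized_l2_error(x, y))
print("cosine similarity:", compute_cosine_similarity(x, y))
```
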
@@ -1,146 +0,0 @@
# Owner(s): ["oncall: quantization"]

import contextlib
import re
import unittest
from pathlib import Path

import torch

from torch.testing._internal.common_quantization import (
    QuantizationTestCase,
    SingleLayerLinearModel,
)
from torch.testing._internal.common_quantized import override_quantized_engine
from torch.testing._internal.common_utils import raise_on_run_directly, IS_ARM64, IS_FBCODE


@unittest.skipIf(IS_FBCODE, "some path issues in fbcode")
class TestQuantizationDocs(QuantizationTestCase):
    r"""
    The tests in this section import code from the quantization docs and check that
    it actually runs without errors. In cases where objects are undefined in the code
    snippet, they must be provided by the test. The imports seem to behave a bit
    inconsistently: they can be imported either in the test file or passed as a
    global input.
    """

    def run(self, result=None):
        with override_quantized_engine("qnnpack") if IS_ARM64 else contextlib.nullcontext():
            super().run(result)

    def _get_code(
        self, path_from_pytorch, unique_identifier, offset=2, short_snippet=False
    ):
        r"""
        This function reads in the code from the docs given a unique identifier.
        Most code snippets have a 2 space indentation; for other indentation levels,
        change the `offset` arg. The `short_snippet` arg can be set to allow for
        testing of smaller snippets; the check that this arg controls is used to
        make sure that we are not accidentally importing only a blank line or
        something similar.
        """

        def get_correct_path(path_from_pytorch):
            r"""
            The current working directory when CI is running the test seems to vary;
            this function looks for docs relative to this test file.
            """
            core_dir = Path(__file__).parent
            assert core_dir.match("test/quantization/core/"), (
                "test_docs.py is in an unexpected location. If you've been "
                "moving files around, ensure that the test and build files have "
                "been updated to have the correct relative path between "
                "test_docs.py and the docs."
            )
            pytorch_root = core_dir.parents[2]
            return pytorch_root / path_from_pytorch

        path_to_file = get_correct_path(path_from_pytorch)
        if path_to_file:
            with open(path_to_file) as file:
                content = file.readlines()

            # the identifier line will register as having a trailing newline in python
            if "\n" not in unique_identifier:
                unique_identifier += "\n"

            assert unique_identifier in content, f"could not find {unique_identifier} in {path_to_file}"

            # get index of first line of code
            line_num_start = content.index(unique_identifier) + 1

            # next find where the code chunk ends.
            # this regex will match lines that don't start
            # with a \n or a run of `offset` spaces
            r = re.compile("^[^\n" + " " * offset + "]")
            # this will return the first line that matches the regex
            line_after_code = next(filter(r.match, content[line_num_start:]))
            last_line_num = content.index(line_after_code)

            # remove the first `offset` chars of each line and gather it all together
            code = "".join(
                [x[offset:] for x in content[line_num_start + 1 : last_line_num]]
            )

            # want to make sure we are actually getting some code
            assert last_line_num - line_num_start > 3 or short_snippet, (
                f"The code in {path_to_file} identified by {unique_identifier} seems suspiciously short:"
                f"\n\n###code-start####\n{code}###code-end####"
            )
            return code

        return None

    def _test_code(self, code, global_inputs=None):
        r"""
        This function runs `code` using any vars in `global_inputs`
        """
        # if the code snippet could not be found, _get_code returned None; skip
        if code is not None:
            expr = compile(code, "test", "exec")
            exec(expr, global_inputs)

    def test_quantization_doc_ptdq(self):
        path_from_pytorch = "docs/source/quantization.rst"
        unique_identifier = "PTDQ API Example::"
        code = self._get_code(path_from_pytorch, unique_identifier)
        self._test_code(code)

    def test_quantization_doc_ptsq(self):
        path_from_pytorch = "docs/source/quantization.rst"
        unique_identifier = "PTSQ API Example::"
        code = self._get_code(path_from_pytorch, unique_identifier)
        self._test_code(code)

    def test_quantization_doc_qat(self):
        path_from_pytorch = "docs/source/quantization.rst"
        unique_identifier = "QAT API Example::"

        def _dummy_func(*args, **kwargs):
            return None

        input_fp32 = torch.randn(1, 1, 1, 1)
        global_inputs = {"training_loop": _dummy_func, "input_fp32": input_fp32}
        code = self._get_code(path_from_pytorch, unique_identifier)
        self._test_code(code, global_inputs)

    def test_quantization_doc_fx(self):
        path_from_pytorch = "docs/source/quantization.rst"
        unique_identifier = "FXPTQ API Example::"

        input_fp32 = SingleLayerLinearModel().get_example_inputs()
        global_inputs = {"UserModel": SingleLayerLinearModel, "input_fp32": input_fp32}

        code = self._get_code(path_from_pytorch, unique_identifier)
        self._test_code(code, global_inputs)

    def test_quantization_doc_custom(self):
        path_from_pytorch = "docs/source/quantization.rst"
        unique_identifier = "Custom API Example::"

        global_inputs = {"nnq": torch.ao.nn.quantized}

        code = self._get_code(path_from_pytorch, unique_identifier)
        self._test_code(code, global_inputs)


if __name__ == "__main__":
    raise_on_run_directly("test/test_quantization.py")
@@ -38,13 +38,6 @@ from quantization.core.test_workflow_module import TestDistributed  # noqa: F401
 from quantization.core.test_workflow_module import TestFusedObsFakeQuantModule  # noqa: F401
 from quantization.core.test_backend_config import TestBackendConfig  # noqa: F401
 from quantization.core.test_utils import TestUtils  # noqa: F401
-log = logging.getLogger(__name__)
-try:
-    # This test has extra data dependencies, so in some environments, e.g. Meta internal
-    # Buck, it has its own test runner.
-    from quantization.core.test_docs import TestQuantizationDocs  # noqa: F401
-except ImportError as e:
-    log.warning(e)

 # Eager Mode Workflow. Tests for the functionality of APIs and different features implemented
 # using eager mode.
@@ -67,6 +60,7 @@ from quantization.eager.test_equalize_eager import TestEqualizeEager  # noqa: F401
 from quantization.eager.test_bias_correction_eager import TestBiasCorrectionEager  # noqa: F401


+log = logging.getLogger(__name__)
 # FX GraphModule Graph Mode Quantization. Tests for the functionality of APIs and different features implemented
 # using fx quantization.
 try: