Summary:
Ensures that creating tensors, copying, filling with zeros, and checking for NaN work on CUDA for the `float8` dtypes. This should be enough for float8 emulation on CUDA.
Note that I skipped the mul test: it is less trivial to add (it needs a new C++ macro), and there is no use case for it yet. We can follow up on that in the future.
Test Plan:
```
python test/test_quantization.py TestFloat8Dtype
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105807
Approved by: https://github.com/ezyang, https://github.com/jerryzh168, https://github.com/albanD
Summary:
As the title says: we use the module partition API to identify the GRU submodule and annotate all necessary patterns.
Test Plan: buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
Differential Revision: D46689428
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103526
Approved by: https://github.com/andrewor14
Summary: As the title says, we use the module partition API to identify the GRU submodule and annotate all necessary patterns.
Test Plan: buck2 test mode/opt caffe2/test:quantization_pt2e -- 'caffe2/test:quantization_pt2e'
Reviewed By: kimishpatel
Differential Revision: D46384329
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103358
Approved by: https://github.com/HDCharles
Summary:
We are currently silently skipping all PT2 quantization
tests due to a recent typo. This commit fixes this and also adds
warnings so it'll be easier to debug similar issues in the future.
Test Plan: python test/test_quantization.py
Differential Revision: D46383546
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102819
Approved by: https://github.com/jerryzh168
Summary:
We are currently silently skipping all PT2 quantization
tests due to a recent typo. This commit fixes this and also adds
warnings so it'll be easier to debug similar issues in the future.
Test Plan: python test/test_quantization.py
Differential Revision: D46329480
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102644
Approved by: https://github.com/jerryzh168
This diff introduces the utility `find_sequential_partitions`.
This utility allows one to specify a sequential pattern of
nn.Module/nn.functional and returns a list. Each item in the list contains a
List[SourcePartition] that represents sequentially connected partitions that
match the requested pattern.
For example, `find_sequential_partitions(model, [nn.Conv2d, nn.ReLU])` will find
all nn.Conv2d and nn.ReLU partitions that are sequentially connected.
Furthermore, this moves conv_bn/conv_bn_relu for QAT over to
`find_sequential_partitions`.
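The idea of matching sequentially connected partitions can be sketched on a toy graph in plain Python. Everything below (`ToyPartition`, `find_sequential_chains`, string-valued kinds) is hypothetical and only illustrates the matching behavior described above; it is not the implementation added in this diff, which works on `SourcePartition` objects from an FX graph.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyPartition:
    """Hypothetical stand-in for a partition: a name, a kind, and its users."""
    name: str
    kind: str
    users: List[str] = field(default_factory=list)


def find_sequential_chains(
    partitions: List[ToyPartition], pattern: List[str]
) -> List[List[ToyPartition]]:
    """Return every chain of directly connected partitions matching `pattern`."""
    if not pattern:
        return []
    by_name: Dict[str, ToyPartition] = {p.name: p for p in partitions}

    def extend(chain: List[ToyPartition], remaining: List[str]):
        # The whole pattern has been consumed: this chain is a match.
        if not remaining:
            return [chain]
        found = []
        for user_name in chain[-1].users:
            nxt = by_name[user_name]
            if nxt.kind == remaining[0]:
                found.extend(extend(chain + [nxt], remaining[1:]))
        return found

    matches = []
    for p in partitions:
        if p.kind == pattern[0]:
            matches.extend(extend([p], pattern[1:]))
    return matches


# conv1 -> relu1 -> conv2 -> bn1
graph = [
    ToyPartition("conv1", "Conv2d", ["relu1"]),
    ToyPartition("relu1", "ReLU", ["conv2"]),
    ToyPartition("conv2", "Conv2d", ["bn1"]),
    ToyPartition("bn1", "BatchNorm2d"),
]
chains = find_sequential_chains(graph, ["Conv2d", "ReLU"])
print([[p.name for p in chain] for chain in chains])
```

Only `conv1` followed by `relu1` matches here; `conv2` is a Conv2d but feeds a BatchNorm2d, so it is not part of a Conv2d/ReLU chain.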
Differential Revision: [D45948057](https://our.internmc.facebook.com/intern/diff/D45948057/)
**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D45948057/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102394
Approved by: https://github.com/jerryzh168
**Summary**
After https://github.com/pytorch/pytorch/pull/99064 and https://github.com/pytorch/pytorch/pull/99065 were merged, the pt2e UT path changed, so the module path in `test/test_quantization.py` needs to change as well. With this fix, these tests can be run from the top-level test directory.
**Test Plan**
```
cd test && python -u -m pytest test_quantization.py -k TestQuantizePT2E
cd test && python -u -m pytest test_quantization.py -k TestQuantizePT2EModels
cd test && python -u -m pytest test_quantization.py -k TestQuantizePT2EFX
cd test && python -u -m pytest test_quantization.py -k TestQuantizePT2EFXX86Inductor
cd test && python -u -m pytest test_quantization.py -k TestQuantizePT2EFXModels
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99402
Approved by: https://github.com/jerryzh168
Summary:
This is a retry of https://github.com/pytorch/pytorch/pull/94992 which was reverted due to CI issues.
This PR adds a set of uninterpreted data types to PyTorch which can be used to implement experimental functionality out of core (think fp8, int4, int16 quant, etc).
@bypass-github-export-checks
Test Plan:
```
python test/test_quantization.py -k TestBits
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95860
Approved by: https://github.com/atalman
Summary:
This PR adds a set of uninterpreted data types to PyTorch which can be used to implement experimental functionality out of core (think fp8, int4, int16 quant, etc).
Note: this is a copy-paste of https://github.com/pytorch/pytorch/pull/89990 with a bug fix for clang9. It was easier to just put up another PR since I'm not sure how commandeering works with Meta-only changes.
@bypass-github-export-checks
Test Plan:
```
python test/test_quantization.py -k TestBits
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94992
Approved by: https://github.com/angelayi
Summary:
This PR introduces the top level APIs for quantization support in PyTorch 2.0 Export stack
* torch.ao.quantization.quantize_pt2e.prepare_pt2e
Takes a model that is captured by the PyTorch 2.0 export (torchdynamo full graph mode) and prepares the model for calibration
for post training quantization
* torch.ao.quantization.quantize_pt2e.convert_pt2e
Takes a calibrated model and converts that to a reference quantized model that can be lowered later to quantized operator libraries or delegation modules
Also added a backend config for the qnnpack_pt2e backend:
* torch.ao.quantization.backend_config.get_qnnpack_pt2e_backend_config
Note: everything related to quantize_pt2e is experimental (prototype), and we don't have any BC guarantees
Test Plan:
python test/test_quantization.py TestQuantizePT2EModels
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91035
Approved by: https://github.com/HDCharles
Summary:
This PR introduces the top level APIs for quantization support in PyTorch 2.0 Export stack
* torch.ao.quantization.quantize_pt2e.prepare_pt2e
Takes a model that is captured by the PyTorch 2.0 export (torchdynamo full graph mode) and prepares the model for calibration
for post training quantization
* torch.ao.quantization.quantize_pt2e.convert_pt2e
Takes a calibrated model and converts that to a reference quantized model that can be lowered later to quantized operator libraries or delegation modules
Also added a backend config for the qnnpack_pt2e backend:
* torch.ao.quantization.backend_config.get_qnnpack_pt2e_backend_config
Note: everything related to quantize_pt2e is experimental (prototype), and we don't have any BC guarantees
Test Plan:
python test/test_quantization.py TestQuantizePT2EModels
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90971
Approved by: https://github.com/HDCharles
Summary:
This PR introduces the top level APIs for quantization support in PyTorch 2.0 Export stack
* torch.ao.quantization.quantize_pt2e.prepare_pt2e
Takes a model that is captured by the PyTorch 2.0 export (torchdynamo full graph mode) and prepares the model for calibration
for post training quantization
* torch.ao.quantization.quantize_pt2e.convert_pt2e
Takes a calibrated model and converts that to a reference quantized model that can be lowered later to quantized operator libraries or delegation modules
Also added a backend config for the qnnpack_pt2e backend:
* torch.ao.quantization.backend_config.get_qnnpack_pt2e_backend_config
Note: everything related to quantize_pt2e is experimental (prototype), and we don't have any BC guarantees
Test Plan:
python test/test_quantization.py TestQuantizePT2EModels
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90802
Approved by: https://github.com/qihqi
Summary:
This PR is an early prototype of a tool to quantize each layer of a model
N times, with N qconfigs each. We follow the design agreed upon in
https://fburl.com/gdoc/e1gaq3ih .
Current API:
```
m = M().eval()
example_input = (torch.randn(2, 2),)
qconfig_mappings = [
    QConfigMapping().set_global(torch.quantization.default_qconfig),
    QConfigMapping().set_global(torch.quantization.default_dynamic_qconfig),
]
backend_config = get_native_backend_config()
msp = prepare_n_shadows_model(
    m, example_input, qconfig_mappings, backend_config)
# calibrate
for _ in range(2):
    msp(*example_input)
msq = convert_n_shadows_model(msp)
msq(*example_input)
results = extract_results_n_shadows_model(msq)
print_comparisons_n_shadows_model(results)

# example output
# subgraph_idx    ref_node_name      best_idx        1        2
# --------------  ---------------  ----------  -------  -------
# subgraph_0      fc1                       2  42.0834  42.6279
# subgraph_1      fc2                       2  43.7259  50.0593
```
Test plan:
```
python test/test_quantization.py -k test_n_shadows
```
Differential Revision: [D37650332](https://our.internmc.facebook.com/intern/diff/D37650332)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80521
Approved by: https://github.com/jerryzh168, https://github.com/andrewor14
Context: To avoid cluttering the `torch.nn` namespace,
the quantized modules namespace is moved to `torch.ao.nn`.
The list of the `nn.quantized` files that are being migrated:
- [ ] `torch.nn.quantized` → `torch.ao.nn.quantized`
- [X] [Current PR] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional`
- [ ] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules`
- [ ] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic`
- [ ] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference`
- [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable`
- [ ] `torch.nn.qat` → `torch.ao.nn.qat`
- [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules`
- [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic`
- [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic`
- [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules`
- [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat`
- [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized`
- [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules`
- [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic`
The majority of the files are simply moved to the new location.
However, specific files need to be double-checked:
- [Documentation](docs/source/quantization-support.rst) @vkuzo
- [Public API test list](test/allowlist_for_publicAPI.json) @peterbell10
Differential Revision: [D36792967](https://our.internmc.facebook.com/intern/diff/D36792967/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36792967/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78712
Approved by: https://github.com/jerryzh168
Summary: Following https://github.com/pytorch/pytorch/pull/78452
and https://github.com/pytorch/pytorch/pull/79066, this commit
is part 1 of the broader effort to replace `backend_config_dict`
with a python config object, a more formal and robust API that
leads to better user experience. Note that there is no change in
behavior in this commit by itself. A future commit (part 2) will
replace all existing usages of `backend_config_dict` with the
`BackendConfig` object added in this commit.
Test Plan:
python test/test_quantization.py TestBackendConfig
Reviewers: jerryzh168
Subscribers: jerryzh168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81469
Approved by: https://github.com/jerryzh168
Summary: This introduces the skeleton for the ModelReportVisualizer
class. This class helps visualize the information generated by the
ModelReport class `generate_report()` output. This class aims to provide
visualizations in a table, plot (line graph) and histogram view.
This also introduces an empty test class for testing visualizations. As
implementations are added to this class, tests will be added
accordingly.
This includes the high level descriptions for each of the methods as
well. Expected use cases will be added to the class description in a
future commit as that gets finalized.
Test Plan: python test/test_quantization.py TestFxModelReportVisualizer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81523
Approved by: https://github.com/andrewor14
Summary: This adds the class framework for the ModelReport
OutlierDetector. This detector will be in charge of looking at
activation data and determining whether significant outliers are
present. It will average this data across batches to make a
recommendation / warning if significant outliers are found.
This commit contains just the class framework and a base test class.
Implementations will follow in subsequent commits.
Test Plan: python test/test_quantization.py TestFxDetectOutliers
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80743
Approved by: https://github.com/HDCharles
Summary: This adds the framework (method signatures and descriptors) for
the InputWeightEqualization Detector. There is no implementation yet,
so the test suite for this is a simple pass. This Detector will be used
to determine whether input weight equalization should be recommended.
Test Plan: python test/test_quantization.py TestFxDetectInputWeightEqualization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79916
Approved by: https://github.com/HDCharles
Summary: Per https://github.com/pytorch/pytorch/issues/79135, the code
snippets in the docs don't run. This is a recurring problem because
previously there was no unit test to check that these code snippets
actually ran. This PR adds support for such a test: it imports the
snippet as a string and evaluates it to make sure that it actually runs.
If the code snippet has user-defined code, you can pass in dummy
versions using global_inputs. Sometimes the imports of the code snippets
behave oddly, but you can pass them in as well, as in
test_quantization_doc_custom, where nnq is passed in.
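The approach of evaluating a snippet held as a string, with dummy globals standing in for user-defined code, can be sketched in plain Python. The helper name `run_doc_snippet` and the snippet contents are hypothetical; only the `global_inputs` idea comes from the description above, and the actual test class may be structured differently.

```python
from typing import Any, Dict, Optional


def run_doc_snippet(
    snippet: str, global_inputs: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Execute a documentation snippet; any exception means the docs are broken."""
    # Start from the caller-supplied dummies so the snippet can reference them.
    namespace: Dict[str, Any] = dict(global_inputs or {})
    exec(compile(snippet, "<doc-snippet>", "exec"), namespace)
    return namespace


# A snippet that refers to user-defined code (`make_model`) it does not define.
snippet = """
model = make_model()
result = model(3)
"""

# Supply a dummy version of the user-defined code through global_inputs.
ns = run_doc_snippet(snippet, {"make_model": lambda: (lambda x: x * 2)})
assert ns["result"] == 6
```

Without the dummy `make_model`, the same snippet raises `NameError`, which is the kind of failure such a test is meant to surface.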
Test Plan: python test/test_quantization.py TestQuantizationDocs
Also see https://github.com/pytorch/pytorch/pull/79994 for what shows up in CI when the docs get broken.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79923
Approved by: https://github.com/z-a-f, https://github.com/vspenubarthi
Summary: The ModelReport class in model_report.py combines the
functionality of the detectors and the ModelReportObserver. It creates
an end-to-end system where a user can pass in a prepared Graph Model to
insert the ModelReportObservers; after the user calibrates their
model, the calibrated model can then be used by the ModelReport class
to generate reports based on what the user wishes to gather information
on.
This contains the init method and the signatures and docs for each
of the proposed helper functions.
This also addresses and fixes a revert issue.
Test Plan: python test/test_quantization.py TestFxModelReportClass
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80052
Approved by: https://github.com/HDCharles
Summary: The ModelReport class in model_report.py combines the
functionality of the detectors and the ModelReportObserver. It creates
an end-to-end system where a user can pass in a prepared Graph Model to
insert the ModelReportObservers; after the user calibrates their
model, the calibrated model can then be used by the ModelReport class
to generate reports based on what the user wishes to gather information
on.
This contains the init method and the signatures and docs for each
of the proposed helper functions.
Test Plan: python test/test_quantization.py TestFxModelReportClass
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79595
Approved by: https://github.com/andrewor14
Summary: The _detect_dynamic_vs_static function takes in a
prepared fx graph model that already has ModelReportObservers built into
it and uses the collected information to determine whether the input and
output are stationary or non-stationary. Based on this information, it
provides feedback on whether to make linear modules static or dynamic.
This PR will be followed up soon with another PR that will more
rigorously test the end-to-end performance of this system. That is
primarily how the function in this PR will be tested for functionality,
which is why this one only has 1 test.
Test Plan: python test/quantization/fx/test_model_report_fx.py TestModelReportDetectDynamicStatic
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79326
Approved by: https://github.com/HDCharles
Summary: The purpose of this is to add to the module report functionality
by creating an observer that will take a prepared fx module and suggest
whether static or dynamic quantization is more appropriate. The tests
for this have been written and included in the location indicated by the
Test Plan.
Test Plan: python test/quantization/fx/test_model_report_fx.py TestModelReportObserver
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79243
Approved by: https://github.com/jerryzh168, https://github.com/andrewor14
Summary: This code is meant to be a tool to help people get the most out
of their backend by suggesting per_channel quantization when it is
supported, which will help increase accuracy significantly. The code is
completed and ready to be reviewed.
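Why per-channel quantization can help accuracy can be seen with a small worked example in plain Python. This is an illustration under stated assumptions (symmetric signed 8-bit quantization, made-up weights, hypothetical helper names), not the detector code from this PR: when one output channel has much smaller magnitudes than another, a single shared scale wastes most of the range on the small channel.

```python
QMAX = 127  # symmetric signed 8-bit range is [-127, 127]


def scale_for(values):
    # Guard against an all-zero group so the division below stays defined.
    return (max(abs(v) for v in values) / QMAX) or 1.0


def mean_abs_error(rows, per_channel):
    """Mean absolute round-trip error after quantize -> dequantize."""
    shared = scale_for([v for row in rows for v in row])
    errors = []
    for row in rows:
        # Per-channel: one scale per row. Per-tensor: one scale for everything.
        scale = scale_for(row) if per_channel else shared
        for v in row:
            q = max(-QMAX, min(QMAX, round(v / scale)))
            errors.append(abs(v - q * scale))
    return sum(errors) / len(errors)


weights = [
    [0.01, -0.02, 0.015],  # small-magnitude output channel
    [5.0, -3.0, 4.0],      # large-magnitude output channel
]
per_tensor_err = mean_abs_error(weights, per_channel=False)
per_channel_err = mean_abs_error(weights, per_channel=True)
assert per_channel_err < per_tensor_err
print(f"per-tensor: {per_tensor_err:.5f}  per-channel: {per_channel_err:.5f}")
```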
Test Plan: python test/quantization/fx/test_model_report_fx.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79104
Approved by: https://github.com/HDCharles
Summary:
After https://github.com/pytorch/pytorch/pull/77608, `example_inputs` is a required input for `prepare_fx` and `prepare_qat_fx`.
This makes quantizing submodules harder, so we added this utility function to get a dictionary from fqn to submodule example_inputs.
Example Call:
```
example_inputs = (tensor0,)
get_fqn_to_example_inputs(m, example_inputs)
```
Example output:
```
{
"linear1": (tensor1,),
"linear2": (tensor2,),
"sub": (tensor3,),
"sub.linear1": (tensor4,),
...
}
```
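One way to picture how such a mapping can be collected is to record the positional arguments each submodule receives during a single forward pass. The sketch below uses a toy module class in plain Python; `ToyModule`, `Chain`, `named_modules`, and `toy_fqn_to_example_inputs` are all hypothetical stand-ins, and this is not a claim about how `get_fqn_to_example_inputs` is implemented for `nn.Module`.

```python
class ToyModule:
    """Hypothetical stand-in for nn.Module: named children plus a function."""

    def __init__(self, fn=None, **children):
        self.fn = fn
        self.children = children
        self.pre_hook = None  # called with the positional args before forward

    def __call__(self, *args):
        if self.pre_hook is not None:
            self.pre_hook(args)
        return self.forward(*args)

    def forward(self, *args):
        return self.fn(*args)


class Chain(ToyModule):
    """Runs its children one after another, like a sequential container."""

    def forward(self, x):
        for child in self.children.values():
            x = child(x)
        return x


def named_modules(module, prefix=""):
    # Yield (fully qualified name, submodule) pairs, depth first.
    for name, child in module.children.items():
        fqn = f"{prefix}.{name}" if prefix else name
        yield fqn, child
        yield from named_modules(child, fqn)


def toy_fqn_to_example_inputs(root, example_inputs):
    recorded = {}
    modules = list(named_modules(root))
    for fqn, mod in modules:
        # Keep only the first set of inputs seen for each submodule.
        mod.pre_hook = lambda args, fqn=fqn: recorded.setdefault(fqn, args)
    try:
        root(*example_inputs)
    finally:
        for _, mod in modules:
            mod.pre_hook = None  # always remove the hooks again
    return recorded


model = Chain(
    linear1=ToyModule(lambda x: x + 1),
    sub=Chain(linear1=ToyModule(lambda x: x * 2)),
)
mapping = toy_fqn_to_example_inputs(model, (1,))
print(mapping)
```

In this toy run, `linear1` sees the original input, while `sub` and `sub.linear1` both see the output of `linear1`.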
Test Plan:
python test/test_quantization.py TestUtils
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78286
Approved by: https://github.com/dzdang
Summary:
After https://github.com/pytorch/pytorch/pull/77608, `example_inputs` is a required input for `prepare_fx` and `prepare_qat_fx`.
This makes quantizing submodules harder, so we added this utility function to get a dictionary from fqn to submodule example_inputs.
Example Call:
```
example_inputs = (tensor0,)
get_fqn_to_example_inputs(m, example_inputs)
```
Example output:
```
{
"linear1": (tensor1,),
"linear2": (tensor2,),
"sub": (tensor3,),
"sub.linear1": (tensor4,),
...
}
```
Test Plan:
python test/test_quantization.py TestUtils
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78146
Approved by: https://github.com/vkuzo
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70246
Breaks up the large `TestQuantizeDBR` test case into
1. `TestQuantizeDBRIndividualOps` for testing functionality of ops
2. `TestQuantizeDBRMultipleOps` for testing non-fusion interactions between ops
3. `TestQuantizeDBR` for everything else
We may need to refactor this more in the future, but this should
unblock things for the near future.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
python test/test_quantization.py TestQuantizeDBRIndividualOps
python test/test_quantization.py TestQuantizeDBRMultipleOps
```
Reviewed By: jerryzh168
Differential Revision: D33255925
Pulled By: vkuzo
fbshipit-source-id: 82db1a644867e9303453cfedffed2d81d083c9cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69880
Making the test cases more standardized. In general, we would like to have
```
TestQuantizeEager,
TestQuantizeEagerOps,
TestQuantizeEagerModels,
```
but since we currently have separate ptq static, ptq dynamic and qat static APIs, we have only partially cleaned
up the test cases. We can merge all of them later, when we merge all the APIs.
Test Plan:
python test/test_quantization.py
Imported from OSS
Reviewed By: supriyar
Differential Revision: D33081418
fbshipit-source-id: fcb96559b76bbc51eb1b0625e0d4b193dbb37532
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68228
Forking this for now so that we can make changes as we need; the changes can be merged back to torch.fx
later.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32537713
fbshipit-source-id: 326598d13645fcc28ef2c66baaac6a077b80fd0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68602
This PR adds support for configuring weight/bias dtype in backend_config_dict
and refactors the current code that checks when to insert observers.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32537712
fbshipit-source-id: 28eb7c61a8dcad8c1f3f6622d490a34cff0c59e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68176
It should be noted that for the modules, reduce_range is set to
true by default, in a similar fashion to linear_dynamic.
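For context, `reduce_range` in PyTorch observers reduces the range of the quantized data type by 1 bit. A small sketch of what that does to the integer bounds follows; the helper name `quant_range` is hypothetical and the function only reproduces the arithmetic, not any PyTorch code path.

```python
def quant_range(signed: bool, reduce_range: bool, bits: int = 8):
    """Return (qmin, qmax) for an integer type, optionally giving up one bit."""
    if reduce_range:
        bits -= 1  # one bit less of usable range
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


# Unsigned 8-bit: full range vs. reduced range
assert quant_range(signed=False, reduce_range=False) == (0, 255)
assert quant_range(signed=False, reduce_range=True) == (0, 127)
# Signed 8-bit: full range vs. reduced range
assert quant_range(signed=True, reduce_range=False) == (-128, 127)
assert quant_range(signed=True, reduce_range=True) == (-64, 63)
```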
Test Plan:
python test/test_quantization.py TestDynamicQuantizedModule
python test/test_quantization.py TestDynamicQuantizedConv
python test/test_quantization.py TestQuantizedConv
Imported from OSS
Reviewed By: kimishpatel
Differential Revision: D32374003
fbshipit-source-id: 011562bd0f4d817387d53bb113df2600aa60a7a3