1879 Commits

Author SHA1 Message Date
84d86e3767 [numeric_debugger] guard the input generate_numeric_debug_handle as GraphModule type (#140742)
Summary: Support the ExportedProgram type in generate_numeric_debug_handle to better meet the requirements.

Test Plan: ci

Differential Revision: D65920529

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140742
Approved by: https://github.com/tarun292, https://github.com/jerryzh168
2024-11-20 03:40:04 +00:00
efe8482c0d Add prepare_obs_or_fq_callback to quantizer (#140863)
Test Plan: CI.

Differential Revision: D65982003

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140863
Approved by: https://github.com/jerryzh168
2024-11-19 01:13:38 +00:00
62eea62493 [Quant][Onednn] add linear_dynamic_fp16 ops (#140376)
**About this PR**
This PR adds the following ops for `linear_dynamic_fp16` in the onednn namespace. These ops are intended for PT2E quantization eager mode.
- `onednn::linear_prepack_fp16`: packs an fp32 weight into an fp16 MkldnnCPU tensor.
- `onednn::linear_dynamic_fp16`: takes an fp32 CPU tensor and an fp16 MkldnnCPU tensor and computes linear in fp32.
- `onednn::linear_relu_dynamic_fp16`: similar to the former, but applies ReLU to the output.

**Test plan**
`python test/test_quantization.py -k test_linear_dynamic_fp16_onednn`

**Implementation**
These ops call the oneDNN library under the hood. It's worth noting that oneDNN does not support f32 * f16 -> f32 computation, so we have to convert the fp16 weight to fp32 before computation. The weight is still in plain format after packing.
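
A rough usage sketch (hedged: the exact op signatures below are assumptions inferred from the description above, not copied from the op schemas):

```python
import torch

# Assumed signatures, for illustration only: pack the fp32 weight once,
# then run dynamic fp16 linear on fp32 activations.
weight = torch.randn(128, 64)   # fp32 weight
bias = torch.randn(128)         # fp32 bias
x = torch.randn(8, 64)          # fp32 activation

packed_w = torch.ops.onednn.linear_prepack_fp16(weight)                # fp16 MkldnnCPU tensor (assumed)
y = torch.ops.onednn.linear_dynamic_fp16(x, packed_w, bias)            # computed in fp32 (args assumed)
y_relu = torch.ops.onednn.linear_relu_dynamic_fp16(x, packed_w, bias)  # fused ReLU variant (args assumed)
```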

**Correctness and performance**
Correctness is guaranteed by UT.
Performance of the new ops may be better than the FBGEMM implementation when the weight shape is small, but worse when it is large, because the weight dtype conversion and the computation are not fused.
For example, I ran benchmarks on an Intel(R) Xeon(R) Platinum 8490H machine with different core counts and shapes. When using 1 core per instance, the new implementation is generally faster for weight shapes < 1024 * 1024; when using more cores, the threshold increases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140376
Approved by: https://github.com/jerryzh168, https://github.com/jgong5
2024-11-14 05:19:18 +00:00
cb71bcc542 Replace clone.detach with detach.clone (#140264)
Fixes #64532

As stated in the issue, replace `clone.detach` with `detach.clone`.
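
For context, both orderings yield an equivalent detached copy, but detaching first avoids cloning a tensor that is still attached to the autograd graph; a minimal illustration:

```python
import torch

x = torch.randn(3, requires_grad=True)

# Old pattern: clone (still tracked by autograd) and then detach the copy.
y_old = x.clone().detach()

# Preferred pattern: detach first, then clone the already-detached view.
y_new = x.detach().clone()

assert torch.equal(y_old, y_new)
assert not y_old.requires_grad and not y_new.requires_grad
```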

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140264
Approved by: https://github.com/soulitzer
2024-11-13 07:01:02 +00:00
59ec011855 [numerical debugger] bumped up the starting handler id (#139666)
Differential Revision: D65445250

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139666
Approved by: https://github.com/tarun292, https://github.com/dulinriley
2024-11-07 01:00:43 +00:00
938803df94 Add bfloat16 support for per tensor/channel cpu/cuda fake quantize ops (#139306)
Summary: Fixes https://fb.workplace.com/groups/2240361332735959/permalink/8190736677698365
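
Roughly, this enables calls like the following on bf16 inputs (a small sketch using the standard fake-quantize APIs; the bfloat16 input is the newly supported part):

```python
import torch

x = torch.randn(4, 8, dtype=torch.bfloat16)

# Per-tensor fake quantization on a bfloat16 input.
y = torch.fake_quantize_per_tensor_affine(x, 0.1, 0, -128, 127)

# Per-channel variant along dim 0.
scales = torch.full((4,), 0.1)
zero_points = torch.zeros(4, dtype=torch.int32)
y_pc = torch.fake_quantize_per_channel_affine(x, scales, zero_points, 0, -128, 127)
```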

Test Plan:
buck2 test 'fbcode//mode/dev' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_forward_per_channel_cachemask_cpu (caffe2.test.quantization.core.test_workflow_ops.TestFakeQuantizeOps)'

buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_forward_per_tensor_cachemask_cpu (caffe2.test.quantization.core.test_workflow_ops.TestFakeQuantizeOps)'

buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_forward_per_channel_cachemask_cuda (caffe2.test.quantization.core.test_workflow_ops.TestFakeQuantizeOps)'

buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_forward_per_channel_cachemask_cpu (caffe2.test.quantization.core.test_workflow_ops.TestFakeQuantizeOps)'

Differential Revision: D65221710

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139306
Approved by: https://github.com/navsud
2024-10-31 20:41:15 +00:00
d7411c0cc1 [AOTI] add C shim for QConvPointWise (#138540)
This PR adds a C shim for `QConvPointWisePT2E` and `QConvPointWiseBinaryPT2E`, similar to https://github.com/pytorch/pytorch/pull/138439. Besides that, we aligned the implementation of `qconv_pointwise` with `qlinear_pointwise` in the following aspects:
1. The parameter orders of `qconv_pointwise` and `qlinear_pointwise` were quite different; we aligned the schema of `qconv_pointwise` to have a similar parameter order to `qlinear_pointwise` for consistency.
2. We always convert `x_scale` and `x_zero_point` to Tensors, just like in the lowering of `qlinear_pointwise`. This avoids the need to create two separate C APIs (one for `double x_scale` and `int64_t x_zero_point`, and another for `Tensor` versions); instead, we only need one API for `Tensor`-based `x_scale` and `x_zero_point`. If we later add dynamic quantization for qconv (which will use `Tensor` for `x_scale` and `x_zero_point`), we can reuse the code from this PR and won't need to change the C shim layer API.
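
Point 2 just means the lowering wraps scalar qparams in tensors before hitting the shim, conceptually something like the following (a simplified sketch, not the actual Inductor code):

```python
import torch

# Scalar qparams coming from the quantization recipe.
x_scale, x_zero_point = 0.02, 128

# Wrap them as 0-dim tensors so a single Tensor-based C-shim API covers both
# the static case (constants) and a future dynamic case (runtime tensors).
x_scale_t = torch.tensor(x_scale, dtype=torch.float64)
x_zp_t = torch.tensor(x_zero_point, dtype=torch.int64)
```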

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138540
Approved by: https://github.com/jgong5, https://github.com/desertfire
ghstack dependencies: #138691, #138806
2024-10-31 02:03:01 +00:00
edcab61f93 Skip test for PT2E quantized ops in fbcode (#138792)
Skip these tests as they are failing in fbcode.
Submitting this PR per request from @jerryzh168.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138792
Approved by: https://github.com/jerryzh168
2024-10-30 02:37:38 +00:00
69d401d010 Update test_quantize_pt2e.py with HPU support (#137863)
**MOTIVATION**

We recently integrated support for Intel Gaudi devices (identified as 'hpu') into the common_device_type framework via the pull request at https://github.com/pytorch/pytorch/pull/126970. This integration allows tests to be automatically instantiated for Gaudi devices upon loading the relevant library. Building on this development, the current pull request extends the utility of these hooks by adapting selected CUDA tests to operate on Gaudi devices. Additionally, we have confirmed that these modifications do not interfere with the existing tests on CUDA devices.

**CHANGES**
- Add support for HPU devices within test_move_exported_model_bn using the TEST_HPU flag.
- Use instantiate_device_type_tests with targeted attributes to generate device-specific test instances (see the sketch below).
- Apply the skipIfHPU decorator to bypass tests that are not yet compatible with HPU devices.
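
A minimal sketch of the instantiation pattern (simplified; the real test file passes additional attributes):

```python
import torch
from torch.testing._internal.common_device_type import instantiate_device_type_tests
from torch.testing._internal.common_utils import TestCase, run_tests


class TestMoveExportedModelBN(TestCase):
    def test_move_exported_model_bn(self, device):
        # `device` is "cpu", "cuda", "hpu", etc. depending on the instantiated variant.
        x = torch.randn(2, 3, device=device)
        self.assertEqual(x.device.type, device.split(":")[0])


# Generates per-device test classes (e.g. ...CPU / ...CUDA / ...HPU)
# for whichever device libraries are available at import time.
instantiate_device_type_tests(TestMoveExportedModelBN, globals())

if __name__ == "__main__":
    run_tests()
```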
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137863
Approved by: https://github.com/jerryzh168
2024-10-29 13:01:03 +00:00
5d074746e9 [BE]: Add better optional typing (#138426)
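The title points at the usual implicit-Optional annotation cleanup; a generic illustration of the pattern (not the actual diff):

```python
from typing import Optional

# Implicit Optional (discouraged): the annotation says int but the default is None.
def resize(size: int = None):  # type: ignore[assignment]
    ...

# Explicit Optional: the annotation matches the default.
def resize_fixed(size: Optional[int] = None) -> None:
    ...
```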

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138426
Approved by: https://github.com/XuehaiPan, https://github.com/malfet
2024-10-27 14:19:00 +00:00
043864afdf enable test_x86inductor_quantizer.py UTs on Windows. (#138937)
These UTs failed months ago, but as the main branch moved forward, some PRs have fixed them. Let's turn them back on.

Local test passed:
<img width="863" alt="image" src="https://github.com/user-attachments/assets/a2ec160c-cdf1-404d-bc24-2f60faa8d791">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138937
Approved by: https://github.com/jansel
2024-10-26 12:48:51 +00:00
6d8c9be54b [reland] Add int1 to int7 dtypes (#137928)
Summary:
Similar to https://github.com/pytorch/pytorch/pull/117208, we want to add int1 to int7 dtypes for weight quantization in edge use cases.
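
A quick sanity check of what the new dtypes expose, assuming they surface as `torch.int1` ... `torch.int7` attributes (mirroring the earlier uint1-uint7 work); kernel support is intentionally out of scope here:

```python
import torch

# Hedged check: the PR adds sub-byte signed dtypes as carriers for edge weight
# quantization; this only verifies the attributes are registered.
for bits in range(1, 8):
    dtype = getattr(torch, f"int{bits}", None)
    print(f"int{bits}:", dtype)
```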

Test Plan:
python test/test_quantization.py -k test_uint4_int4_dtype


Differential Revision: [D64344944](https://our.internmc.facebook.com/intern/diff/D64344944)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137928
Approved by: https://github.com/malfet
2024-10-18 02:02:08 +00:00
ad134fe038 Skip doc test internally (#137813)
Summary:
there are some path issues when we run the doc tests internally

https://www.internalfb.com/intern/test/281475143872621

Test Plan: sandcastle

Reviewed By: drisspg, msaroufim

Differential Revision: D64255824

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137813
Approved by: https://github.com/HDCharles
2024-10-14 21:29:15 +00:00
c0a930b104 Change to export_for_training in quantize_pt2e tests (#137233)
Summary:
as title

also change it in `prepare_pt2e()` docstring

Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:quantization_pt2e_qat

buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization
```

Differential Revision: D63345059

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137233
Approved by: https://github.com/tugsbayasgalan
2024-10-04 18:33:02 +00:00
4096ed7dc2 Migrate to training ir in quantization_pt2e_qat unittests (#137232)
Summary: Change capture_pre_autograd_graph to export_for_training in unit tests.
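
The migration in these tests boils down to the following swap (a sketch of the PT2E QAT flow; XNNPACKQuantizer is just one of the quantizers the tests use):

```python
import torch
from torch.export import export_for_training
from torch.ao.quantization.quantize_pt2e import prepare_qat_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU())
example_inputs = (torch.randn(1, 16),)

# Before: m = capture_pre_autograd_graph(model, example_inputs)
# After: export with the training IR and grab the GraphModule.
m = export_for_training(model, example_inputs).module()

quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config(is_qat=True))
m = prepare_qat_pt2e(m, quantizer)
m(*example_inputs)   # calibration / fake-quant forward
m = convert_pt2e(m)
```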

Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:quantization_pt2e_qat
```

Reviewed By: tugsbayasgalan

Differential Revision: D63336660

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137232
Approved by: https://github.com/angelayi
2024-10-03 22:57:04 +00:00
c83178d894 Change to export_for_training in XNNPACK tests (#137238)
Summary: as title

Test Plan: CI

Differential Revision: D63344674

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137238
Approved by: https://github.com/tugsbayasgalan
2024-10-03 21:28:05 +00:00
a3f3773477 Make PT2E work with both IR simultaneously (#135769)
Summary: as title

Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:quantization_pt2e_qat
```

Differential Revision: D62449830

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135769
Approved by: https://github.com/angelayi
2024-10-02 21:05:22 +00:00
2ef1454189 Revert "Add int1 to int7 dtypes (#136301)"
This reverts commit bfa16a161d5089a9ba008f5e665f29b58dc16526.

Reverted https://github.com/pytorch/pytorch/pull/136301 on behalf of https://github.com/PaliC due to causing internal failures ([comment](https://github.com/pytorch/pytorch/pull/136301#issuecomment-2384119600))
2024-09-30 20:50:49 +00:00
8df97d78c2 [QAT] Make Fused modules torchscriptable (#136285)
Summary:
Same as title.

Inspired by: https://pytorch.org/tutorials/recipes/script_optimized.html#fix-common-errors-when-using-the-script-method

Test Plan: CI

Differential Revision: D62980019

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136285
Approved by: https://github.com/jerryzh168
2024-09-28 03:46:19 +00:00
bfa16a161d Add int1 to int7 dtypes (#136301)
Summary:
Similar to https://github.com/pytorch/pytorch/pull/117208, we want to add int1 to int7 for edge use cases
for weight quantization (https://www.internalfb.com/diff/D62464487)

Test Plan:
python test/test_quantization.py -k test_uint4_int4_dtype

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136301
Approved by: https://github.com/ezyang
2024-09-28 02:08:33 +00:00
a28b40fa74 Improve is_fbcode functionality (#136871)
Summary: Previously, is_fbcode just checked whether the checkout was a git checkout or not. This is extremely error-prone. Let's make it fool-proof.

Test Plan: unit tests

Reviewed By: masnesral

Differential Revision: D63545169

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136871
Approved by: https://github.com/masnesral
2024-09-27 21:19:01 +00:00
797c7e2802 [Quant][PT2E]change flatten recipe for X86InductorQuantizer (#136298)
This PR modifies the flatten recipe: if none of the users of the flatten node are quantizable ops, int8 flatten will be disabled to avoid unnecessary dtype conversions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136298
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5
2024-09-24 04:30:12 +00:00
3be150653c [torch][ao] Add customizable loss function to NodeAccuracySummary (#136282)
Summary:
Add a customizable loss function callback to NodeAccuracySummary to
allow users to pass in their own loss function.

Also, fix some type errors and propagate better exception messages when
unexpected tensor comparisons occur. Finally, enhance the robustness of
`generate_numeric_debug_handle` in the case where it is called multiple
times on the same model, by avoiding reuse of the same IDs.
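
A rough idea of the callback shape (the parameter name `loss_fn` and the signature are assumptions for illustration; NodeAccuracySummary is an internal helper):

```python
import torch

# Hypothetical custom loss: mean squared error between the reference and actual
# per-node outputs, instead of the default metric.
def mse_loss(ref: torch.Tensor, actual: torch.Tensor) -> float:
    return torch.mean((ref.float() - actual.float()) ** 2).item()

# summary = NodeAccuracySummary(..., loss_fn=mse_loss)  # parameter name assumed
```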

Test Plan: Added a test for this case in `test_numeric_debugger`.

Reviewed By: jerryzh168

Differential Revision: D62898297

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136282
Approved by: https://github.com/jerryzh168
2024-09-24 03:28:12 +00:00
df1eef9779 Revert "[torch][ao] Add customizable loss function to NodeAccuracySummary (#136282)"
This reverts commit f3c54ccf8f6139807f4623037c0174964a286652.

Reverted https://github.com/pytorch/pytorch/pull/136282 on behalf of https://github.com/huydhn due to This breaks OSS, let revert it and land the revert internally then ([comment](https://github.com/pytorch/pytorch/pull/136282#issuecomment-2364219252))
2024-09-20 17:49:06 +00:00
f3c54ccf8f [torch][ao] Add customizable loss function to NodeAccuracySummary (#136282)
Summary:
Add a customizable loss function callback to NodeAccuracySummary to
allow users to pass in their own loss function.

Also, fix some type errors and propagate better exception messages when
unexpected tensor comparisons occur. Finally, enhance the robustness of
`generate_numeric_debug_handle` in the case where it is called multiple
times on the same model, by avoiding reuse of the same IDs.

Test Plan: Added a test for this case in `test_numeric_debugger`.

Reviewed By: jerryzh168

Differential Revision: D62898297

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136282
Approved by: https://github.com/jerryzh168
2024-09-20 07:34:52 +00:00
f2b0fc89f2 Add uint16 support for observer (#136238)
Summary:
att

Test Plan:
python test/test_quantization.py -k TestObserver


Differential Revision: [D62909821](https://our.internmc.facebook.com/intern/diff/D62909821)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136238
Approved by: https://github.com/tarun292
2024-09-18 23:52:18 +00:00
5c5c33ac32 [Dynamo] Trace torch function modes entered outside of torch.compile (#133137)
This PR adds initial tracing for torch function modes.

Details:
In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call.
This does not yet support tracing enter/exit of a torch function mode or tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks as we do for other torch.* context managers.
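
Concretely, the newly supported shape is a mode that is already active when the compiled function is called (a minimal sketch using the public TorchFunctionMode API):

```python
import torch
from torch.overrides import TorchFunctionMode


class LoggingMode(TorchFunctionMode):
    def __torch_function__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        print("intercepted:", func.__name__)
        return func(*args, **kwargs)


@torch.compile
def f(x):
    return torch.sin(x) + 1


# The mode is entered *outside* torch.compile; its __torch_function__ can now
# be traced instead of forcing a fallback.
with LoggingMode():
    f(torch.randn(4))
```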

Previously landed:
https://github.com/pytorch/pytorch/pull/133135
https://github.com/pytorch/pytorch/pull/133136
https://github.com/pytorch/pytorch/pull/133134
https://github.com/pytorch/pytorch/pull/133133
https://github.com/pytorch/pytorch/pull/133132
https://github.com/pytorch/pytorch/pull/133131
https://github.com/pytorch/pytorch/pull/133729
https://github.com/pytorch/pytorch/pull/133130

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137
Approved by: https://github.com/jansel, https://github.com/zou3519
ghstack dependencies: #134732
2024-09-14 18:52:22 +00:00
8c8a3086a7 Revert "[Dynamo] Trace torch function modes entered outside of torch.compile (#133137)"
This reverts commit 4528777e034b157a8329d1879daf52290eea199a.

Reverted https://github.com/pytorch/pytorch/pull/133137 on behalf of https://github.com/mlazos due to broke python test/quantization/pt2e/test_numeric_debugger.py TestNumericDebugger.test_re_export_preserve_handle modified yesterday ([comment](https://github.com/pytorch/pytorch/pull/134732#issuecomment-2350937008))
2024-09-14 10:02:55 +00:00
4528777e03 [Dynamo] Trace torch function modes entered outside of torch.compile (#133137)
This PR adds initial tracing for torch function modes.

Details:
In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call.
This does not yet support tracing enter/exit of a torch function mode or tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks as we do for other torch.* context managers.

Previously landed:
https://github.com/pytorch/pytorch/pull/133135
https://github.com/pytorch/pytorch/pull/133136
https://github.com/pytorch/pytorch/pull/133134
https://github.com/pytorch/pytorch/pull/133133
https://github.com/pytorch/pytorch/pull/133132
https://github.com/pytorch/pytorch/pull/133131
https://github.com/pytorch/pytorch/pull/133729
https://github.com/pytorch/pytorch/pull/133130

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137
Approved by: https://github.com/jansel, https://github.com/zou3519
ghstack dependencies: #134732
2024-09-14 02:40:43 +00:00
b8eef500a6 Fix attr check for quantization spec (#135736)
Summary:
Previously we only checked dtype and is_dynamic to decide whether two quantization specs are equivalent; this may not work in some cases, e.g. when people use a different qscheme or quant_min/quant_max.

This PR adds checks for the other fields as well.
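
For illustration, two specs that the old check would have treated as equivalent even though only the qscheme differs (field names follow the public QuantizationSpec dataclass):

```python
import torch
from torch.ao.quantization.quantizer import QuantizationSpec
from torch.ao.quantization.observer import MinMaxObserver

spec_a = QuantizationSpec(
    dtype=torch.int8,
    quant_min=-128,
    quant_max=127,
    qscheme=torch.per_tensor_affine,
    is_dynamic=False,
    observer_or_fake_quant_ctr=MinMaxObserver,
)

spec_b = QuantizationSpec(
    dtype=torch.int8,
    quant_min=-128,
    quant_max=127,
    qscheme=torch.per_tensor_symmetric,  # only the qscheme differs
    is_dynamic=False,
    observer_or_fake_quant_ctr=MinMaxObserver,
)

# With the old check (dtype + is_dynamic only) these looked identical;
# the PR makes the comparison also consider qscheme, quant_min/quant_max, etc.
```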

Test Plan:
regression tests


Differential Revision: [D62530974](https://our.internmc.facebook.com/intern/diff/D62530974)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135736
Approved by: https://github.com/sxu
2024-09-13 23:01:22 +00:00
4312794b92 [reland][export] fix re-export custom metadata (#135720)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/134778

The previous D62304294 broke some executorch tests. It has already been reverted.

In this diff, `_collect_param_buffer_metadata()` is modified so that when a `call_function` node is encountered and its input nodes include `get_attr`, we skip the fields that have been collected previously and only collect the rest of the fields. This prevents overwriting.

Test Plan:
```
buck2 test 'fbcode//mode/dev-nosan' fbcode//executorch/backends/xnnpack/test:test_xnnpack_ops

buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_re_export_preserve_handle

buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_run_decompositions_preserve_handle
```

Differential Revision: D62514208

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135720
Approved by: https://github.com/zhxchen17, https://github.com/jerryzh168
2024-09-13 20:15:15 +00:00
eb7dd91dd1 Revert "[Dynamo] Trace torch function modes entered outside of torch.compile (#133137)"
This reverts commit fafdd588f27e1d56090c6d260d0382c255eaf9eb.

Reverted https://github.com/pytorch/pytorch/pull/133137 on behalf of https://github.com/albanD due to Broke tests on main ([comment](https://github.com/pytorch/pytorch/pull/134732#issuecomment-2348886378))
2024-09-13 12:52:58 +00:00
fafdd588f2 [Dynamo] Trace torch function modes entered outside of torch.compile (#133137)
This PR adds initial tracing for torch function modes.

Details:
In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call.
This does not yet support tracing enter/exit of a torch function mode or tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks as we do for other torch.* context managers.

Previously landed:
https://github.com/pytorch/pytorch/pull/133135
https://github.com/pytorch/pytorch/pull/133136
https://github.com/pytorch/pytorch/pull/133134
https://github.com/pytorch/pytorch/pull/133133
https://github.com/pytorch/pytorch/pull/133132
https://github.com/pytorch/pytorch/pull/133131
https://github.com/pytorch/pytorch/pull/133729
https://github.com/pytorch/pytorch/pull/133130

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137
Approved by: https://github.com/jansel, https://github.com/zou3519
ghstack dependencies: #134732
2024-09-13 08:41:00 +00:00
1a74952925 "Remove BLOCK_LIST" (#135729)
Summary:
Skip test_prepare_qat_conv_bn_fusion_getitem_placeholder when we use the training IR, since it only covers the bn-getitem pattern, which doesn't exist in the training IR.

Remove BLOCK_LIST since it's empty.
Now all internal unittests will use the training IR.

Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan'  caffe2/test/quantization:test_quantization -- -r test_prepare_qat_conv_bn_fusion_getitem_placeholder
buck2 run 'fbcode//mode/dev-nosan'  caffe2/test:quantization_pt2e_qat -- -r test_prepare_qat_conv_bn_fusion_getitem_placeholder
```

Differential Revision: D62387987

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135729
Approved by: https://github.com/tugsbayasgalan
2024-09-12 01:22:06 +00:00
ad75b09d89 Replace capture_pre_autograd_graph with export_for_training in torch tests (#135623)
Summary: as title

Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export -- -r test_conv_dynamic
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:fx -- -r matcher
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r x86
```

CI

Differential Revision: D62448302

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135623
Approved by: https://github.com/tugsbayasgalan
2024-09-11 19:23:08 +00:00
183c32fd3b Revert "[Dynamo] Trace torch function modes entered outside of torch.compile (#133137)"
This reverts commit 0d15122092c27fec1143b800bab7c996d126b547.

Reverted https://github.com/pytorch/pytorch/pull/133137 on behalf of https://github.com/clee2000 due to something in this stack broke functorch/test_control_flow.py::TestControlFlow::test_scan_simple_graph [GH job link](https://github.com/pytorch/pytorch/actions/runs/10804912306/job/29980571390) [HUD commit link](444b52ff40), newly added test yesterday ([comment](https://github.com/pytorch/pytorch/pull/133137#issuecomment-2344054339))
2024-09-11 15:57:00 +00:00
4ae6d7c18f Back out "[pytorch][PR] [export] fix re-export custom metadata" (#135634)
Summary: Broke some tests. Revert this diff

Test Plan: CI

Differential Revision: D62474337

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135634
Approved by: https://github.com/tugsbayasgalan
2024-09-11 06:16:26 +00:00
0d15122092 [Dynamo] Trace torch function modes entered outside of torch.compile (#133137)
This PR adds initial tracing for torch function modes.

Details:
In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call.
This does not yet support tracing enter/exit of a torch function mode or tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks as we do for other torch.* context managers.

Previously landed:
https://github.com/pytorch/pytorch/pull/133135
https://github.com/pytorch/pytorch/pull/133136
https://github.com/pytorch/pytorch/pull/133134
https://github.com/pytorch/pytorch/pull/133133
https://github.com/pytorch/pytorch/pull/133132
https://github.com/pytorch/pytorch/pull/133131
https://github.com/pytorch/pytorch/pull/133729
https://github.com/pytorch/pytorch/pull/133130

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137
Approved by: https://github.com/jansel, https://github.com/zou3519
ghstack dependencies: #134732
2024-09-11 04:18:22 +00:00
66c45f3ed9 [export] fix re-export custom metadata (#135282)
Fixes #134778

When a model is exported and debug handles are added to the "custom" field of non-placeholder and non-output nodes in the graph, re-exporting it will change the metadata of placeholder nodes (the "custom" field will be added or copied to these nodes, depending on whether `ExportedProgram` or `ExportedProgram.module()` is passed to `generate_numeric_debug_handle()`).

This occurs because when we re-export the model, `placeholder` nodes are unlifted to `get_attr` nodes. These nodes remain as `get_attr` after being exported to `gm_torch_level`. Their metadata is modified [here](https://github.com/pytorch/pytorch/blob/main/torch/export/_trace.py#L1347) based on `params_buffers_to_node_meta`, which is collected [here](https://github.com/pytorch/pytorch/blob/main/torch/export/_trace.py#L1312).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135282
Approved by: https://github.com/jerryzh168, https://github.com/zhxchen17, https://github.com/tugsbayasgalan
2024-09-10 20:15:02 +00:00
fd494dd426 Change wrapped_linear_prepack and wrapped_quantized_linear_prepacked to private by adding _ as prefix (#135401)
Summary: In https://github.com/pytorch/pytorch/pull/134232, we added two new ops, wrapped_linear_prepack and wrapped_quantized_linear_prepacked. Based on the review comments and offline discussion, we are changing them to private by adding `_` as a prefix.

Differential Revision: D62325142

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135401
Approved by: https://github.com/houseroad
2024-09-08 04:16:24 +00:00
c92227c41a [quant][pt2e] fix placeholder typo and related quantization tests (#135379)
A previous typo in "placeholder" and the related quantization tests have been fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135379
Approved by: https://github.com/jerryzh168
2024-09-07 02:31:43 +00:00
b1a934741e Change test_constant_prop_preserve_metadata (#135268)
Summary: In new export_for_training, "stack_trace" does not exist in node meta anymore.

Test Plan:
```
buck run fbcode//mode/dev-nosan fbcode//caffe2/test:quantization_pt2e -- -r test_constant_prop_preserve_metadata
```

Reviewed By: angelayi

Differential Revision: D62219974

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135268
Approved by: https://github.com/angelayi
2024-09-07 00:02:35 +00:00
590a3e9f8a [export][training ir migration] quantized_decomposed.quantize_per_tensor decomposition (#134525)
Summary:
In the graph of the TestXNNPACKQuantizer.test_dynamic_linear_with_conv test, some quantized_decomposed.quantize_per_tensor.default ops become quantized_decomposed.dequantize_per_tensor.tensor ops when using the new training IR.

This is because we lift params/buffers before calling make_fx. Previously, for the graph passed to make_fx, `graph.L__self___linear1.weight` was a tensor; now, in the training IR, `graph.L__self___linear1.weight` is a FakeTensor. This causes the node overload to be different.

Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_dynamic_linear_with_conv
```

Differential Revision: D61364547

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134525
Approved by: https://github.com/tugsbayasgalan, https://github.com/jerryzh168
2024-09-06 07:06:06 +00:00
bc5ecf83d7 [training ir migration] Fix quantization tests (#135184)
Summary:
Fixed some quantization tests for new training ir:

Fix the batch norm node pattern matcher: in the training IR, we have an `aten.batch_norm` node instead of `aten._native_batch_norm_legit` and `aten._native_batch_norm_legit_no_training`.
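
The gist of the matcher change, as a sketch (accepting the training-IR batch norm target alongside the old decomposed ones):

```python
import torch
from torch.fx import Node

# Targets that can represent batch norm depending on which IR produced the graph.
_BN_TARGETS = {
    torch.ops.aten.batch_norm.default,                              # training IR
    torch.ops.aten._native_batch_norm_legit.default,                # old IR (training)
    torch.ops.aten._native_batch_norm_legit_no_training.default,    # old IR (eval)
}


def is_batch_norm_node(node: Node) -> bool:
    return node.op == "call_function" and node.target in _BN_TARGETS
```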

Test Plan:
```
buck run fbcode//mode/dev-nosan fbcode//caffe2/test:quantization_pt2e
```

Reviewed By: tugsbayasgalan

Differential Revision: D62209819

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135184
Approved by: https://github.com/tugsbayasgalan
2024-09-05 21:19:28 +00:00
0043dcd79e Switch torch pt2e xnnpack tests to use export_for_training (#134788)
Migrate all the callsites inside the pt2e XNNPACK tests to use export_for_training.

Differential Revision: D61994553

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134788
Approved by: https://github.com/mergennachin
2024-09-05 16:11:18 +00:00
3ef4c27ab3 Update pt2e numeric debugger to use node.meta["custom"] field (#134040)
Summary:
With https://github.com/pytorch/pytorch/pull/131912 we now have a "custom" field in node.meta that is preserved through:

* copy/deepcopy
* run_decompositions()
* serialization
* re-exporting

So we refactored numeric debugger to use this.
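
Conceptually, the debug handle now lives under the preserved "custom" key (a simplified sketch; the exact key name used by the numeric debugger may differ):

```python
import torch
from torch.export import export


class M(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x + 1)


ep = export(M(), (torch.randn(4),))

# Stash per-node debug handles in node.meta["custom"]; this field survives
# copy/deepcopy, run_decompositions(), serialization, and re-export.
for handle, node in enumerate(ep.graph.nodes):
    if node.op == "call_function":
        node.meta.setdefault("custom", {})["numeric_debug_handle"] = handle

ep2 = ep.run_decompositions()
for node in ep2.graph.nodes:
    if node.op == "call_function":
        print(node.name, node.meta.get("custom", {}))
```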

Test Plan:
python test/test_quantization.py TestNumericDebugger

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134040
Approved by: https://github.com/tarun292
2024-08-27 19:51:03 +00:00
311af3b988 Add new ops wrapped_linear_prepack and wrapped_quantized_linear_prepacked (#134232)
Summary:
This diff adds two new operators, torch.ops._quantized.wrapped_linear_prepack and torch.ops._quantized.wrapped_quantized_linear_prepacked. They are a decomposition of the op torch.ops._quantized.wrapped_quantized_linear added in the previous diff.

We decomposed it this way because the packed weight can be computed early, so we don't need to compute it in every forward in AOTI.

Reviewed By: jerryzh168

Differential Revision: D61395887

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134232
Approved by: https://github.com/houseroad
2024-08-23 04:54:26 +00:00
b0cf287b46 [export][training ir migration] Fix getitem not exist (#134259)
Summary:
Make quantization tests compatible with the new training IR.

With the new batch norm node `torch.ops.aten.batch_norm.default`, we don't need an additional getitem node after the bn node, so the tests are fixed to no longer check for the getitem node.

We added a capture_pre_autograd_graph_using_training_ir() function, which returns True when we are using the training IR and False otherwise. This way, the code supports both the training IR and the old IR.

For now, we are just rolling out the training IR for fbcode internal tests.

Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_preserve_source_fn_stack
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_update_shared_qspec
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_conv2d
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_relu_fusion

buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_fusion
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_fusion_literal_args
```

Reviewed By: andrewor14, tugsbayasgalan

Differential Revision: D61292102

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134259
Approved by: https://github.com/tugsbayasgalan
2024-08-22 22:00:14 +00:00
3d8db41337 Add new op wrapped_quantized_linear (#134024)
Summary:
This diff adds a new operator wrapped_quantized_linear (torch.ops._quantized.wrapped_quantized_linear), which takes the following input arguments: input (in fp32), input_scale, input_zero_point, weight (in fp32), weight_scale, weight_zero_point, bias (in fp32), output_scale, output_zero_point, and out_channel. It does the following:

1. Use quantize_per_tensor(input, input_scale, input_zero_point) to quantize the input tensor to int8
2. Use quantized::linear_prepack(weight, weight_scale, weight_zero_point, bias) to pack the weight and bias
3. Use quantized::linear to perform int8 quantized linear
4. dequantize

This new op is essentially a wrapper around multiple ops. We do this because torch.export cannot handle models that use the old quantize APIs.
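
The four steps correspond roughly to this eager-mode sequence, which the wrapper bundles into a single op (a sketch using the long-standing quantized ops; exact dtypes and engine details are simplified):

```python
import torch

x = torch.randn(4, 8)      # fp32 input
w = torch.randn(16, 8)     # fp32 weight
b = torch.randn(16)        # fp32 bias
input_scale, input_zp = 0.05, 0
weight_scale, weight_zp = 0.02, 0
output_scale, output_zp = 0.1, 0

# 1. Quantize the input to int8.
qx = torch.quantize_per_tensor(x, input_scale, input_zp, torch.quint8)
# 2. Quantize and prepack the weight together with the bias.
qw = torch.quantize_per_tensor(w, weight_scale, weight_zp, torch.qint8)
packed = torch.ops.quantized.linear_prepack(qw, b)
# 3. Run int8 quantized linear.
qy = torch.ops.quantized.linear(qx, packed, output_scale, output_zp)
# 4. Dequantize the result back to fp32.
y = qy.dequantize()
```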

Reviewed By: jerryzh168

Differential Revision: D61377266

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134024
Approved by: https://github.com/houseroad
2024-08-21 09:26:58 +00:00
99e789b52b [Fix 1/n] GPU Test skips - fbcode/ caffe2/test/quantization (#133158)
Summary:
This diff aims to fix the GPU test skips in the quantization tests under the `caffe2/test/quantization` directory. The changes in the `TARGETS` files include adding the `should_use_remote_gpu` flag to enable remote GPU testing. This should help resolve the skipped tests and improve overall test coverage.

[This diff] Fixed skip count: 4
[Running total] Fixed skip count: 4

Note: Creating separate diffs for each test-group.

Test Plan:
**281475054644766**: buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_compare_per_channel_device_numerics (caffe2.test.quantization.core.test_quantized_tensor.TestQuantizedTensor)'
https://www.internalfb.com/intern/testinfra/testrun/5629499773981783

**281475054644780**: buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_compare_per_tensor_device_numerics (caffe2.test.quantization.core.test_quantized_tensor.TestQuantizedTensor)'
https://www.internalfb.com/intern/testinfra/testrun/11540474087422107

**281475054644853**: buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_quant_pin_memory (caffe2.test.quantization.core.test_quantized_tensor.TestQuantizedTensor)'
https://www.internalfb.com/intern/testinfra/testrun/11540474087422477

**844425008078016**: buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_cuda_quantization_does_not_pin_memory (caffe2.test.quantization.core.test_quantized_tensor.TestQuantizedTensor)'
https://www.internalfb.com/intern/testinfra/testrun/1407375259845199

Differential Revision: D60055277

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133158
Approved by: https://github.com/jovianjaison
2024-08-16 22:00:57 +00:00