[export][training ir migration] quantized_decomposed.quantize_per_tensor decomposition (#134525)

Summary:
In the graph produced by the TestXNNPACKQuantizer.test_dynamic_linear_with_conv test, some quantized_decomposed.quantize_per_tensor.default ops become quantized_decomposed.quantize_per_tensor.tensor ops when the new training IR is used.

This is because, in the training IR, we lift params/buffers before calling make_fx. Previously, `graph.L__self___linear1.weight` in the graph passed to make_fx was a concrete tensor; now it is a FakeTensor. This causes a different node overload to be chosen.
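For context, a minimal sketch (not from this PR) of the two overloads involved. The input and quant bounds are illustrative; the point is that keeping scale/zero_point as Tensors (as happens when tracing over FakeTensors, whose values cannot be read out at trace time) records the .tensor overload, while extracting them as Python scalars records .default:

```python
import torch
import torch.ao.quantization.fx._decomposed  # noqa: F401  # registers quantized_decomposed ops

x = torch.randn(4, 4)

# choose_qparams returns Tensor scale/zero_point; passing them through as
# Tensors records the .tensor overload:
scale, zero_point = torch.ops.quantized_decomposed.choose_qparams.tensor(
    x, quant_min=-128, quant_max=127, eps=torch.finfo(torch.float32).eps, dtype=torch.int8
)
q_tensor = torch.ops.quantized_decomposed.quantize_per_tensor.tensor(
    x, scale, zero_point, -128, 127, torch.int8
)

# Reading scale/zero_point out as Python scalars records the .default
# overload (float scale, int zero_point) instead:
q_default = torch.ops.quantized_decomposed.quantize_per_tensor.default(
    x, scale.item(), int(zero_point.item()), -128, 127, torch.int8
)
```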

Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_dynamic_linear_with_conv
```

Differential Revision: D61364547

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134525
Approved by: https://github.com/tugsbayasgalan, https://github.com/jerryzh168
Author: Shangdi Yu
Date: 2024-09-06 07:06:06 +00:00
Committer: PyTorch MergeBot
Parent: 764ee6e3f9
Commit: 590a3e9f8a

2 changed files with 21 additions and 0 deletions

```diff
@@ -1247,6 +1247,7 @@ class PT2EQuantizationTestCase(QuantizationTestCase):
         export_with_dynamic_shape=False,
         is_qat=False,
         is_debug_mode=False,
+        capture_pre_autograd_graph_node_occurrence=None,
     ):
         # resetting dynamo cache
         torch._dynamo.reset()
@@ -1305,6 +1306,10 @@ class PT2EQuantizationTestCase(QuantizationTestCase):
             for k, v in PT2EQuantizationTestCase._MAP_TO_FX_TRACED_OPS.items():
                 if k in expected_node_occurrence:
                     node_occurrence[ns.call_function(v)] = expected_node_occurrence[k]
+            if capture_pre_autograd_graph_node_occurrence is not None:
+                node_occurrence = {
+                    ns.call_function(k): v for k, v in capture_pre_autograd_graph_node_occurrence.items()
+                }
         self.checkGraphModuleNodes(m_fx, expected_node_occurrence=node_occurrence)
         fx_quant_output = m_fx(*example_inputs)
         self.assertEqual(fx_quant_output, pt2_quant_output)
```