Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74276
Removes convert.py now that the traffic has been rerouted to _convert_do_not_use. The rename will be done in a follow-up PR.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D34914261
fbshipit-source-id: 09ad520d95fa91c525222a69474930efb3571088
(cherry picked from commit 8aeb33206f3572132356fe78395aa3ce6aff11cd)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65033
1. Move the file:
```
hg mv caffe2/torch/quantization/fx caffe2/torch/ao/quantization/fx
hg mv caffe2/torch/quantization/quantize_fx.py caffe2/torch/ao/quantization/quantize_fx.py
```
2. Create new files
```
touch caffe2/torch/quantization/quantize_fx.py
touch caffe2/torch/quantization/fx/__init__.py
```
3. Import things in the new files
4. Add tests to test/quantization/ao_migration/test_quantization_fx.py
This is needed because quantize_fx and fx/*.py contain some fx imports.
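The re-export idea behind steps 2 to 4 can be sketched without PyTorch. This is only an illustration: `new_mod`, `old_mod`, and the placeholder `prepare_fx` are hypothetical stand-ins, not the real modules or the contents of the actual migration test.

```python
import types

# Hypothetical stand-in for the module at the new location.
new_mod = types.ModuleType("new_location.quantize_fx")


def prepare_fx(model):
    """Placeholder for the implementation that now lives at the new path."""
    return ("prepared", model)


new_mod.prepare_fx = prepare_fx

# Hypothetical stand-in for the file left behind at the old path:
# it holds no logic of its own and only re-exports the new symbol.
old_mod = types.ModuleType("old_location.quantize_fx")
old_mod.prepare_fx = new_mod.prepare_fx

# A migration test can then check that both paths give the same object.
same_object = old_mod.prepare_fx is new_mod.prepare_fx
```

Checking identity rather than behavior keeps such a test cheap: it only has to confirm that the old path forwards to the new one.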
Test Plan: buck test mode/dev //caffe2/test:quantization
Reviewed By: vkuzo, z-a-f
Differential Revision: D30949749
fbshipit-source-id: 9e5d4d039c8a0a0820bc9040e224f0d2c26886d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64445
AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates quantize.py from torch.quantization to torch.ao.quantization.
At this point both locations are supported. Eventually torch.quantization will be deprecated.
Test Plan: `buck test mode/dev //caffe2/test:quantization`
Reviewed By: HDCharles
Differential Revision: D30734870
fbshipit-source-id: dc204f3cc46bff2cc81c95159eab9d333b43bb4b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64086
AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates `quantize.py` from `torch.quantization` to `torch.ao.quantization`.
At this point both locations are supported. Eventually `torch.quantization` will be deprecated.
Test Plan: `buck test mode/opt //caffe2/test:quantization`
Reviewed By: jerryzh168, raghuramank100
Differential Revision: D30055886
fbshipit-source-id: 8ef7470f9fa640c0042bef5bb843e7a05ecd0b9f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62861
This PR adds a lower_to_native_backend function to lower a quantized reference model
to a model that uses fbgemm/qnnpack ops. We'll gradually add support and remove
the fbgemm/qnnpack specific handling in quantization_patterns.py
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D30165828
fbshipit-source-id: de1149cd7e7c1840c17c251cd4d35004afd015b7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62698
We also removed the special handling in match_utils for binary ops
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D30093781
fbshipit-source-id: 58cc972de8211a80dd4d111e25dc4ad36057933f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61942
This PR changes is_reference=True for conv to produce a pattern consisting of dequant - float conv - quant instead of a reference conv module. This is useful for future transformations to custom backends, and it also helps simplify the implementation of
convert in the future.
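The dequant - float conv - quant pattern can be illustrated with scalar affine quantization. This is a minimal sketch under stated assumptions: the helper names, the quantization parameters, and the scalar "1x1 conv" are all made up for illustration and are not the code this PR generates.

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Affine quantization of a single float to an integer in [qmin, qmax]."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Inverse of quantize, up to rounding and clamping error."""
    return (q - zero_point) * scale


def float_conv_1x1(x, weight, bias):
    """A 1x1 convolution on a single value reduces to a multiply-add."""
    return weight * x + bias


def reference_conv(q_in, in_qparams, out_qparams, weight, bias):
    # The pattern: dequant -> float op -> quant.
    x = dequantize(q_in, *in_qparams)
    y = float_conv_1x1(x, weight, bias)
    return quantize(y, *out_qparams)


in_qparams = (0.5, 0)     # (scale, zero_point), illustrative values
out_qparams = (0.25, 10)  # illustrative values
q_in = quantize(3.0, *in_qparams)
q_out = reference_conv(q_in, in_qparams, out_qparams, weight=2.0, bias=1.0)
```

Because the float op sits between explicit dequant and quant steps, a backend-specific pass can later match the three-step pattern and replace it with its own fused kernel.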
Test Plan:
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D29810656
fbshipit-source-id: 549237a62bfda4341a2a7474c124f5e33350e267
Summary:
This PR enables GPU-only quantization, best used with is_reference since
there are not many GPU kernels for ops as of now.
This PR mainly changes how qconfigs and their observer constructors operate once they
are on a module's qconfig. The function add_module_to_qconfig_obs_ctr takes the observer constructors on the original
qconfig and configures them so that, when invoked, the created observers will
be on whatever device the module occupies. (Once observers are created,
module.to(device) is already set up so that it moves any observers.) To do this,
a new method and a few small changes were added to the _PartialWrapper class that
our observers already use to create constructors (without changing the
existing functionality). These changes work in
concert with changes to the prepare flow such that when the qconfigs are
propagated to the modules (in quantize.py and qconfig_utils.py) they are configured using add_module_to_qconfig_obs_ctr.
Ideally this would work on other models, but the is_reference support for
a lot of modules isn't there yet; those tests should be added in a
future PR.
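The idea of a constructor that looks up the module's device only when the observer is created can be sketched as follows. Everything here is a simplified, hypothetical stand-in: `PartialWrapper`, `with_callable_args`, `bind_device_to_ctr`, `FakeObserver`, and `FakeModule` are illustrative names, not the actual _PartialWrapper or add_module_to_qconfig_obs_ctr code.

```python
from functools import partial


class PartialWrapper:
    """Hypothetical, simplified wrapper around an observer constructor."""

    def __init__(self, p):
        self.p = p                # functools.partial with the fixed kwargs
        self.callable_args = {}   # kwargs computed when the observer is built

    def __call__(self, *args, **kwargs):
        # Evaluate deferred kwargs now, unless the caller overrides them.
        for name, fn in self.callable_args.items():
            if name not in kwargs:
                kwargs[name] = fn()
        return self.p(*args, **kwargs)

    def with_callable_args(self, **fns):
        wrapper = PartialWrapper(self.p)
        wrapper.callable_args = {**self.callable_args, **fns}
        return wrapper


class FakeObserver:
    """Stand-in observer that only records its dtype and device."""

    def __init__(self, dtype="quint8", device="cpu"):
        self.dtype = dtype
        self.device = device


class FakeModule:
    """Stand-in module that only carries a device attribute."""

    def __init__(self, device):
        self.device = device


def bind_device_to_ctr(ctr, module):
    # The device is read when the constructor is invoked, not when it is bound.
    return ctr.with_callable_args(device=lambda: module.device)


base_ctr = PartialWrapper(partial(FakeObserver, dtype="qint8"))
module = FakeModule("cuda:0")
ctr = bind_device_to_ctr(base_ctr, module)
obs = ctr()
```

Deferring the lookup means the same configured constructor still does the right thing if the module is moved to another device between configuration and observer creation, while `base_ctr` keeps its original behavior.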
Test Plan:
python test/test_quantization.py TestQuantizeFxModels.test_static_gpu_convert_basic
python test/test_quantization.py TestQuantizeFxModels.test_switch_device_prepare_convert
python test/test_quantization.py TestQuantizeFxModels.test_prepare_serialize_switch_device_convert
python test/test_quantization.py TestQuantizeFx.test_qconfig_precedence
Reviewed By: vkuzo
Differential Revision: D29684114
fbshipit-source-id: 19fefb8e1998eaf212723e836276ccf39467f2e7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59963
When converting, before quantizing the nodes, we call
`update_obs_for_equalization()` and `convert_eq_obs()`.
`update_obs_for_equalization`:
1. For each InputEqualizationObserver, we find the corresponding
WeightEqualizationObserver.
2. For nn.Linear layers, we will create an instance of the
WeightEqualizationObserver, run forward on the observer with the given
weights.
3. Calculate the equalization scale between the
InputEqualizationObserver and WeightEqualizationObserver.
`convert_eq_obs`:
For every InputEqualizationObserver, we will do the following:
1. Create a node (ex. `x0_activation_post_process_scale`) containing the
equalization scale constant.
2. Create another node containing a `mul` operator multiplying the
equalization scale and the input.
3. Remove the current InputEqualizationObserver node, and replace it
with the `mul` node.
For every WeightEqualizationObserver, we will do the following:
1. Get the next equalization scale (we may need this for equalizing
connected linear layers).
2. Scale the weights by multiplying them by the reciprocal of the
current equalization scale and by the next equalization scale.
Currently, this supports models with `nn.Linear` layers, but does not
support connecting linear layers.
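The arithmetic behind the two functions can be checked with a small worked example: multiplying the input by the scale and the weights by its reciprocal leaves the linear output unchanged. This is a pure-Python sketch with hypothetical helper names and made-up numbers, not the FX graph transformation itself; the optional `next_scale` argument reflects one reading of step 2 above and defaults to no extra scaling.

```python
def equalize_linear(x, weight, scale, next_scale=None):
    """x: in_features values; weight: out_features rows of in_features values;
    scale: one equalization scale per input feature."""
    if next_scale is None:
        next_scale = [1.0] * len(weight)  # no following layer to equalize
    # Input side: the `mul` node multiplies the input by the scale.
    x_eq = [xj * sj for xj, sj in zip(x, scale)]
    # Weight side: multiply by the reciprocal of the current scale and by
    # the next scale (one per output feature).
    w_eq = [
        [wij / sj * next_scale[i] for wij, sj in zip(row, scale)]
        for i, row in enumerate(weight)
    ]
    return x_eq, w_eq


def linear(x, weight):
    """Bias-free linear layer: y_i = sum_j weight[i][j] * x[j]."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


x = [1.0, 8.0]
weight = [[4.0, 0.5], [2.0, 0.25]]
scale = [2.0, 0.25]

x_eq, w_eq = equalize_linear(x, weight, scale)
y_ref = linear(x, weight)   # output of the original layer
y_eq = linear(x_eq, w_eq)   # output after equalization
```

With these values the equalized input becomes `[2.0, 2.0]`, so both input channels share the same range, while `y_eq` equals `y_ref`.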
Test Plan:
`python test/test_quantization.py TestEqualizeFx.test_input_weight_equalization_convert`
Original Model:
```
.LinearModule(
(linear): Linear(in_features=2, out_features=2, bias=True)
)
```
Graph after `prepare_fx`:
```
graph():
%x : [#users=1] = placeholder[target=x]
%x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {})
%x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {})
%linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {})
%linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
return linear_activation_post_process_0
```
Graph after equalization functions:
```
graph():
%x : [#users=1] = placeholder[target=x]
%x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
%mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
%x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%mul,), kwargs = {})
%linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {})
%linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
return linear_activation_post_process_0
```
Graph after `convert_fx`:
```
graph():
%x : [#users=1] = placeholder[target=x]
%x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
%mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
%linear_input_scale_0 : [#users=1] = get_attr[target=linear_input_scale_0]
%linear_input_zero_point_0 : [#users=1] = get_attr[target=linear_input_zero_point_0]
%quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear_input_scale_0, %linear_input_zero_point_0, torch.quint8), kwargs = {})
%linear : [#users=1] = call_module[target=linear](args = (%quantize_per_tensor,), kwargs = {})
%dequantize : [#users=1] = call_method[target=dequantize](args = (%linear,), kwargs = {})
return dequantize
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D29135358
fbshipit-source-id: 2d00056729041318463de61841483490b6bfeee5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60054
Previously, env in convert was Dict[str, Tuple[Node, torch.dtype]]; that is, at any given time each node could have only one dtype.
This causes a problem for the following case:
```
import torch
from torch import nn

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, 1)

    def forward(self, x):
        x = self.conv(x)
        x1 = x.expand_as(x)
        x2 = torch.add(x, x1)
        return x2

# forward after prepare (observers inserted)
def forward(self, x):
    x = self.activation_post_process_0(x)
    x = self.conv(x)
    x = self.activation_post_process_1(x)
    x1 = x.expand_as(x)
    x1 = self.activation_post_process_2(x1)
    x2 = torch.add(x, x1)
    x2 = self.activation_post_process_3(x2)
    return x2

# forward after convert
def forward(self, x):
    x = torch.quantize_per_tensor(x, ...)
    x = self.conv(x)  # quantized conv
    x = torch.dequantize(x)
    x1 = x.expand_as(x)
    x1 = torch.quantize_per_tensor(x1, ...)
    # Error: x is dequantized
    x2 = torch.ops.quantized.add(x, x1)
    return x2
```
Currently we have an env that maps the node name in the observed graph to the Node in the quantized graph. The problem is that the quantized conv is followed by two operators: one expects float input (expand_as) and the other expects quantized input (quantized add). Ideally, in the quantized graph, expand_as should consume the dequantized output and quantized add should consume the quantized output:
```
quantized_conv - dequantize - expand_as
      \ ------- quantized_add
```
But currently in env, each node needs to be either quantized or not quantized. Therefore we need to change env to include the dtype as well:
```
env: Dict[str, Dict[dtype, Node]], e.g. {'x': {torch.float: dequantized_node, torch.quint8: quantized_node}}
```
When we load from the env, we also need to provide the dtype of the Node that we want to load. We can have a separate pass to figure out this information for each node.
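A dtype-keyed env of the shape Dict[str, Dict[dtype, Node]] can be sketched as below. This is only an illustration under stated assumptions: plain strings stand in for torch dtypes, and `Node`, `store`, and `load` are hypothetical helpers rather than the functions used in convert.

```python
from typing import Dict

# Strings stand in for torch.float and torch.quint8 so the sketch needs no torch.
FLOAT = "torch.float"
QUINT8 = "torch.quint8"


class Node:
    """Minimal stand-in for a graph node."""

    def __init__(self, name: str) -> None:
        self.name = name


Env = Dict[str, Dict[str, Node]]


def store(env: Env, name: str, dtype: str, node: Node) -> None:
    # One observed-graph name can hold several versions, keyed by dtype.
    env.setdefault(name, {})[dtype] = node


def load(env: Env, name: str, dtype: str) -> Node:
    versions = env[name]
    if dtype not in versions:
        raise KeyError(f"no {dtype} version of node {name!r} in env")
    return versions[dtype]


env: Env = {}
store(env, "conv", QUINT8, Node("quantized_conv"))
store(env, "conv", FLOAT, Node("dequantize"))

# expand_as wants the float version, quantized add wants the quantized one.
expand_as_input = load(env, "conv", FLOAT)
add_input = load(env, "conv", QUINT8)
```

Each consumer states the dtype it expects, so the float and quantized versions of the same value can coexist in the env.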
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D29149408
fbshipit-source-id: c9e4b7d65444ab6a6f573929bae1db5037629892
Summary:
During development it is common practice to put `type: ignore` comments on lines that are correct but that `mypy` does not recognize as such. This often stems from the fact that the `mypy` version in use at the time could not handle the pattern.
With every new release `mypy` gets better at handling complex code. In addition to fixing all the previously accepted but now failing patterns, we should also revisit all `type: ignore` comments to see whether they are still needed. Fortunately, we don't need to do it manually: by adding `warn_unused_ignores = True` to the configuration, `mypy` will error out if it encounters a `type: ignore` that is no longer needed.
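As an illustration, the hypothetical function below carries a stale ignore. The code type-checks cleanly on its own, so with `warn_unused_ignores = True` in the mypy configuration the comment would be reported as unused; the function is made up for this example and does not come from the codebase.

```python
def add(a: int, b: int) -> int:
    # Nothing on the next line needs suppressing, so the ignore is stale;
    # mypy flags it once warn_unused_ignores is enabled.
    return a + b  # type: ignore


total = add(2, 3)
```

The comment has no effect at runtime, which is why stale ignores tend to accumulate unless the checker is asked to report them.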
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60006
Reviewed By: jbschlosser, malfet
Differential Revision: D29133237
Pulled By: albanD
fbshipit-source-id: 41e82edc5cd5affa7ccedad044b59b94dad4425a