[Quant] lower fused LinearTanh for onednn backend (#89188)

**Summary**
Add fuser method and quantization mappings for `QLinearTanh` for int8 inference with the onednn backend. The fusion and lowering are supported only in FX mode.
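
For reference, a minimal FX-mode sketch of the flow this change enables. The toy model, shapes, and calibration data are illustrative assumptions, and it presumes a PyTorch build with onednn support:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

class LinearTanhModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(16, 8)
        self.tanh = nn.Tanh()

    def forward(self, x):
        return self.tanh(self.linear(x))

# Requires a build with onednn quantization support
torch.backends.quantized.engine = "onednn"

model = LinearTanhModel().eval()
example_inputs = (torch.randn(4, 16),)

# prepare_fx fuses Linear + Tanh into a fused intrinsic module
# and inserts observers for calibration
qconfig_mapping = get_default_qconfig_mapping("onednn")
prepared = prepare_fx(model, qconfig_mapping, example_inputs)
prepared(*example_inputs)  # calibrate with representative data

# convert_fx lowers the fused module to its quantized counterpart
quantized = convert_fx(prepared)
print(quantized)
```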

**Test plan**
python test_quantization.py TestFuseFx TestQuantizeFx

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89188
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
Commit: a5eb564ba4 (parent 666d218055)
Author: Xia, Weiwen
Date: 2022-12-18 10:49:24 +08:00
Committed by: PyTorch MergeBot
7 changed files with 167 additions and 63 deletions


@@ -111,6 +111,7 @@ DEFAULT_STATIC_QUANT_MODULE_MAPPINGS : Dict[Callable, Any] = {
     nni.ConvReLU3d: nniq.ConvReLU3d,
     nni.LinearReLU: nniq.LinearReLU,
     nni.LinearLeakyReLU: nniq.LinearLeakyReLU,
+    nni.LinearTanh: nniq.LinearTanh,
     nniqat.ConvBn1d: nnq.Conv1d,
     nniqat.ConvBn2d: nnq.Conv2d,
     nniqat.ConvBn3d: nnq.Conv3d,
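
For context, a short sketch of how the mapping entry added above is consulted during convert. It assumes a PyTorch build that includes this change; the module aliases follow the `torch.ao.*` naming used in the patched file:

```python
import torch.ao.nn.intrinsic as nni
from torch.ao.quantization.quantization_mappings import (
    get_default_static_quant_module_mappings,
)

# The convert step uses the fused float module class as the key to
# find its quantized counterpart, e.g. nni.LinearTanh -> nniq.LinearTanh
mappings = get_default_static_quant_module_mappings()
print(mappings[nni.LinearTanh])
```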