[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124)


Description:
1. Quantize Linear Layer Weights to 4 bits:
Quantize the weights of the Linear layer to 4 bits, using symmetric quantization.
Pack two 4-bit weights into one uint8 container.
Choose a quantization scheme (channel-wise or group-wise); for group-wise quantization, the group size must be a multiple of 32.
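To make step 1 concrete, here is a minimal, dependency-free sketch of symmetric 4-bit quantization for one group of weights, followed by packing two codes per byte. The helper names, the `max_abs / 7` scale rule, the `+8` offset, and the low-nibble-first order are all assumptions made for illustration; they are not taken from this PR, and the layout the op expects may differ.

```python
def quantize_symmetric_4bit(values):
    """Quantize floats to signed 4-bit codes in [-8, 7] with one shared scale."""
    max_abs = max(abs(v) for v in values)
    # Assumption: map the largest magnitude to code 7 (symmetric, no zero point).
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def pack_nibbles(codes):
    """Pack pairs of signed 4-bit codes into bytes.

    Assumption: even index goes to the low nibble, odd index to the high nibble.
    """
    if len(codes) % 2:
        raise ValueError("need an even number of 4-bit codes")
    unsigned = [c + 8 for c in codes]  # shift [-8, 7] to [0, 15]
    return bytes(lo | (hi << 4) for lo, hi in zip(unsigned[0::2], unsigned[1::2]))


def unpack_nibbles(packed):
    """Inverse of pack_nibbles: recover the signed 4-bit codes."""
    codes = []
    for byte in packed:
        codes.append((byte & 0x0F) - 8)
        codes.append((byte >> 4) - 8)
    return codes


weights = [0.62, -0.11, 0.04, -0.70, 0.33, 0.48, -0.29, 0.01]
codes, scale = quantize_symmetric_4bit(weights)
packed = pack_nibbles(codes)
```

Eight weights end up in four bytes, and `unpack_nibbles(packed)` recovers the same codes, so the only loss is the rounding to the 4-bit grid.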

2. Prepare Quantized Weights, Scales, and Optional Bias:
After quantizing, obtain the quantized_weights, scales, and groupsize.
If the original Linear layer has a bias, prepare it as well.
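The difference between the two schemes shows up in how many scales are produced. The sketch below computes symmetric scales for a small weight matrix stored as a list of rows. It is illustrative only: `group_scales` is a hypothetical helper, the divisibility check on `in_features` is an assumption, and channel-wise quantization is modelled here as a single group spanning the whole row.

```python
import random


def group_scales(weight, groupsize):
    """Return one symmetric 4-bit scale per (output row, group) of a 2-D weight."""
    if groupsize % 32 != 0:
        raise ValueError("groupsize must be a multiple of 32")
    in_features = len(weight[0])
    if in_features % groupsize != 0:
        # Assumption: every row splits into whole groups.
        raise ValueError("in_features must be divisible by groupsize")
    scales = []
    for row in weight:
        row_scales = []
        for start in range(0, in_features, groupsize):
            max_abs = max(abs(v) for v in row[start:start + groupsize])
            row_scales.append(max_abs / 7.0 if max_abs > 0 else 1.0)
        scales.append(row_scales)
    return scales


random.seed(0)
out_features, in_features = 4, 64
weight = [[random.uniform(-1, 1) for _ in range(in_features)]
          for _ in range(out_features)]

group_wise = group_scales(weight, groupsize=32)             # 4 rows x 2 scales
channel_wise = group_scales(weight, groupsize=in_features)  # 4 rows x 1 scale
```

With `in_features = 64`, a group size of 32 yields two scales per output row, while the channel-wise case yields one. Smaller groups track local weight ranges more closely at the cost of storing more scales.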

3. Pack the Weights Efficiently:
Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias.
```python
packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features)
```
Alongside the quantized weight, the scales, and the optional bias, pass groupsize plus in_features and out_features (the same as the Linear layer’s corresponding parameters).

4. Perform Dynamic Quantized Matrix Multiplication:
Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights.
```python
output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights, groupsize, in_features, out_features)
```
Required inputs: the input tensor, packed_weights, groupsize, in_features, and out_features.
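As a mental model for what the matmul should approximate, the following float reference dequantizes the 4-bit codes with their group scales and applies an ordinary linear layer. This is a sketch under stated assumptions, not the kernel: `reference_linear_4bit` and `dequantize_row` are hypothetical names, the codes are kept unpacked, and the runtime quantization of the input that "dynamic" refers to is not modelled.

```python
def dequantize_row(codes, scales, groupsize):
    """Turn one row of signed 4-bit codes back into floats using its group scales."""
    return [c * scales[i // groupsize] for i, c in enumerate(codes)]


def reference_linear_4bit(inputs, codes, scales, groupsize, bias=None):
    """Float reference: inputs @ dequantized_weight.T (+ bias)."""
    weight = [dequantize_row(r, s, groupsize) for r, s in zip(codes, scales)]
    output = []
    for x in inputs:
        row = [sum(a * b for a, b in zip(x, w)) for w in weight]
        if bias is not None:
            row = [v + b for v, b in zip(row, bias)]
        output.append(row)
    return output


groupsize = 32
in_features, out_features = 32, 2
codes = [
    [1] * in_features,   # row 0: every code is 1
    [7, -8] * 16,        # row 1: alternating extremes of the 4-bit range
]
scales = [[0.5], [0.25]]           # one group per row, since groupsize == in_features
bias = [1.0, 2.0]
inputs = [[0.5] * in_features]     # a single input row

output = reference_linear_4bit(inputs, codes, scales, groupsize, bias)
```

Comparing an optimized kernel against a reference like this needs a tolerance rather than exact equality, because the real path also quantizes the activations.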

API Usage: https://github.com/pytorch/pytorch/issues/143289

Model Perf:
7B Transformer model:
Prefill: 340 t/s
Decode: 40 t/s
2B Transformer model:
Prefill: 747 t/s
Decode: 80 t/s

Tests:
python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight
Ran 1 test in 0.016s

OK

python test/test_linalg.py -k test__dyn_quant_matmul_4bit
Ran 8 tests in 0.077s

OK

python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit
Ran 8 tests in 11.454s

Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124
Approved by: https://github.com/digantdesai, https://github.com/malfet

Nikhil Gupta, 2024-12-20 19:32:03 +00:00
committed by PyTorch MergeBot
parent b5475d334e
commit 94737e8a2a

37 changed files with 1898 additions and 23 deletions

									
buckbuild.bzl

@@ -1070,6 +1070,7 @@ def define_buck_targets(
        ],
    )
    # TODO: Enable support for KleidiAI bazel build
    # @lint-ignore BUCKLINT
    fb_native.genrule(
        name = "generate_aten_config",
@@ -1122,6 +1123,9 @@ def define_buck_targets(
            "--replace",
            "@AT_BLAS_USE_CBLAS_DOT@",
            "AT_BLAS_USE_CBLAS_DOT_FBXPLAT",
            "--replace",
            "@AT_KLEIDIAI_ENABLED@",
            "0",
        ]),
        outs = {
            "Config.h": ["Config.h"],
