Quantization
Quantization trades off model precision for a smaller memory footprint, allowing large models to be run on a wider range of devices.
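As a minimal sketch of what this looks like in practice, the snippet below loads a pre-quantized checkpoint through vLLM's `LLM` entry point and passes the quantization method explicitly via the `quantization` argument; the model name shown is only an illustrative AWQ checkpoint, so substitute any quantized model you actually have access to.

```python
from vllm import LLM, SamplingParams

# Illustrative AWQ checkpoint name; replace with a quantized model you have.
# vLLM can usually infer the method from the model config, but it can also
# be stated explicitly with `quantization=...`.
llm = LLM(model="TheBloke/Llama-2-7B-Chat-AWQ", quantization="awq")

outputs = llm.generate(
    ["Quantization reduces memory usage because"],
    SamplingParams(max_tokens=32),
)
print(outputs[0].outputs[0].text)
```

The weights here are stored in 4-bit form instead of 16-bit, which is where the memory savings come from; the trade-off is a small loss in model precision, as noted above.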
Contents: