Files
DeepSpeed/docs/_posts/2023-09-12-ZeRO-Inference.md
Olatunji Ruwase fd40516923 Update GH org references (#6998)
Signed-off-by: Olatunji Ruwase <olruwase@microsoft.com>
Signed-off-by: Logan Adams <loadams@microsoft.com>
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
Co-authored-by: Fabien Dupont <fabiendupont@fabiendupont.fr>
2025-02-05 00:56:50 +00:00

296 B

title: "ZeRO-Inference: 20X faster inference through weight quantization and KV cache offloading" excerpt: "" link: https://github.com/deepspeedai/DeepSpeedExamples/blob/master/inference/huggingface/zero_inference/README.md date: 2023-09-12 00:09:00 tags: inference ZeRO quantization English