A high-throughput and memory-efficient inference and serving engine for LLMs
Updated 2025-10-20 14:31:03 +08:00
Community-maintained hardware plugin for vLLM on Ascend
Updated 2025-10-20 09:50:44 +08:00