Description
A high-throughput and memory-efficient inference and serving engine for LLMs
Topics: amd, cuda, deepseek, gpt, hpu, inference, inferentia, llama, llm, llm-serving, llmops, mlops, model-serving, pytorch, qwen, rocm, tpu, trainium, transformer, xpu
License: Apache-2.0
Repository size: 771 MiB
Languages
Python 86%
Cuda 8%
C++ 4.5%
Shell 0.7%
C 0.4%
Other 0.4%