A high-throughput and memory-efficient inference and serving engine for LLMs
Topics: amd, blackwell, cuda, deepseek, deepseek-v3, gpt, gpt-oss, inference, kimi, llama, llm, llm-serving, model-serving, moe, openai, pytorch, qwen, qwen3, tpu, transformer
Updated 2025-10-20 03:47:19 +08:00
Community maintained hardware plugin for vLLM on Ascend
Updated 2025-10-19 17:06:05 +08:00
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Topics: agent, ai, chatglm, fine-tuning, gpt, instruction-tuning, language-model, large-language-models, llama, llama3, llm, lora, mistral, moe, peft, qlora, quantization, qwen, rlhf, transformers
Updated 2025-10-18 18:02:14 +08:00
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Updated 2025-10-17 22:24:46 +08:00
A high-throughput and memory-efficient inference and serving engine for LLMs
Topics: amd, cuda, deepseek, gpt, hpu, inference, inferentia, llama, llm, llm-serving, llmops, mlops, model-serving, pytorch, qwen, rocm, tpu, trainium, transformer, xpu
Updated 2025-10-11 16:48:30 +08:00