frozenleaves/vllm-dev
1 commit · 129 branches · 75 tags
Commit: e7d9d9c08c79b386f6d0477e87b77a572390317d
Woosuk Kwon · e7d9d9c08c · Initial commit · 2023-02-09 11:24:15 +00:00
README.md · Initial commit · 2023-02-09 11:24:15 +00:00

README.md
CacheFlow
Description
A high-throughput and memory-efficient inference and serving engine for LLMs
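As a rough illustration of what such a serving engine exposes, below is a minimal sketch using the Python API documented by later public vLLM releases (the `LLM` and `SamplingParams` classes). This initial CacheFlow commit predates that API, so treat the snippet as an assumption about the project's eventual interface rather than code taken from this commit; the model name is likewise only an example.

```python
# Minimal sketch of offline text generation with the later vLLM Python API.
# Assumption: the LLM / SamplingParams interface from public vLLM releases,
# which does not exist in this initial CacheFlow commit.
from vllm import LLM, SamplingParams

prompts = ["The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)

# Example model; any Hugging Face causal LM supported by vLLM could be used.
llm = LLM(model="facebook/opt-125m")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, output.outputs[0].text)
```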
Topics: amd, cuda, deepseek, gpt, hpu, inference, inferentia, llama, llm, llm-serving, llmops, mlops, model-serving, pytorch, qwen, rocm, tpu, trainium, transformer, xpu
Readme · Apache-2.0 license · 771 MiB
Languages
Python 86% · Cuda 8% · C++ 4.5% · Shell 0.7% · C 0.4% · Other 0.4%