187 Commits

SHA1 Message Date
6e01e8c1c8 [CI] Add Buildkite (#2355) 2024-01-14 12:37:58 -08:00
1b7c791d60 [ROCm] Fixes for GPTQ on ROCm (#2180) 2023-12-18 10:41:04 -08:00
2acd76f346 [ROCm] Temporarily remove GPTQ ROCm support (#2138) 2023-12-15 17:13:58 -08:00
0fbfc4b81b Add GPTQ support (#916) 2023-12-15 03:04:22 -08:00
6ccc0bfffb Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836) 2023-12-07 23:16:52 -08:00
  Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
  Co-authored-by: Amir Balwel <amoooori04@gmail.com>
  Co-authored-by: root <kuanfu.liu@akirakan.com>
  Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
  Co-authored-by: kuanfu <kuanfu.liu@embeddedllm.com>
  Co-authored-by: miloice <17350011+kliuae@users.noreply.github.com>
c8e7eb1eb3 Fix typo in getenv call (#1972) 2023-12-07 16:04:41 -08:00
24f60a54f4 [Docker] Adding number of nvcc_threads during build as envar (#1893) 2023-12-07 11:00:32 -08:00
e0c6f556e8 [Build] Avoid building too many extensions (#1624) 2023-11-23 16:31:19 -08:00
5ffc0d13a2 Migrate linter from pylint to ruff (#1665) 2023-11-20 11:58:01 -08:00
fd58b73a40 Build CUDA11.8 wheels for release (#1596) 2023-11-09 03:52:29 -08:00
9cabcb7645 Add Dockerfile (#1350) 2023-10-31 12:36:47 -07:00
79a30912b8 Add py.typed so consumers of vLLM can get type checking (#1509) 2023-10-30 14:50:47 -07:00
  * Update py.typed
  Co-authored-by: aarnphm <29749331+aarnphm@users.noreply.github.com>
  Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
1f24755bf8 Support SqueezeLLM (#1326) 2023-10-21 23:14:59 -07:00
  Co-authored-by: squeeze-ai-lab <squeezeailab.bair@gmail.com>
  Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
d0740dff1b Fix error message on TORCH_CUDA_ARCH_LIST (#1239) 2023-10-14 14:47:43 -07:00
  Co-authored-by: Yunfeng Bai <yunfeng.bai@scale.com>
cf5cb1e33e Allocate more shared memory to attention kernel (#1154) 2023-09-26 22:27:13 -07:00
a425bd9a9a [Setup] Enable TORCH_CUDA_ARCH_LIST for selecting target GPUs (#1074) 2023-09-26 10:21:08 -07:00
e3e79e9e8a Implement AWQ quantization support for LLaMA (#1032) 2023-09-16 00:03:37 -07:00
  Co-authored-by: Robert Irvine <robert@seamlessml.com>
  Co-authored-by: root <rirv938@gmail.com>
  Co-authored-by: Casper <casperbh.96@gmail.com>
  Co-authored-by: julian-q <julianhquevedo@gmail.com>
d6770d1f23 Update setup.py (#1006) 2023-09-10 23:42:45 -07:00
a41c20435e Add compute capability 8.9 to default targets (#829) 2023-08-23 07:28:38 +09:00
65fc1c3127 Set default compute capability according to CUDA version (#773) 2023-08-21 16:05:44 -07:00
2b7d3aca2e Update setup.py (#282) 2023-06-27 14:34:23 -07:00
  Co-authored-by: neubig <neubig@gmail.com>
570fb2e9cc [PyPI] Fix package info in setup.py (#158) 2023-06-19 18:05:01 -07:00
dcda03b4cb Write README and front page of doc (#147) 2023-06-18 03:19:38 -07:00
0b98ba15c7 Change the name to vLLM (#150) 2023-06-17 03:07:40 -07:00
e38074b1e6 Support FP32 (#141) 2023-06-07 00:40:21 -07:00
376725ce74 [PyPI] Packaging for PyPI distribution (#140) 2023-06-05 20:03:14 -07:00
d721168449 Improve setup script & Add a guard for bfloat16 kernels (#130) 2023-05-27 00:59:32 -07:00
7addca5935 Specify python package dependencies in requirements.txt (#78) 2023-05-07 16:30:43 -07:00
e070829ae8 Support bfloat16 data type (#54) 2023-05-03 14:09:44 -07:00
436e523bf1 Refactor attention kernels (#53) 2023-05-03 13:40:13 -07:00
897cb2ae28 Optimize data movement (#20) 2023-04-02 00:30:17 -07:00
09e9245478 Add custom kernel for RMS normalization (#16) 2023-04-01 00:51:22 +08:00
88c0268a18 Implement custom kernel for LLaMA rotary embedding (#14) 2023-03-30 11:04:21 -07:00
0deacbce6e Implement single_query_cached_kv_attention kernel (#3) 2023-03-01 15:02:19 -08:00
ffad4e1e03 cache_kernel -> cache_kernels 2023-02-16 20:05:45 +00:00
6f058c7ba8 Implement cache ops 2023-02-16 07:47:03 +00:00
3be29a1104 Add blank setup file 2023-02-09 11:37:06 +00:00