|
41d3071918
|
[NVIDIA] [Perf] Update to leverage flashinfer trtllm FP4 MOE throughput kernel (#26714)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-10-16 16:20:25 -07:00 |
|
|
44c8555621
|
[CI/Build] Fix AMD import failures in CI (#26841)
Signed-off-by: zhewenli <zhewenli@meta.com>
|
2025-10-16 07:28:20 +00:00 |
|
|
04b5f9802d
|
[CI] Raise VLLM_MAX_SIZE_MB to 500 due to failing Build wheel - CUDA 12.9 (#26722)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-14 10:52:05 -07:00 |
|
|
27ed39a347
|
[XPU] Upgrade NIXL to remove CUDA dependency (#26570)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
|
2025-10-11 05:15:23 +00:00 |
|
|
8f8474fbe3
|
[CI/Build] Fix ppc64le CPU build and tests (#22443)
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
|
2025-10-11 13:04:42 +08:00 |
|
|
c9d33c60dc
|
[UX] Add FlashInfer as default CUDA dependency (#26443)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-10-09 14:10:02 -07:00 |
|
|
5e49c3e777
|
Bump Flashinfer to v0.4.0 (#26326)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-10-08 23:58:44 -07:00 |
|
|
0d7c3cb51d
|
Update Dockerfile and install runai-model-streamer[gcs] package (#26464)
Signed-off-by: Peter Schuurman <psch@google.com>
|
2025-10-08 23:48:51 -07:00 |
|
|
e09d1753ec
|
Remove Python 3.9 support ahead of PyTorch 2.9 in v0.11.1 (#26416)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-08 10:40:42 -07:00 |
|
|
8229280a9c
|
[Misc] Define EP kernel arch list in Dockerfile (#25635)
Signed-off-by: Simon Mo <simon.mo@hey.com>
|
2025-10-07 00:05:33 +00:00 |
|
|
27edd2aeb4
|
[Build/CI] Revert back to Ubuntu 20.04, install python 3.12 with uv (#26103)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-10-02 22:21:01 -07:00 |
|
|
d00d652998
|
[CI/Build] Replace vllm.entrypoints.openai.api_server entrypoint with vllm serve command (#25967)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-02 10:04:57 -07:00 |
|
|
d4e7a1152d
|
Update base image to 22.04 (jammy) (#26065)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2025-10-02 05:48:04 -07:00 |
|
|
aac622e0cd
|
[ROCm][Build] Add support for AMD Ryzen AI MAX / AI 300 Series (#25908)
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>
|
2025-10-01 21:39:49 +00:00 |
|
|
bc546f76a1
|
[CI] Move applicable tests to CPU (#24080)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-30 14:45:20 +01:00 |
|
|
9360d34fa1
|
update to latest deepgemm for dsv3.2 (#25871)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-09-29 17:51:43 +08:00 |
|
|
5546acb463
|
[Bug]: Set LD_LIBRARY_PATH to include the 'standard' CUDA location (#25766)
Signed-off-by: Clayton Coleman <smarterclayton@gmail.com>
|
2025-09-27 13:36:28 -04:00 |
|
|
d346ec695e
|
[CI/Build] Consolidate model loader tests and requirements (#25765)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-26 21:45:20 -07:00 |
|
|
92da847cf5
|
Add flashinfer-build.sh and register precompiled cu128 wheel in Dockerfile (#25782)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-09-26 18:54:09 -07:00 |
|
|
cf89202855
|
[CI] Fix FlashInfer AOT in release docker image (#25730)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-09-26 14:11:40 -07:00 |
|
|
eb32335e35
|
[CPU] update torch 2.8 and fix missing fields in TorchSDPAMetadata (#25652)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-09-25 13:29:11 +00:00 |
|
|
302eb941f3
|
[ROCm][Build][Bugfix] Fix ROCm base docker whls installation order (#25415)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-09-24 11:25:10 -04:00 |
|
|
dc34059360
|
[ROCm][CI/Build] Use ROCm7.0 as the base (#25178)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-09-18 09:36:55 -07:00 |
|
|
64ad551878
|
Removes source compilation of nixl dependency (#24874)
Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Daniele <36171005+dtrifiro@users.noreply.github.com>
|
2025-09-17 01:33:18 +00:00 |
|
|
0af3ce1355
|
Upgrade flashinfer to 0.3.1 (#24470)
Signed-off-by: Lu Fang <lufang@fb.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-16 02:36:09 +00:00 |
|
|
fd2f10546c
|
[ci] fix wheel names for arm wheels (#24898)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-15 14:39:08 -07:00 |
|
|
94b03f88dd
|
Bump Flashinfer to 0.3.1 (#24868)
Signed-off-by: bbartels <benjamin@bartels.dev>
|
2025-09-15 12:45:55 -07:00 |
|
|
2f5e5c18de
|
[CI/Build] bump timm dependency (#24189)
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-09-10 06:20:59 -07:00 |
|
|
73e688cb79
|
[ROCm][Feature] Enable Pipeline Parallelism with Ray Compiled Graph on ROCm (#24275)
Signed-off-by: charlifu <charlifu@amd.com>
|
2025-09-09 23:27:35 +00:00 |
|
|
b9a1c4c8a2
|
[ROCm][CI/Build] Sync ROCm dockerfiles with the ROCm fork (#24279)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-09-09 12:21:56 -04:00 |
|
|
e10fef0883
|
[Hardware][IBM Z] Fix Outlines Core issue for s390x (#24034)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
|
2025-09-08 16:50:34 -07:00 |
|
|
67841317d1
|
[xpu] upgrade ipex/python3.12 for xpu (#23830)
Signed-off-by: Yan Ma <yan.ma@intel.com>
|
2025-09-08 02:07:16 +00:00 |
|
|
4172235ab7
|
[V0 deprecation] Deprecate V0 Neuron backend (#21159)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-06 16:15:18 -07:00 |
|
|
78336a0c3e
|
Upgrade FlashInfer to v0.3.0 (#24086)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-09-04 09:49:20 -07:00 |
|
|
402759d472
|
[Attention] FlashAttn MLA (#14258)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-09-04 02:47:59 -07:00 |
|
|
4ba0c587ba
|
FIX: Add libnuma-dev to Dockerfile for dev stage (#20388)
Signed-off-by: dongbo910220 <1275604947@qq.com>
|
2025-09-03 07:17:20 -07:00 |
|
|
dc1a53186d
|
[Kernel] Update DeepGEMM to latest commit (#23915)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-01 02:38:04 -07:00 |
|
|
ae067888d6
|
Update Flashinfer to 0.2.14.post1 (#23537)
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Signed-off-by: siyuanf <siyuanf@nvidia.com>
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-08-25 18:30:44 -07:00 |
|
|
f6818a92cb
|
[UX] Move Dockerfile DeepGEMM install to tools/install_deepgemm.sh (#23360)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-22 20:52:50 -06:00 |
|
|
0483fabc74
|
[CI/Build] add EP dependencies to docker (#21976)
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-08-22 13:34:40 -07:00 |
|
|
8896eb72eb
|
[Deprecation] Remove prompt_token_ids arg fallback in LLM.generate and LLM.embed (#18800)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-22 10:56:57 +08:00 |
|
|
8ef6b8a38c
|
Always use cache mounts when installing vllm to avoid populating pip cache in the image. Also remove apt cache. (#23270)
Signed-off-by: Valentyn Tymofieiev <valentyn@google.com>
|
2025-08-21 18:01:03 -04:00 |
|
|
50df09fe13
|
Update to flashinfer-python==0.2.12 and disable AOT compile for non-release image (#23129)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-20 08:05:54 -04:00 |
|
|
78dba404ad
|
[Hardware][IBM Z]Enable v1 for s390x and s390x dockerfile fixes (#22725)
Signed-off-by: Nikhil Suryawanshi <suryawanshin74@gmail.com>
|
2025-08-19 04:40:37 +00:00 |
|
|
76144adf76
|
ci: Add CUDA + arm64 release builds (#21201)
Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
|
2025-08-15 23:16:23 +00:00 |
|
|
e8b40c7fa2
|
[CI] Remove duplicated docs build from buildkite (#22924)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-15 05:58:06 -07:00 |
|
|
ba81acbdc1
|
[Bugfix] Bump DeepGEMM Version to Fix SMXX Layout Issues (#22606)
Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
|
2025-08-12 15:43:06 -07:00 |
|
|
dc5e4a653c
|
Upgrade FlashInfer to v0.2.11 (#22613)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-08-11 19:58:41 -07:00 |
|
|
d1af8b7be9
|
enable Docker-aware precompiled wheel setup (#22106)
Signed-off-by: dougbtv <dosmith@redhat.com>
|
2025-08-10 16:29:02 -07:00 |
|
|
81c57f60a2
|
[XPU] upgrade torch 2.8 on for XPU (#22300)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-08-08 17:03:45 -07:00 |
|