137 Commits

Author SHA1 Message Date
41d3071918 [NVIDIA] [Perf] Update to leverage flashinfer trtllm FP4 MOE throughput kernel (#26714)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-10-16 16:20:25 -07:00
44c8555621 [CI/Build] Fix AMD import failures in CI (#26841)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-10-16 07:28:20 +00:00
04b5f9802d [CI] Raise VLLM_MAX_SIZE_MB to 500 due to failing Build wheel - CUDA 12.9 (#26722)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-10-14 10:52:05 -07:00
27ed39a347 [XPU] Upgrade NIXL to remove CUDA dependency (#26570)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
2025-10-11 05:15:23 +00:00
8f8474fbe3 [CI/Build] Fix ppc64le CPU build and tests (#22443)
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
2025-10-11 13:04:42 +08:00
c9d33c60dc [UX] Add FlashInfer as default CUDA dependency (#26443)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-10-09 14:10:02 -07:00
5e49c3e777 Bump Flashinfer to v0.4.0 (#26326)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-10-08 23:58:44 -07:00
0d7c3cb51d Update Dockerfile and install runai-model-streamer[gcs] package (#26464)
Signed-off-by: Peter Schuurman <psch@google.com>
2025-10-08 23:48:51 -07:00
e09d1753ec Remove Python 3.9 support ahead of PyTorch 2.9 in v0.11.1 (#26416)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-08 10:40:42 -07:00
8229280a9c [Misc] Define EP kernel arch list in Dockerfile (#25635)
Signed-off-by: Simon Mo <simon.mo@hey.com>
2025-10-07 00:05:33 +00:00
27edd2aeb4 [Build/CI] Revert back to Ubuntu 20.04, install python 3.12 with uv (#26103)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2025-10-02 22:21:01 -07:00
d00d652998 [CI/Build] Replace vllm.entrypoints.openai.api_server entrypoint with vllm serve command (#25967)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-02 10:04:57 -07:00
d4e7a1152d Update base image to 22.04 (jammy) (#26065)
Signed-off-by: Huy Do <huydhn@gmail.com>
2025-10-02 05:48:04 -07:00
aac622e0cd [ROCm][Build] Add support for AMD Ryzen AI MAX / AI 300 Series (#25908)
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>
2025-10-01 21:39:49 +00:00
bc546f76a1 [CI] Move applicable tests to CPU (#24080)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-30 14:45:20 +01:00
9360d34fa1 update to latest deepgemm for dsv3.2 (#25871)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-09-29 17:51:43 +08:00
5546acb463 [Bug]: Set LD_LIBRARY_PATH to include the 'standard' CUDA location (#25766)
Signed-off-by: Clayton Coleman <smarterclayton@gmail.com>
2025-09-27 13:36:28 -04:00
d346ec695e [CI/Build] Consolidate model loader tests and requirements (#25765)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-26 21:45:20 -07:00
92da847cf5 Add flashinfer-build.sh and register precompiled cu128 wheel in Dockerfile (#25782)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-26 18:54:09 -07:00
cf89202855 [CI] Fix FlashInfer AOT in release docker image (#25730)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-26 14:11:40 -07:00
eb32335e35 [CPU] update torch 2.8 and fix missing fields in TorchSDPAMetadata (#25652)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-09-25 13:29:11 +00:00
302eb941f3 [ROCm][Build][Bugfix] Fix ROCm base docker whls installation order (#25415)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-09-24 11:25:10 -04:00
dc34059360 [ROCm][CI/Build] Use ROCm7.0 as the base (#25178)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-09-18 09:36:55 -07:00
64ad551878 Removes source compilation of nixl dependency (#24874)
Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Daniele <36171005+dtrifiro@users.noreply.github.com>
2025-09-17 01:33:18 +00:00
0af3ce1355 Upgrade flashinfer to 0.3.1 (#24470)
Signed-off-by: Lu Fang <lufang@fb.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-16 02:36:09 +00:00
fd2f10546c [ci] fix wheel names for arm wheels (#24898)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-09-15 14:39:08 -07:00
94b03f88dd Bump Flashinfer to 0.3.1 (#24868)
Signed-off-by: bbartels <benjamin@bartels.dev>
2025-09-15 12:45:55 -07:00
2f5e5c18de [CI/Build] bump timm dependency (#24189)
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-09-10 06:20:59 -07:00
73e688cb79 [ROCm][Feature] Enable Pipeline Parallelism with Ray Compiled Graph on ROCm (#24275)
Signed-off-by: charlifu <charlifu@amd.com>
2025-09-09 23:27:35 +00:00
b9a1c4c8a2 [ROCm][CI/Build] Sync ROCm dockerfiles with the ROCm fork (#24279)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-09-09 12:21:56 -04:00
e10fef0883 [Hardware][IBM Z] Fix Outlines Core issue for s390x (#24034)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
2025-09-08 16:50:34 -07:00
67841317d1 [xpu] upgrade ipex/python3.12 for xpu (#23830)
Signed-off-by: Yan Ma <yan.ma@intel.com>
2025-09-08 02:07:16 +00:00
4172235ab7 [V0 deprecation] Deprecate V0 Neuron backend (#21159)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-06 16:15:18 -07:00
78336a0c3e Upgrade FlashInfer to v0.3.0 (#24086)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2025-09-04 09:49:20 -07:00
402759d472 [Attention] FlashAttn MLA (#14258)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
2025-09-04 02:47:59 -07:00
4ba0c587ba FIX: Add libnuma-dev to Dockerfile for dev stage (#20388)
Signed-off-by: dongbo910220 <1275604947@qq.com>
2025-09-03 07:17:20 -07:00
dc1a53186d [Kernel] Update DeepGEMM to latest commit (#23915)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-01 02:38:04 -07:00
ae067888d6 Update Flashinfer to 0.2.14.post1 (#23537)
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Signed-off-by: siyuanf <siyuanf@nvidia.com>
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-25 18:30:44 -07:00
f6818a92cb [UX] Move Dockerfile DeepGEMM install to tools/install_deepgemm.sh (#23360)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-22 20:52:50 -06:00
0483fabc74 [CI/Build] add EP dependencies to docker (#21976)
Co-authored-by: Simon Mo <simon.mo@hey.com>
2025-08-22 13:34:40 -07:00
8896eb72eb [Deprecation] Remove prompt_token_ids arg fallback in LLM.generate and LLM.embed (#18800)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-22 10:56:57 +08:00
8ef6b8a38c Always use cache mounts when installing vllm to avoid populating pip cache in the image. Also remove apt cache. (#23270)
Signed-off-by: Valentyn Tymofieiev <valentyn@google.com>
2025-08-21 18:01:03 -04:00
50df09fe13 Update to flashinfer-python==0.2.12 and disable AOT compile for non-release image (#23129)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-20 08:05:54 -04:00
78dba404ad [Hardware][IBM Z]Enable v1 for s390x and s390x dockerfile fixes (#22725)
Signed-off-by: Nikhil Suryawanshi <suryawanshin74@gmail.com>
2025-08-19 04:40:37 +00:00
76144adf76 ci: Add CUDA + arm64 release builds (#21201)
Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
2025-08-15 23:16:23 +00:00
e8b40c7fa2 [CI] Remove duplicated docs build from buildkite (#22924)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-15 05:58:06 -07:00
ba81acbdc1 [Bugfix] Bump DeepGEMM Version to Fix SMXX Layout Issues (#22606)
Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
2025-08-12 15:43:06 -07:00
dc5e4a653c Upgrade FlashInfer to v0.2.11 (#22613)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-08-11 19:58:41 -07:00
d1af8b7be9 enable Docker-aware precompiled wheel setup (#22106)
Signed-off-by: dougbtv <dosmith@redhat.com>
2025-08-10 16:29:02 -07:00
81c57f60a2 [XPU] upgrade torch 2.8 on for XPU (#22300)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-08-08 17:03:45 -07:00