|
74f441f4b5
|
[Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer (#20059)
Signed-off-by: fhl <2410591650@qq.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
|
2025-08-15 10:01:39 -04:00 |
|
|
3597b06a4f
|
[CUDA] Enable full cudagraph for FlashMLA (#18581)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-06-13 18:12:26 +00:00 |
|
|
eaa2e51088
|
[Bugfix] Re-enable use_cudagraph in vLLM v1 (#19299)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-06-08 08:56:12 +08:00 |
|
|
02f0c7b220
|
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-06-03 11:20:17 -07:00 |
|
|
f8d2cc5f55
|
[Compile][Platform] Make PiecewiseBackend pluggable and extendable (#18076)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-05-22 12:11:53 -07:00 |
|