mirror of https://github.com/vllm-project/vllm.git (synced 2025-10-20 14:53:52 +08:00)
[Docs] [V1] [Hybrid] Update docs to remove FlashInfer constraint for hybrid models (#23665)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
@@ -111,11 +111,10 @@ Models that use Mamba-2 and Mamba-1 layers (e.g., `Mamba2ForCausalLM`, `MambaFor
 Models that combine Mamba-2 and Mamba-1 layers with standard attention layers are also supported (e.g., `BambaForCausalLM`,
 `Zamba2ForCausalLM`, `NemotronHForCausalLM`, `FalconH1ForCausalLM` and `GraniteMoeHybridForCausalLM`, `JambaForCausalLM`). Please note that
-these models currently require disabling prefix caching and using the FlashInfer attention backend in V1.
+these models currently require disabling prefix caching in V1.
 
 Hybrid models with mechanisms different to Mamba are also supported (e.g, `MiniMaxText01ForCausalLM`, `MiniMaxM1ForCausalLM`).
-Please note that these models currently require disabling prefix caching, enforcing eager mode, and using the FlashInfer
-attention backend in V1.
+Please note that these models currently require disabling prefix caching and enforcing eager mode in V1.
 
 #### Encoder-Decoder Models
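For context, the sketch below shows one way the remaining constraints described in the updated docs could be applied when loading a hybrid model through the vLLM Python API. This is a minimal illustration, not part of the commit: the checkpoint name is a placeholder, and the `enable_prefix_caching` and `enforce_eager` keyword arguments are assumed to be forwarded to the engine as in current vLLM releases.

```python
# Minimal sketch (not part of this commit): applying the documented constraints
# when loading a hybrid model through the vLLM Python API.
from vllm import LLM, SamplingParams

llm = LLM(
    model="ibm-ai-platform/Bamba-9B",  # placeholder hybrid (Mamba-2 + attention) checkpoint
    enable_prefix_caching=False,       # hybrid models currently require prefix caching to be disabled in V1
    enforce_eager=True,                # additionally needed for the non-Mamba hybrid mechanisms listed above
)

outputs = llm.generate(["The capital of France is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```

Equivalent server-side flags (`--no-enable-prefix-caching`, `--enforce-eager`) exist for `vllm serve`; check the exact spelling against the installed version.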