[Docs] [V1] [Hybrid] Update docs to remove FlashInfer constraint for hybrid models (#23665)

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Thomas Parnell
2025-08-26 20:33:16 +02:00
committed by GitHub
parent 730d0ac8b9
commit 227e231b55


@@ -111,11 +111,10 @@ Models that use Mamba-2 and Mamba-1 layers (e.g., `Mamba2ForCausalLM`, `MambaFor
Models that combine Mamba-2 and Mamba-1 layers with standard attention layers are also supported (e.g., `BambaForCausalLM`,
`Zamba2ForCausalLM`, `NemotronHForCausalLM`, `FalconH1ForCausalLM`, `GraniteMoeHybridForCausalLM` and `JambaForCausalLM`). Please note that
-these models currently require disabling prefix caching and using the FlashInfer attention backend in V1.
+these models currently require disabling prefix caching in V1.
Hybrid models with mechanisms different to Mamba are also supported (e.g., `MiniMaxText01ForCausalLM`, `MiniMaxM1ForCausalLM`).
-Please note that these models currently require disabling prefix caching, enforcing eager mode, and using the FlashInfer
-attention backend in V1.
+Please note that these models currently require disabling prefix caching and enforcing eager mode in V1.
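As an illustration of the updated constraints, here is a minimal sketch using vLLM's offline `LLM` API. The model identifier is a placeholder, not something taken from this commit; substitute an actual hybrid checkpoint. Prefix caching is disabled for all hybrid models, and eager mode is additionally enforced for the non-Mamba hybrid mechanisms.

```python
from vllm import LLM, SamplingParams

# Sketch only: "<your-hybrid-model>" is a placeholder for any of the hybrid
# architectures listed above (e.g. a Bamba, Zamba2, or MiniMax checkpoint).
llm = LLM(
    model="<your-hybrid-model>",
    enable_prefix_caching=False,  # required for all hybrid models in V1
    enforce_eager=True,           # additionally required for non-Mamba hybrids (e.g. MiniMax)
)

outputs = llm.generate(
    ["The key advantage of hybrid attention/state-space models is"],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

Note that `enforce_eager=True` can be dropped for the Mamba/attention hybrids in the first list; only the models with non-Mamba mechanisms still need it after this change.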
#### Encoder-Decoder Models