mirror of https://github.com/vllm-project/vllm.git
[Docs] [V1] Update docs to remove enforce_eager limitation for hybrid models. (#21233)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
@@ -107,12 +107,11 @@ to enable simultaneous generation and embedding using the same engine instance i
 Models using selective state-space mechanisms instead of standard transformer attention are partially supported.
 Models that use Mamba-2 layers (e.g., `Mamba2ForCausalLM`) are supported, but models that use older Mamba-1 layers
 (e.g., `MambaForCausalLM`, `JambaForCausalLM`) are not yet supported. Please note that these models currently require
-enforcing eager mode and disabling prefix caching in V1.
+disabling prefix caching in V1.
 
 Models that combine Mamba-2 layers with standard attention layers are also supported (e.g., `BambaForCausalLM`,
 `Zamba2ForCausalLM`, `NemotronHForCausalLM`, `FalconH1ForCausalLM` and `GraniteMoeHybridForCausalLM`). Please note that
-these models currently require enforcing eager mode, disabling prefix caching, and using the FlashInfer attention
-backend in V1.
+these models currently require disabling prefix caching and using the FlashInfer attention backend in V1.
 
 #### Encoder-Decoder Models
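As a quick illustration of the updated requirement for pure Mamba-2 models, here is a minimal sketch of running such a model on the V1 engine with prefix caching disabled; eager mode is no longer forced. The model ID and the offline-inference setup are assumptions for illustration and are not part of this commit:

```python
from vllm import LLM, SamplingParams

# Sketch: pure Mamba-2 model (Mamba2ForCausalLM) on the V1 engine.
# Per the updated docs, prefix caching must still be disabled.
llm = LLM(
    model="mistralai/Mamba-Codestral-7B-v0.1",  # illustrative Mamba-2 checkpoint
    enable_prefix_caching=False,
)

outputs = llm.generate(
    ["The key idea behind selective state-space models is"],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```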
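Similarly, for the hybrid attention + Mamba-2 models, a sketch of selecting the FlashInfer backend via the `VLLM_ATTENTION_BACKEND` environment variable while keeping prefix caching off; again, the model ID is illustrative, and this is one possible way to satisfy the documented requirements rather than a prescribed setup:

```python
import os

# Sketch: hybrid Mamba-2 + attention model on the V1 engine.
# Per the updated docs: disable prefix caching and use the FlashInfer attention backend.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"

from vllm import LLM

llm = LLM(
    model="ibm-ai-platform/Bamba-9B",  # illustrative hybrid (BambaForCausalLM) checkpoint
    enable_prefix_caching=False,
)
print(llm.generate(["Hybrid state-space models combine"])[0].outputs[0].text)
```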