[Doc] Update notes for H2O-VL and Gemma3 (#17219)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
This commit is contained in:
Cyrus Leung
2025-05-06 15:51:02 +08:00
committed by GitHub
parent dc47ba32f8
commit 63ced7b43f

View File

@ -1118,11 +1118,6 @@ See [this page](#generative-models) for more information on how to use generativ
<sup>E</sup> Pre-computed embeddings can be inputted for this modality.
<sup>+</sup> Multiple items can be inputted per text prompt for this modality.
:::{important}
Pan-and-scan image pre-processing is currently supported on V0 (but not V1).
You can enable it by passing `--mm-processor-kwargs '{"do_pan_and_scan": true}'`.
:::
:::{warning}
Both V0 and V1 support `Gemma3ForConditionalGeneration` for text-only inputs.
However, there are differences in how they handle text + image inputs:
@ -1142,7 +1137,7 @@ This limitation exists because the model's mixed attention pattern (bidirectiona
:::
:::{note}
`h2oai/h2ovl-mississippi-2b` will be available in V1 once we support backends other than FlashAttention.
`h2oai/h2ovl-mississippi-2b` will be available in V1 once we support head size 80.
:::
:::{note}