Clarify documentation for leaf module config (#7623)

Update the documentation of the leaf module config as suggested
[here](https://github.com/deepspeedai/DeepSpeed/pull/7604#discussion_r2407483616).

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Author: Masahiro Tanaka
Date: 2025-10-06 20:10:32 -07:00
Committed by: GitHub
Parent: 2b68bbc594
Commit: 1ae1cdd8e4


@@ -116,15 +116,7 @@ Configuration in DeepSpeed config
 The same behavior can be controlled from the DeepSpeed config. Add a
 ``leaf_module`` block to ``zero_optimization`` specifying either classes,
-module names, or name suffixes (or any combination). By default DeepSpeed marks
-several Hugging Face MoE blocks—including Mixtral and Qwen MoE sparse blocks so
-that they behave well with ZeRO3.
-The default class list currently contains:
-
-* ``transformers.models.mixtral.modeling_mixtral.MixtralSparseMoeBlock``
-* ``transformers.models.qwen2_moe.modeling_qwen2_moe.Qwen2MoeSparseMoeBlock``
-* ``transformers.models.qwen3_moe.modeling_qwen3_moe.Qwen3MoeSparseMoeBlock``
+module names, or name suffixes (or any combination). While the example below shows three different ways (``classes``, ``names``, and ``name_suffixes``) to specify modules as leaf modules, typically you will use just one of these.

 .. code-block:: json
@@ -150,6 +142,12 @@ accepted.
 You can mix and match the API and configuration approaches; all referenced
 modules are flagged before ZeRO installs its hooks.
+
+By default, DeepSpeed marks several Hugging Face MoE blocks (including the Mixtral and Qwen sparse MoE blocks) so that they behave well with ZeRO3. The default class list currently contains:
+
+* ``transformers.models.mixtral.modeling_mixtral.MixtralSparseMoeBlock``
+* ``transformers.models.qwen2_moe.modeling_qwen2_moe.Qwen2MoeSparseMoeBlock``
+* ``transformers.models.qwen3_moe.modeling_qwen3_moe.Qwen3MoeSparseMoeBlock``
 
 Model Saving
 ------------
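
For reference, a minimal sketch of the ``leaf_module`` block described in the diff above, assuming ZeRO stage 3. The ``classes``, ``names``, and ``name_suffixes`` keys and the Mixtral class path come from the documentation text; the example entries under ``names`` and ``name_suffixes`` are hypothetical and would need to match the module paths in your own model.

.. code-block:: json

    {
      "zero_optimization": {
        "stage": 3,
        "leaf_module": {
          "classes": ["transformers.models.mixtral.modeling_mixtral.MixtralSparseMoeBlock"],
          "names": ["model.layers.0.block_sparse_moe"],
          "name_suffixes": ["block_sparse_moe"]
        }
      }
    }

As the updated text notes, you would typically use only one of the three keys in practice rather than all of them together.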