Clarify documentation for leaf module config (#7623)

Update the documentation of the leaf module config as suggested
[here](https://github.com/deepspeedai/DeepSpeed/pull/7604#discussion_r2407483616).

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Author: Masahiro Tanaka
Date: 2025-10-06 20:10:32 -07:00
Committed by: GitHub
Parent: 2b68bbc594
Commit: 1ae1cdd8e4


@@ -116,15 +116,7 @@ Configuration in DeepSpeed config
 The same behavior can be controlled from the DeepSpeed config. Add a
 ``leaf_module`` block to ``zero_optimization`` specifying either classes,
-module names, or name suffixes (or any combination). By default DeepSpeed marks
-several Hugging Face MoE blocks—including Mixtral and Qwen MoE sparse blocks so
-that they behave well with ZeRO3.
-The default class list currently contains:
-
-* ``transformers.models.mixtral.modeling_mixtral.MixtralSparseMoeBlock``
-* ``transformers.models.qwen2_moe.modeling_qwen2_moe.Qwen2MoeSparseMoeBlock``
-* ``transformers.models.qwen3_moe.modeling_qwen3_moe.Qwen3MoeSparseMoeBlock``
+module names, or name suffixes (or any combination). While the example below shows three different ways (``classes``, ``names``, and ``name_suffixes``) to specify modules as leaf modules, typically you will use just one of these.

 .. code-block:: json
@@ -150,6 +142,12 @@ accepted.
 You can mix and match the API and configuration approaches; all referenced
 modules are flagged before ZeRO installs its hooks.
+
+By default, DeepSpeed marks several Hugging Face MoE blocks (including the Mixtral and Qwen sparse MoE blocks) so that they behave well with ZeRO3. The default class list currently contains:
+
+* ``transformers.models.mixtral.modeling_mixtral.MixtralSparseMoeBlock``
+* ``transformers.models.qwen2_moe.modeling_qwen2_moe.Qwen2MoeSparseMoeBlock``
+* ``transformers.models.qwen3_moe.modeling_qwen3_moe.Qwen3MoeSparseMoeBlock``
 
 Model Saving
 ------------
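
For reference, a minimal sketch of the ``leaf_module`` block described in the diff above, assuming ZeRO stage 3. The ``classes``, ``names``, and ``name_suffixes`` keys and the Mixtral class path come from the documentation text; the example entries under ``names`` and ``name_suffixes`` are hypothetical and would need to match the module paths in your own model.

.. code-block:: json

    {
      "zero_optimization": {
        "stage": 3,
        "leaf_module": {
          "classes": ["transformers.models.mixtral.modeling_mixtral.MixtralSparseMoeBlock"],
          "names": ["model.layers.0.block_sparse_moe"],
          "name_suffixes": ["block_sparse_moe"]
        }
      }
    }

As the updated text notes, you would typically use only one of the three keys in practice rather than all of them together.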