From 1ae1cdd8e4f6aae1ab915683dd5ecdda946d010a Mon Sep 17 00:00:00 2001
From: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
Date: Mon, 6 Oct 2025 20:10:32 -0700
Subject: [PATCH] Clarify document of leaf module config (#7623)

Update document of leaf module config as suggested
[here](https://github.com/deepspeedai/DeepSpeed/pull/7604#discussion_r2407483616).

Signed-off-by: Masahiro Tanaka
---
 docs/code-docs/source/training.rst | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/docs/code-docs/source/training.rst b/docs/code-docs/source/training.rst
index 8af502397..76716a95c 100644
--- a/docs/code-docs/source/training.rst
+++ b/docs/code-docs/source/training.rst
@@ -116,15 +116,7 @@ Configuration in DeepSpeed config
 
 The same behavior can be controlled from the DeepSpeed config. Add a
 ``leaf_module`` block to ``zero_optimization`` specifying either classes,
-module names, or name suffixes (or any combination). By default DeepSpeed marks
-several Hugging Face MoE blocks—including Mixtral and Qwen MoE sparse blocks so
-that they behave well with ZeRO3.
-
-The default class list currently contains:
-
-* ``transformers.models.mixtral.modeling_mixtral.MixtralSparseMoeBlock``
-* ``transformers.models.qwen2_moe.modeling_qwen2_moe.Qwen2MoeSparseMoeBlock``
-* ``transformers.models.qwen3_moe.modeling_qwen3_moe.Qwen3MoeSparseMoeBlock``
+module names, or name suffixes (or any combination). The example below shows all three ways (``classes``, ``names``, and ``name_suffixes``) of specifying leaf modules; typically you will use just one of them.
 
 .. code-block:: json
 
@@ -150,6 +142,12 @@ accepted.
 
 You can mix and match the API and configuration approaches; all referenced
 modules are flagged before ZeRO installs its hooks.
 
+By default, DeepSpeed marks several Hugging Face MoE blocks, including the Mixtral and Qwen MoE sparse blocks, as leaf modules so that they behave well with ZeRO3. The default class list currently contains:
+
+* ``transformers.models.mixtral.modeling_mixtral.MixtralSparseMoeBlock``
+* ``transformers.models.qwen2_moe.modeling_qwen2_moe.Qwen2MoeSparseMoeBlock``
+* ``transformers.models.qwen3_moe.modeling_qwen3_moe.Qwen3MoeSparseMoeBlock``
+
 Model Saving
 ------------
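For readers trying the config route this patch documents, a minimal sketch may help. The three ``leaf_module`` keys below are the ones named in the first hunk; the stand-in model, module path, suffix, and batch-size values are illustrative assumptions, not part of the commit:

.. code-block:: python

   import torch
   import deepspeed

   # ZeRO-3 config using the ``leaf_module`` block. All three keys are
   # shown for illustration; typically you would pick just one of them.
   ds_config = {
       "train_micro_batch_size_per_gpu": 1,
       "zero_optimization": {
           "stage": 3,
           "leaf_module": {
               # Fully qualified class names.
               "classes": ["transformers.models.mixtral.modeling_mixtral.MixtralSparseMoeBlock"],
               # Exact module names, as reported by model.named_modules().
               "names": ["model.layers.0.block_sparse_moe"],
               # Suffix matches against module names.
               "name_suffixes": ["block_sparse_moe"],
           },
       },
   }

   model = torch.nn.Linear(8, 8)  # stand-in; real use targets an MoE model

   # Run under the deepspeed launcher. Modules matched by ``leaf_module``
   # are flagged before ZeRO-3 installs its hooks.
   engine, _, _, _ = deepspeed.initialize(
       model=model, model_parameters=model.parameters(), config=ds_config
   )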
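The second hunk also refers to mixing "the API and configuration approaches". A sketch of the API side, assuming ``deepspeed.utils.set_z3_leaf_modules`` (which flags instances of the given classes and returns the modules it flagged); the toy MoE block is invented for the example:

.. code-block:: python

   import torch
   from deepspeed.utils import set_z3_leaf_modules

   class ToyMoEBlock(torch.nn.Module):
       """Invented stand-in for an MoE block: run-time expert routing is
       why such modules are treated as single leaves under ZeRO3."""

       def __init__(self):
           super().__init__()
           self.experts = torch.nn.ModuleList([torch.nn.Linear(8, 8) for _ in range(4)])

       def forward(self, x):
           # Pick one expert per input at run time (data-dependent control flow).
           return self.experts[int(x.sum().item()) % 4](x)

   model = torch.nn.Sequential(ToyMoEBlock(), torch.nn.Linear(8, 8))

   # Flag every ToyMoEBlock instance before calling deepspeed.initialize(),
   # i.e. before ZeRO installs its hooks.
   flagged = set_z3_leaf_modules(model, [ToyMoEBlock])
   print(f"flagged {len(flagged)} leaf module(s)")

Either route has the same effect: a flagged module is treated as a single unit by ZeRO3 rather than getting per-submodule hooks, which is what lets blocks with data-dependent expert routing "behave well with ZeRO3" as the patched text puts it.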