From 1ae1cdd8e4f6aae1ab915683dd5ecdda946d010a Mon Sep 17 00:00:00 2001
From: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
Date: Mon, 6 Oct 2025 20:10:32 -0700
Subject: [PATCH] Clarify document of leaf module config (#7623)

Update document of leaf module config as suggested
[here](https://github.com/deepspeedai/DeepSpeed/pull/7604#discussion_r2407483616).

Signed-off-by: Masahiro Tanaka
---
 docs/code-docs/source/training.rst | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/docs/code-docs/source/training.rst b/docs/code-docs/source/training.rst
index 8af502397..76716a95c 100644
--- a/docs/code-docs/source/training.rst
+++ b/docs/code-docs/source/training.rst
@@ -116,15 +116,7 @@ Configuration in DeepSpeed config
 
 The same behavior can be controlled from the DeepSpeed config. Add a
 ``leaf_module`` block to ``zero_optimization`` specifying either classes,
-module names, or name suffixes (or any combination). By default DeepSpeed marks
-several Hugging Face MoE blocks—including Mixtral and Qwen MoE sparse blocks so
-that they behave well with ZeRO3.
-
-The default class list currently contains:
-
-* ``transformers.models.mixtral.modeling_mixtral.MixtralSparseMoeBlock``
-* ``transformers.models.qwen2_moe.modeling_qwen2_moe.Qwen2MoeSparseMoeBlock``
-* ``transformers.models.qwen3_moe.modeling_qwen3_moe.Qwen3MoeSparseMoeBlock``
+module names, or name suffixes (or any combination). The example below shows all three ways (``classes``, ``names``, and ``name_suffixes``) of specifying leaf modules; typically you will use just one of them.
 
 .. code-block:: json
 
@@ -150,6 +142,12 @@ accepted.
 
 You can mix and match the API and configuration approaches; all referenced
 modules are flagged before ZeRO installs its hooks.
 
+By default, DeepSpeed marks several Hugging Face MoE blocks, including the Mixtral and Qwen MoE sparse blocks, as leaf modules so that they behave well with ZeRO3. The default class list currently contains:
+
+* ``transformers.models.mixtral.modeling_mixtral.MixtralSparseMoeBlock``
+* ``transformers.models.qwen2_moe.modeling_qwen2_moe.Qwen2MoeSparseMoeBlock``
+* ``transformers.models.qwen3_moe.modeling_qwen3_moe.Qwen3MoeSparseMoeBlock``
+
 Model Saving
 ------------
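For readers trying the config route this patch documents, a minimal sketch may help. The three ``leaf_module`` keys below are the ones named in the first hunk; the stand-in model, module path, suffix, and batch-size values are illustrative assumptions, not part of the commit:

.. code-block:: python

   import torch
   import deepspeed

   # ZeRO-3 config using the ``leaf_module`` block. All three keys are
   # shown for illustration; typically you would pick just one of them.
   ds_config = {
       "train_micro_batch_size_per_gpu": 1,
       "zero_optimization": {
           "stage": 3,
           "leaf_module": {
               # Fully qualified class names.
               "classes": ["transformers.models.mixtral.modeling_mixtral.MixtralSparseMoeBlock"],
               # Exact module names, as reported by model.named_modules().
               "names": ["model.layers.0.block_sparse_moe"],
               # Suffix matches against module names.
               "name_suffixes": ["block_sparse_moe"],
           },
       },
   }

   model = torch.nn.Linear(8, 8)  # stand-in; real use targets an MoE model

   # Run under the deepspeed launcher. Modules matched by ``leaf_module``
   # are flagged before ZeRO-3 installs its hooks.
   engine, _, _, _ = deepspeed.initialize(
       model=model, model_parameters=model.parameters(), config=ds_config
   )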
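The second hunk also refers to mixing "the API and configuration approaches". A sketch of the API side, assuming ``deepspeed.utils.set_z3_leaf_modules`` (which flags instances of the given classes and returns the modules it flagged); the toy MoE block is invented for the example:

.. code-block:: python

   import torch
   from deepspeed.utils import set_z3_leaf_modules

   class ToyMoEBlock(torch.nn.Module):
       """Invented stand-in for an MoE block: run-time expert routing is
       why such modules are treated as single leaves under ZeRO3."""

       def __init__(self):
           super().__init__()
           self.experts = torch.nn.ModuleList([torch.nn.Linear(8, 8) for _ in range(4)])

       def forward(self, x):
           # Pick one expert per input at run time (data-dependent control flow).
           return self.experts[int(x.sum().item()) % 4](x)

   model = torch.nn.Sequential(ToyMoEBlock(), torch.nn.Linear(8, 8))

   # Flag every ToyMoEBlock instance before calling deepspeed.initialize(),
   # i.e. before ZeRO installs its hooks.
   flagged = set_z3_leaf_modules(model, [ToyMoEBlock])
   print(f"flagged {len(flagged)} leaf module(s)")

Either route has the same effect: a flagged module is treated as a single unit by ZeRO3 rather than getting per-submodule hooks, which is what lets blocks with data-dependent expert routing "behave well with ZeRO3" as the patched text puts it.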