mirror of
https://github.com/deepspeedai/DeepSpeed.git
synced 2025-10-20 15:33:51 +08:00
Clarify document of leaf module config (#7623)
Update document of leaf module config as suggested [here](https://github.com/deepspeedai/DeepSpeed/pull/7604#discussion_r2407483616).

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
@@ -116,15 +116,7 @@ Configuration in DeepSpeed config
 
 The same behavior can be controlled from the DeepSpeed config. Add a
 ``leaf_module`` block to ``zero_optimization`` specifying either classes,
-module names, or name suffixes (or any combination). By default DeepSpeed marks
-several Hugging Face MoE blocks—including Mixtral and Qwen MoE sparse blocks so
-that they behave well with ZeRO3.
-
-The default class list currently contains:
-
-* ``transformers.models.mixtral.modeling_mixtral.MixtralSparseMoeBlock``
-* ``transformers.models.qwen2_moe.modeling_qwen2_moe.Qwen2MoeSparseMoeBlock``
-* ``transformers.models.qwen3_moe.modeling_qwen3_moe.Qwen3MoeSparseMoeBlock``
+module names, or name suffixes (or any combination). While the example below shows three different ways (``classes``, ``names``, and ``name_suffixes``) to specify modules as leaf modules, typically you will use just one of these.
 
 .. code-block:: json
 
@@ -150,6 +142,12 @@ accepted.
 You can mix and match the API and configuration approaches; all referenced
 modules are flagged before ZeRO installs its hooks.
 
+By default DeepSpeed marks several Hugging Face MoE blocks—including Mixtral and Qwen MoE sparse blocks so that they behave well with ZeRO3. The default class list currently contains:
+
+* ``transformers.models.mixtral.modeling_mixtral.MixtralSparseMoeBlock``
+* ``transformers.models.qwen2_moe.modeling_qwen2_moe.Qwen2MoeSparseMoeBlock``
+* ``transformers.models.qwen3_moe.modeling_qwen3_moe.Qwen3MoeSparseMoeBlock``
+
 
 Model Saving
 ------------
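
The changed paragraph describes a ``leaf_module`` block under ``zero_optimization`` that accepts ``classes``, ``names``, and ``name_suffixes``, and notes that one of these is usually enough. The JSON example itself falls outside the hunks shown here, so the following is only a minimal sketch of what such a config could look like when built from Python. It assumes each selector takes a list of strings; ``build_leaf_module_config`` is a hypothetical helper for illustration, not a DeepSpeed API, and the class path is copied from the default list quoted in the diff.

```python
import json

# Fully qualified class path copied from the default list quoted in the diff.
MIXTRAL_MOE = "transformers.models.mixtral.modeling_mixtral.MixtralSparseMoeBlock"


def build_leaf_module_config(classes=(), names=(), name_suffixes=()):
    """Hypothetical helper: return a config dict with a ``leaf_module`` block.

    Assumption: each selector is a list of strings. Only the selectors that
    are actually supplied are written, since one kind is typically enough.
    """
    leaf_module = {}
    if classes:
        leaf_module["classes"] = list(classes)
    if names:
        leaf_module["names"] = list(names)
    if name_suffixes:
        leaf_module["name_suffixes"] = list(name_suffixes)
    if not leaf_module:
        raise ValueError("specify at least one of classes, names, name_suffixes")
    # ZeRO stage 3 is used here because the text discusses ZeRO3 behavior.
    return {"zero_optimization": {"stage": 3, "leaf_module": leaf_module}}


config = build_leaf_module_config(classes=[MIXTRAL_MOE])
print(json.dumps(config, indent=2))
```

Passing only ``classes`` mirrors the advice in the updated text to pick a single way of selecting modules; ``names`` or ``name_suffixes`` could be passed instead in the same manner.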