Mirror of https://github.com/deepspeedai/DeepSpeed.git (synced 2025-10-20 15:33:51 +08:00)
[docs] fix broken read-the-docs build (#2075)
@@ -20,7 +20,7 @@ copyright = '2020, Microsoft'
 author = 'Microsoft'
 
 # The full version, including alpha/beta/rc tags
-release = '0.3.0'
+release = '0.6'
 
 master_doc = 'index'
 
@@ -7,9 +7,9 @@ API To Estimate Memory Usage
 
 ZeRO2:
 
-.. autofunction:: deepspeed.runtime.zero.stage2.estimate_zero2_model_states_mem_needs_all_live
+.. autofunction:: deepspeed.runtime.zero.stage_1_and_2.estimate_zero2_model_states_mem_needs_all_live
 
-.. autofunction:: deepspeed.runtime.zero.stage2.estimate_zero2_model_states_mem_needs_all_cold
+.. autofunction:: deepspeed.runtime.zero.stage_1_and_2.estimate_zero2_model_states_mem_needs_all_cold
 
 Examples:
 
@@ -18,7 +18,7 @@ Let's try a 3B model with just 1 node with 8 gpus, using live model:
 .. code-block:: bash
 
     python -c 'from transformers import AutoModel; \
-    from deepspeed.runtime.zero.stage2 import estimate_zero2_model_states_mem_needs_all_live; \
+    from deepspeed.runtime.zero.stage_1_and_2 import estimate_zero2_model_states_mem_needs_all_live; \
     model = AutoModel.from_pretrained("t5-3b"); \
     estimate_zero2_model_states_mem_needs_all_live(model, num_gpus_per_node=8, num_nodes=1)'
     Estimated memory needed for params, optim states and gradients for a:
@@ -34,7 +34,7 @@ faster as we don't need to load the model.
 
 .. code-block:: bash
 
-    python -c 'from deepspeed.runtime.zero.stage2 import estimate_zero2_model_states_mem_needs_all_cold; \
+    python -c 'from deepspeed.runtime.zero.stage_1_and_2 import estimate_zero2_model_states_mem_needs_all_cold; \
     estimate_zero2_model_states_mem_needs_all_cold(total_params=2851e6, num_gpus_per_node=8, num_nodes=1)'
     Estimated memory needed for params, optim states and gradients for a:
     HW: Setup with 1 node, 8 GPUs per node.
@@ -1,5 +1,8 @@
docutils<0.18
hjson
packaging
psutil
py-cpuinfo
pydantic
torch
tqdm
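The hunks above move the ZeRO-2 estimator imports from deepspeed.runtime.zero.stage2 to deepspeed.runtime.zero.stage_1_and_2. A script that has to run against DeepSpeed versions on either side of that rename could try both paths. The following is a minimal sketch, not part of this commit: the helper name resolve_first and the CANDIDATE_MODULES tuple are hypothetical, and it assumes only the two module paths and the function name shown in the diff.

```python
import importlib

# Module paths taken from the diff above: the new location first, then the old one.
CANDIDATE_MODULES = (
    "deepspeed.runtime.zero.stage_1_and_2",
    "deepspeed.runtime.zero.stage2",
)


def resolve_first(module_names, attr):
    """Return `attr` from the first importable module that defines it, else None.

    Hypothetical helper for illustration; it is not a DeepSpeed API.
    """
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Module (or one of its parent packages) is not available; try the next.
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
    return None


if __name__ == "__main__":
    estimate = resolve_first(
        CANDIDATE_MODULES, "estimate_zero2_model_states_mem_needs_all_cold"
    )
    if estimate is None:
        print("ZeRO-2 estimator not found; is deepspeed installed?")
    else:
        # Same arguments as the cold-model example in the documentation hunk.
        estimate(total_params=2851e6, num_gpus_per_node=8, num_nodes=1)
```

Listing the new path first means current installations resolve on the first attempt, and the old path is consulted only as a fallback.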