Mirror of https://github.com/pytorch/pytorch.git, synced 2025-10-20 12:54:11 +08:00
Polish DDP join API docstrings (#43973)
Summary: Polishes DDP join api docstrings and makes a few minor cosmetic changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43973

Reviewed By: zou3519

Differential Revision: D23467238

Pulled By: rohan-varma

fbshipit-source-id: faf0ee56585fca5cc16f6891ea88032336b3be56
Committed by: Facebook GitHub Bot
Parent: 442684cb25
Commit: 3806c939bd
torch/nn/parallel/distributed.py

@@ -231,6 +231,7 @@ class DistributedDataParallel(Module):
    parameters.

    Example::
+
        >>> import torch.distributed.autograd as dist_autograd
        >>> from torch.nn.parallel import DistributedDataParallel as DDP
        >>> from torch import optim
@@ -688,7 +689,7 @@ class DistributedDataParallel(Module):
    def join(self, divide_by_initial_world_size=True, enable=True):
        r"""
        A context manager to be used in conjunction with an instance of
-        :class:`torch.nn.parallel.distributed.DistributedDataParallel` to be
+        :class:`torch.nn.parallel.DistributedDataParallel` to be
        able to train with uneven inputs across participating processes.

        This context manager will keep track of already-joined DDP processes,
@@ -710,10 +711,10 @@ class DistributedDataParallel(Module):

        .. warning::
            This module works only with the multi-process, single-device usage
-            of :class:`torch.nn.parallel.distributed.DistributedDataParallel`,
+            of :class:`torch.nn.parallel.DistributedDataParallel`,
            which means that a single process works on a single GPU.

-        ..warning::
+        .. warning::
            This module currently does not support custom distributed collective
            operations in the forward pass, such as ``SyncBatchNorm`` or other
            custom defined collectives in the model's forward pass.
@@ -731,7 +732,7 @@ class DistributedDataParallel(Module):
                ``world_size`` even when we encounter uneven inputs. If you set
                this to ``False``, we divide the gradient by the remaining
                number of nodes. This ensures parity with training on a smaller
-                world_size although it also means the uneven inputs would
+                ``world_size`` although it also means the uneven inputs would
                contribute more towards the global gradient. Typically, you
                would want to set this to ``True`` for cases where the last few
                inputs of your training job are uneven. In extreme cases, where
@@ -744,6 +745,7 @@ class DistributedDataParallel(Module):


        Example::
+
            >>> import torch
            >>> import torch.distributed as dist
            >>> import os
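The docstring polished by this commit documents DDP's ``join()`` context manager for training with uneven inputs across ranks. As a rough illustration of how that API is used, the sketch below spawns two CPU processes that train a DDP model on different numbers of batches. It is not part of the commit; the ``gloo`` backend, localhost rendezvous, toy ``nn.Linear`` model, and batch counts are assumptions chosen only to keep the example self-contained and runnable without GPUs.

# A minimal sketch (not part of this commit) of the join() context manager
# documented above. Backend, addresses, model, and batch counts are
# illustrative assumptions.
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def worker(rank, world_size):
    # Rendezvous over localhost with the CPU-friendly "gloo" backend.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = DDP(nn.Linear(1, 1, bias=False))

    # Uneven inputs: rank 1 gets two more batches than rank 0.
    inputs = [torch.ones(1, 1) for _ in range(5 + 2 * rank)]

    # join() keeps ranks that run out of inputs participating in the
    # collective communication, so the ranks with extra batches do not hang.
    # divide_by_initial_world_size=True is the default described above.
    with model.join(divide_by_initial_world_size=True):
        for inp in inputs:
            model(inp).sum().backward()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2
    mp.spawn(worker, args=(world_size,), nprocs=world_size)

Per the updated docstring, the default ``divide_by_initial_world_size=True`` divides gradients by the initial ``world_size`` even after some ranks have joined, while ``False`` divides by the number of ranks still training, which weights the trailing uneven inputs more heavily in the global gradient.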