Polish DDP join API docstrings (#43973)

Summary:
Polishes the DDP join API docstrings and makes a few minor cosmetic changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43973

Reviewed By: zou3519

Differential Revision: D23467238

Pulled By: rohan-varma

fbshipit-source-id: faf0ee56585fca5cc16f6891ea88032336b3be56
Author: Rohan Varma
Date: 2020-09-03 13:37:40 -07:00
Committed by: Facebook GitHub Bot
Parent: 442684cb25
Commit: 3806c939bd


@@ -231,6 +231,7 @@ class DistributedDataParallel(Module):
     parameters.
     Example::
         >>> import torch.distributed.autograd as dist_autograd
         >>> from torch.nn.parallel import DistributedDataParallel as DDP
         >>> from torch import optim
@@ -688,7 +689,7 @@ class DistributedDataParallel(Module):
     def join(self, divide_by_initial_world_size=True, enable=True):
         r"""
         A context manager to be used in conjunction with an instance of
-        :class:`torch.nn.parallel.distributed.DistributedDataParallel` to be
+        :class:`torch.nn.parallel.DistributedDataParallel` to be
         able to train with uneven inputs across participating processes.
         This context manager will keep track of already-joined DDP processes,
@@ -710,10 +711,10 @@ class DistributedDataParallel(Module):
         .. warning::
             This module works only with the multi-process, single-device usage
-            of :class:`torch.nn.parallel.distributed.DistributedDataParallel`,
+            of :class:`torch.nn.parallel.DistributedDataParallel`,
             which means that a single process works on a single GPU.
-        ..warning::
+        .. warning::
             This module currently does not support custom distributed collective
             operations in the forward pass, such as ``SyncBatchNorm`` or other
             custom defined collectives in the model's forward pass.
@@ -731,7 +732,7 @@ class DistributedDataParallel(Module):
                 ``world_size`` even when we encounter uneven inputs. If you set
                 this to ``False``, we divide the gradient by the remaining
                 number of nodes. This ensures parity with training on a smaller
-                world_size although it also means the uneven inputs would
+                ``world_size`` although it also means the uneven inputs would
                 contribute more towards the global gradient. Typically, you
                 would want to set this to ``True`` for cases where the last few
                 inputs of your training job are uneven. In extreme cases, where
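
A minimal numeric sketch of the ``divide_by_initial_world_size`` behavior described in the hunk above; the world size, joined-process count, and gradient values are illustrative assumptions, not taken from this diff.

    # Sketch only: illustrative numbers, assuming 4 starting processes of
    # which 1 has already joined (exhausted its inputs).
    initial_world_size = 4
    joined = 1
    grad_sum = 8.0  # allreduced gradient sum from the still-active processes

    # divide_by_initial_world_size=True (default): keep dividing by the
    # starting world size, so every input keeps its original weight.
    grad_default = grad_sum / initial_world_size                # 2.0

    # divide_by_initial_world_size=False: divide by the processes still
    # training, matching training on the smaller world_size but letting the
    # uneven inputs contribute more to the global gradient.
    grad_remaining = grad_sum / (initial_world_size - joined)   # ~2.67
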
@@ -744,6 +745,7 @@ class DistributedDataParallel(Module):
         Example::
             >>> import torch
             >>> import torch.distributed as dist
             >>> import os
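
For context, a minimal runnable sketch of the ``join()`` usage this docstring documents. The process-group setup (gloo backend, localhost rendezvous, two workers, per-rank batch counts) is an assumption for illustration and is not part of this diff.

    import os
    import torch
    import torch.distributed as dist
    import torch.multiprocessing as mp
    from torch.nn.parallel import DistributedDataParallel as DDP

    def worker(rank, world_size):
        # Hypothetical single-node rendezvous; adjust for your cluster.
        os.environ["MASTER_ADDR"] = "localhost"
        os.environ["MASTER_PORT"] = "29501"
        dist.init_process_group("gloo", rank=rank, world_size=world_size)

        model = DDP(torch.nn.Linear(1, 1))
        # Uneven inputs: rank 0 sees 5 batches, rank 1 sees 6.
        inputs = [torch.ones(1, 1) for _ in range(5 + rank)]

        # join() shadows the collectives of ranks that still have data,
        # so the early-finishing rank does not hang or desynchronize.
        with model.join():
            for inp in inputs:
                model(inp).sum().backward()

        dist.destroy_process_group()

    if __name__ == "__main__":
        mp.spawn(worker, args=(2,), nprocs=2)
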