[doc] fix autograd doc subsubsection display issue (#40582)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40582

The `requires_grad` heading is underlined with "~~~~", which is not an official section marker; change it to "^^^^" to denote a subsubsection. Also fix the other places that should use the subsection marker "-----" instead of the subsubsection marker "^^^^".

see https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#sections
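
For reference, the convention suggested in the linked Sphinx guide is `=` for sections, `-` for subsections, and `^` for subsubsections. A minimal sketch of how the headings in this note nest after the change (the ``Autograd mechanics`` title line and its ``===`` underline are shown for illustration only and are not part of this diff):

```rst
Autograd mechanics
==================

Excluding subgraphs from backward
---------------------------------

``requires_grad``
^^^^^^^^^^^^^^^^^
```

Sphinx only requires that underline characters be used consistently within a document, so sticking to this one convention keeps the rendered section nesting correct across the notes.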

Before:
<img width="712" alt="rst_before" src="https://user-images.githubusercontent.com/9443650/85789835-2226fa80-b6e4-11ea-97b6-2b19fdf324a4.png">
After:
<img width="922" alt="rst_after" src="https://user-images.githubusercontent.com/9443650/85789856-281cdb80-b6e4-11ea-925f-cb3f4ebaa2bf.png">

Test Plan: Imported from OSS

Differential Revision: D22245747

Pulled By: wanchaol

fbshipit-source-id: 11548ed42f627706863bb74d4269827d1b3450d4
Author: Wanchao Liang
Date: 2020-06-25 23:23:21 -07:00
Committed by: Facebook GitHub Bot
Parent: 3ab60ff696
Commit: eebd492dcf

@@ -11,7 +11,7 @@ programs, and can aid you in debugging.
 .. _excluding-subgraphs:
 Excluding subgraphs from backward
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+---------------------------------
 Every Tensor has a flag: :attr:`requires_grad` that allows for fine grained
 exclusion of subgraphs from gradient computation and can increase efficiency.
@@ -19,7 +19,7 @@ exclusion of subgraphs from gradient computation and can increase efficiency.
 .. _excluding-requires_grad:
 ``requires_grad``
-~~~~~~~~~~~~~~~~~
+^^^^^^^^^^^^^^^^^
 If there's a single input to an operation that requires gradient, its output
 will also require gradient. Conversely, only if all inputs don't require
@@ -61,7 +61,7 @@ will also require them.
 .. _how-autograd-encodes-history:
 How autograd encodes the history
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+--------------------------------
 Autograd is reverse automatic differentiation system. Conceptually,
 autograd records a graph recording all of the operations that created
@@ -87,7 +87,7 @@ every iteration. You don't have to encode all possible paths before you
 launch the training - what you run is what you differentiate.
 In-place operations with autograd
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+---------------------------------
 Supporting in-place operations in autograd is a hard matter, and we discourage
 their use in most cases. Autograd's aggressive buffer freeing and reuse makes
@@ -121,7 +121,8 @@ functions and not seeing any errors, you can be sure that the computed
 gradients are correct.
 Multithreaded Autograd
-^^^^^^^^^^^^^^^^^^^^^^^^^^^
+----------------------
 The autograd engine is responsible for running all the backward operations
 necessary to compute the backward pass. This section will describe all the details
 that can help you make the best use of it in a multithreaded environment.(this is
@@ -156,7 +157,7 @@ does not block on the concurrent backward computations, example code could be:
 Note that some behaviors that user should be aware of:
 Concurrency on CPU
-------------------
+^^^^^^^^^^^^^^^^^^
 When you run ``backward()`` or ``grad()`` via python or C++ API in multiple
 threads on CPU, you are expecting to see extra concurrency instead of
@@ -164,7 +165,7 @@ serializing all the backward calls in a specific order during execution
 (behavior before PyTorch 1.6).
 Non-determinism
-------------------
+^^^^^^^^^^^^^^^
 If you are calling ``backward()`` on multiple thread concurrently but with
 shared inputs (i.e. Hogwild CPU training). Since parameters are automatically
@@ -180,7 +181,7 @@ to happen. User could use the functional API :func:`torch.autograd.grad` to
 calculate the gradients instead of ``backward()`` to avoid non-determinism.
 Graph retaining
-------------------
+^^^^^^^^^^^^^^^
 If part of the autograd graph is shared between threads, i.e. run first
 part of forward single thread, then run second part in multiple threads,
@@ -192,7 +193,7 @@ crash in this case. Autograd will error out to the user similar to what call
 they should use ``retain_graph=True``.
 Thread Safety on Autograd Node
-------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 Since Autograd allows the caller thread to drive its backward execution for
 potential parallelism, it's important that we ensure thread safety on CPU with
@@ -204,7 +205,7 @@ for built-in C++ Autograd Nodes(e.g. AccumulateGrad, CopySlices) and custom
 thread safety on autograd Nodes that might have state write/read.
 No thread safety on C++ hooks
-------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 Autograd relies on the user to write thread safe C++ hooks. If you want the hook
 to be correctly applied in multithreading environment, you will need to write