[docs] Update autograd notes (#6769)
@@ -11,7 +11,7 @@ programs, and can aid you in debugging.
 Excluding subgraphs from backward
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Every Variable has a flag: :attr:`requires_grad` that allows for fine grained
+Every Tensor has a flag: :attr:`requires_grad` that allows for fine grained
 exclusion of subgraphs from gradient computation and can increase efficiency.

 .. _excluding-requires_grad:
@@ -22,13 +22,13 @@ exclusion of subgraphs from gradient computation and can increase efficiency.
 If there's a single input to an operation that requires gradient, its output
 will also require gradient. Conversely, only if all inputs don't require
 gradient, the output also won't require it. Backward computation is never
-performed in the subgraphs, where all Variables didn't require gradients.
+performed in the subgraphs, where all Tensors didn't require gradients.

 .. code::

-    >>> x = Variable(torch.randn(5, 5))
-    >>> y = Variable(torch.randn(5, 5))
-    >>> z = Variable(torch.randn(5, 5), requires_grad=True)
+    >>> x = torch.randn(5, 5)  # requires_grad=False by default
+    >>> y = torch.randn(5, 5)  # requires_grad=False by default
+    >>> z = torch.randn((5, 5), requires_grad=True)
     >>> a = x + y
     >>> a.requires_grad
     False
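
A short continuation of the example above (not itself part of this hunk), illustrating the propagation rule just stated: as soon as one input requires gradient, the output does too. It reuses ``a`` and ``z`` from the snippet above.

.. code::

    >>> b = a + z   # z requires grad, so the result does as well
    >>> b.requires_grad
    True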
@@ -62,7 +62,7 @@ How autograd encodes the history
 Autograd is reverse automatic differentiation system. Conceptually,
 autograd records a graph recording all of the operations that created
 the data as you execute operations, giving you a directed acyclic graph
-whose leaves are the input variables and roots are the output variables.
+whose leaves are the input tensors and roots are the output tensors.
 By tracing this graph from roots to leaves, you can automatically
 compute the gradients using the chain rule.

@@ -72,7 +72,7 @@ Internally, autograd represents this graph as a graph of
 evaluating the graph. When computing the forwards pass, autograd
 simultaneously performs the requested computations and builds up a graph
 representing the function that computes the gradient (the ``.grad_fn``
-attribute of each :class:`Variable` is an entry point into this graph).
+attribute of each :class:`torch.Tensor` is an entry point into this graph).
 When the forwards pass is completed, we evaluate this graph in the
 backwards pass to compute the gradients.
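
An illustrative sketch of the ``.grad_fn`` entry point described in this hunk (not part of the commit itself; the exact node class names vary between PyTorch versions). Each non-leaf tensor exposes the graph node that produced it, and the node's ``next_functions`` attribute links further back toward the leaves.

.. code::

    >>> x = torch.randn(5, requires_grad=True)
    >>> y = x.exp().sum()
    >>> y.grad_fn                    # graph node that will compute sum's gradient
    <SumBackward0 object at 0x...>
    >>> y.grad_fn.next_functions     # edges leading back toward the leaves
    ((<ExpBackward0 object at 0x...>, 0),)
    >>> x.grad_fn is None            # leaf tensors were not produced by an operation
    True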
@@ -82,8 +82,8 @@ flow statements, that can change the overall shape and size of the graph at
 every iteration. You don't have to encode all possible paths before you
 launch the training - what you run is what you differentiate.

-In-place operations on Variables
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+In-place operations with autograd
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Supporting in-place operations in autograd is a hard matter, and we discourage
 their use in most cases. Autograd's aggressive buffer freeing and reuse makes
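
Referring back to the dynamic-graph context at the top of the hunk above, a minimal sketch (not part of the commit) of data-dependent control flow: only the branch that actually runs is recorded, and that is exactly what gets differentiated.

.. code::

    >>> def f(x):
    ...     if x.sum() > 0:           # decided at run time, on every call
    ...         return (x * 2).sum()
    ...     return (x ** 2).sum()
    >>> x = torch.randn(5, requires_grad=True)
    >>> f(x).backward()               # differentiates the path that executed
    >>> x.grad.shape
    torch.Size([5])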
@@ -93,26 +93,25 @@ under heavy memory pressure, you might never need to use them.

 There are two main reasons that limit the applicability of in-place operations:

-1. Overwriting values required to compute gradients. This is why variables don't
-   support ``log_``. Its gradient formula requires the original input, and while
-   it is possible to recreate it by computing the inverse operation, it is
-   numerically unstable, and requires additional work that often defeats the
-   purpose of using these functions.
+1. In-place operations can potentially overwrite values required to compute
+   gradients.

 2. Every in-place operation actually requires the implementation to rewrite the
    computational graph. Out-of-place versions simply allocate new objects and
    keep references to the old graph, while in-place operations, require
    changing the creator of all inputs to the :class:`Function` representing
-   this operation. This can be tricky, especially if there are many Variables
+   this operation. This can be tricky, especially if there are many Tensors
    that reference the same storage (e.g. created by indexing or transposing),
    and in-place functions will actually raise an error if the storage of
-   modified inputs is referenced by any other :class:`Variable`.
+   modified inputs is referenced by any other :class:`Tensor`.

 In-place correctness checks
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Every variable keeps a version counter, that is incremented every time it's
+Every tensor keeps a version counter, that is incremented every time it is
 marked dirty in any operation. When a Function saves any tensors for backward,
-a version counter of their containing Variable is saved as well. Once you access
-``self.saved_tensors`` it is checked, and if it's greater than the saved value
-an error is raised.
+a version counter of their containing Tensor is saved as well. Once you access
+``self.saved_tensors`` it is checked, and if it is greater than the saved value
+an error is raised. This ensures that if you're using in-place
+functions and not seeing any errors, you can be sure that the computed
+gradients are correct.
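
A hedged sketch of the check described above (not part of the commit; the exact error message differs between versions): ``exp`` saves its output to compute its gradient, so modifying that output in-place bumps its version counter and the later backward pass refuses to use the stale value.

.. code::

    >>> x = torch.randn(3, requires_grad=True)
    >>> y = x.exp()          # exp saves its output y for the backward pass
    >>> y.add_(1)            # in-place op increments y's version counter
    >>> y.sum().backward()   # the saved version no longer matches
    RuntimeError: one of the variables needed for gradient computation has been
    modified by an inplace operation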