[docs] Update autograd notes (#6769)
@@ -11,7 +11,7 @@ programs, and can aid you in debugging.
 Excluding subgraphs from backward
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Every Variable has a flag: :attr:`requires_grad` that allows for fine grained
+Every Tensor has a flag: :attr:`requires_grad` that allows for fine grained
 exclusion of subgraphs from gradient computation and can increase efficiency.

 .. _excluding-requires_grad:
@@ -22,13 +22,13 @@ exclusion of subgraphs from gradient computation and can increase efficiency.
 If there's a single input to an operation that requires gradient, its output
 will also require gradient. Conversely, only if all inputs don't require
 gradient, the output also won't require it. Backward computation is never
-performed in the subgraphs, where all Variables didn't require gradients.
+performed in the subgraphs, where all Tensors didn't require gradients.

 .. code::

-    >>> x = Variable(torch.randn(5, 5))
-    >>> y = Variable(torch.randn(5, 5))
-    >>> z = Variable(torch.randn(5, 5), requires_grad=True)
+    >>> x = torch.randn(5, 5)  # requires_grad=False by default
+    >>> y = torch.randn(5, 5)  # requires_grad=False by default
+    >>> z = torch.randn((5, 5), requires_grad=True)
     >>> a = x + y
     >>> a.requires_grad
     False
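
A short continuation of the example above (not itself part of this hunk), illustrating the propagation rule just stated: as soon as one input requires gradient, the output does too. It reuses ``a`` and ``z`` from the snippet above.

.. code::

    >>> b = a + z   # z requires grad, so the result does as well
    >>> b.requires_grad
    True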
@@ -62,7 +62,7 @@ How autograd encodes the history
 Autograd is reverse automatic differentiation system. Conceptually,
 autograd records a graph recording all of the operations that created
 the data as you execute operations, giving you a directed acyclic graph
-whose leaves are the input variables and roots are the output variables.
+whose leaves are the input tensors and roots are the output tensors.
 By tracing this graph from roots to leaves, you can automatically
 compute the gradients using the chain rule.

@@ -72,7 +72,7 @@ Internally, autograd represents this graph as a graph of
 evaluating the graph. When computing the forwards pass, autograd
 simultaneously performs the requested computations and builds up a graph
 representing the function that computes the gradient (the ``.grad_fn``
-attribute of each :class:`Variable` is an entry point into this graph).
+attribute of each :class:`torch.Tensor` is an entry point into this graph).
 When the forwards pass is completed, we evaluate this graph in the
 backwards pass to compute the gradients.
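
An illustrative sketch of the ``.grad_fn`` entry point described in this hunk (not part of the commit itself; the exact node class names vary between PyTorch versions). Each non-leaf tensor exposes the graph node that produced it, and the node's ``next_functions`` attribute links further back toward the leaves.

.. code::

    >>> x = torch.randn(5, requires_grad=True)
    >>> y = x.exp().sum()
    >>> y.grad_fn                    # graph node that will compute sum's gradient
    <SumBackward0 object at 0x...>
    >>> y.grad_fn.next_functions     # edges leading back toward the leaves
    ((<ExpBackward0 object at 0x...>, 0),)
    >>> x.grad_fn is None            # leaf tensors were not produced by an operation
    True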
@@ -82,8 +82,8 @@ flow statements, that can change the overall shape and size of the graph at
 every iteration. You don't have to encode all possible paths before you
 launch the training - what you run is what you differentiate.

-In-place operations on Variables
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+In-place operations with autograd
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Supporting in-place operations in autograd is a hard matter, and we discourage
 their use in most cases. Autograd's aggressive buffer freeing and reuse makes
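
Referring back to the dynamic-graph context at the top of the hunk above, a minimal sketch (not part of the commit) of data-dependent control flow: only the branch that actually runs is recorded, and that is exactly what gets differentiated.

.. code::

    >>> def f(x):
    ...     if x.sum() > 0:           # decided at run time, on every call
    ...         return (x * 2).sum()
    ...     return (x ** 2).sum()
    >>> x = torch.randn(5, requires_grad=True)
    >>> f(x).backward()               # differentiates the path that executed
    >>> x.grad.shape
    torch.Size([5])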
@@ -93,26 +93,25 @@ under heavy memory pressure, you might never need to use them.

 There are two main reasons that limit the applicability of in-place operations:

-1. Overwriting values required to compute gradients. This is why variables don't
-   support ``log_``. Its gradient formula requires the original input, and while
-   it is possible to recreate it by computing the inverse operation, it is
-   numerically unstable, and requires additional work that often defeats the
-   purpose of using these functions.
+1. In-place operations can potentially overwrite values required to compute
+   gradients.

 2. Every in-place operation actually requires the implementation to rewrite the
    computational graph. Out-of-place versions simply allocate new objects and
    keep references to the old graph, while in-place operations, require
    changing the creator of all inputs to the :class:`Function` representing
-   this operation. This can be tricky, especially if there are many Variables
+   this operation. This can be tricky, especially if there are many Tensors
    that reference the same storage (e.g. created by indexing or transposing),
    and in-place functions will actually raise an error if the storage of
-   modified inputs is referenced by any other :class:`Variable`.
+   modified inputs is referenced by any other :class:`Tensor`.

 In-place correctness checks
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Every variable keeps a version counter, that is incremented every time it's
+Every tensor keeps a version counter, that is incremented every time it is
 marked dirty in any operation. When a Function saves any tensors for backward,
-a version counter of their containing Variable is saved as well. Once you access
-``self.saved_tensors`` it is checked, and if it's greater than the saved value
-an error is raised.
+a version counter of their containing Tensor is saved as well. Once you access
+``self.saved_tensors`` it is checked, and if it is greater than the saved value
+an error is raised. This ensures that if you're using in-place
+functions and not seeing any errors, you can be sure that the computed
+gradients are correct.
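
A hedged sketch of the check described above (not part of the commit; the exact error message differs between versions): ``exp`` saves its output to compute its gradient, so modifying that output in-place bumps its version counter and the later backward pass refuses to use the stale value.

.. code::

    >>> x = torch.randn(3, requires_grad=True)
    >>> y = x.exp()          # exp saves its output y for the backward pass
    >>> y.add_(1)            # in-place op increments y's version counter
    >>> y.sum().backward()   # the saved version no longer matches
    RuntimeError: one of the variables needed for gradient computation has been
    modified by an inplace operation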