Add randomness case to the autograd notes
I also took this chance to clean up the Sphinx formatting a bit and reword a few minor things.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78617
Approved by: https://github.com/soulitzer, https://github.com/albanD
commit a8ea58afee
parent 7cb4a76844
committed by PyTorch MergeBot
@@ -93,15 +93,17 @@ Gradients for non-differentiable functions
 ------------------------------------------
 
 The gradient computation using Automatic Differentiation is only valid when each elementary function being used is differentiable.
-Unfortunately many of the function we use in practice do not have this property (relu or sqrt at 0 for example).
-And even though we cannot always guarantee that the returned gradient will be correct. For example :math:`f(x) = x = \text{relu}(x) - \text{relu}(-x)` will give a 0 gradient at 0 instead of 1 for any value we choose for the gradient of relu at 0.
-To try and reduce the impact of this limitation, we define the gradients of the elementary operations by applying the following rules in order:
+Unfortunately many of the functions we use in practice do not have this property (``relu`` or ``sqrt`` at ``0``, for example).
+To try and reduce the impact of functions that are non-differentiable, we define the gradients of the elementary operations by applying the following rules in order:
 
 #. If the function is differentiable and thus a gradient exists at the current point, use it.
-#. If the function is convex (at least locally), use the sub-gradient with minimum norm (as it the steepest descent direction, see Exercise 2.7 from "Convex Optimization Algorithms" by Bertsekas, D. P and "Steepest Descent for Optimization Problems with Nondifferentiable Cost Functionals" by Bertsekas, D. P, and Mitter, S. K., 1971. for details and proofs).
-#. If the function is concave (at least locally), use the super-gradient with minimum norm (using a similar argument as above).
-#. If the function is defined, define the gradient at the current point by continuity (note that :math:`inf` is possible here, for example, :math:`sqrt(0)`). If multiple values are possible, pick one arbitrarily.
-#. If the function is not defined (:math:`\sqrt(-1)`, :math:`\log(-1)` or most functions when the input is :math:`nan` for example) then the value used as the gradient is arbitrary (we might also raise an error but that is not guaranteed). Most functions will use :math:`nan` as the gradient, but for performance reasons, some functions will use non-:math:`nan` values (:math:`\log(-1)` for example).
+#. If the function is convex (at least locally), use the sub-gradient of minimum norm (it is the steepest descent direction).
+#. If the function is concave (at least locally), use the super-gradient of minimum norm (consider `-f(x)` and apply the previous point).
+#. If the function is defined, define the gradient at the current point by continuity (note that ``inf`` is possible here, for example for ``sqrt(0)``). If multiple values are possible, pick one arbitrarily.
+#. If the function is not defined (``sqrt(-1)``, ``log(-1)`` or most functions when the input is ``NaN``, for example) then the value used as the gradient is arbitrary (we might also raise an error but that is not guaranteed). Most functions will use ``NaN`` as the gradient, but for performance reasons, some functions will use other values (``log(-1)``, for example).
+#. If the function is not a deterministic mapping (i.e. it is not a `mathematical function`_), it will be marked as non-differentiable. This will make it error out in the backward if used on tensors that require grad outside of a ``no_grad`` environment.
+
+.. _mathematical function: https://en.wikipedia.org/wiki/Function_(mathematics)
 
 .. _locally-disable-grad-doc:
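Rule 2 of the new list is what picks that 0: for a convex function such as abs or relu, autograd uses the sub-gradient of minimum norm at the kink (rule 3 is the mirror image for concave functions). A quick check:

    import torch

    x = torch.tensor(0.0, requires_grad=True)
    torch.abs(x).backward()
    print(x.grad)   # expected: tensor(0.); the subgradients of |x| at 0 span [-1, 1], and 0 has minimum norm

    x.grad = None   # reset the accumulated gradient before reusing the leaf
    torch.relu(x).backward()
    print(x.grad)   # expected: tensor(0.); the subgradients of relu at 0 span [0, 1]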
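Rule 4 (defining the gradient by continuity) is why an infinite gradient can legitimately show up, with sqrt at 0 as the example named in the note:

    import torch

    x = torch.tensor(0.0, requires_grad=True)
    torch.sqrt(x).backward()
    print(x.grad)  # expected: tensor(inf), the limit of 1 / (2 * sqrt(x)) as x approaches 0 from the right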
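Rule 5 makes the gradient arbitrary when the function is not defined at the input. The note flags log(-1) as a case where, for performance, the analytic backward formula 1/x is applied as-is instead of returning NaN, so the printed values below are illustrative rather than guaranteed:

    import torch

    x = torch.tensor(-1.0, requires_grad=True)
    torch.log(x).backward()   # forward value is NaN, since log is not defined at -1
    print(x.grad)  # illustrative: tensor(-1.), the backward formula 1/x evaluated at -1

    y = torch.tensor(-1.0, requires_grad=True)
    torch.sqrt(y).backward()
    print(y.grad)  # illustrative: tensor(nan), here the backward formula itself produces NaN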
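The newly added rule 6 is the point of this commit: a random sampler is not a mathematical function of its input, so autograd refuses to pretend it has a gradient. The sketch below mimics that behaviour with a hand-written torch.autograd.Function under a hypothetical name, using ctx.mark_non_differentiable rather than pointing at any particular built-in sampler:

    import torch

    class RandomSample(torch.autograd.Function):
        # Hypothetical sampler: the output is random, so it is not a deterministic mapping of `probs`.
        @staticmethod
        def forward(ctx, probs):
            out = (torch.rand_like(probs) < probs).float()
            ctx.mark_non_differentiable(out)  # autograd will refuse to backpropagate through `out`
            return out

        @staticmethod
        def backward(ctx, grad_output):
            return None  # never reached, the only output is non-differentiable

    p = torch.full((3,), 0.5, requires_grad=True)
    sample = RandomSample.apply(p)
    print(sample.requires_grad)   # False, the sample is detached from the autograd graph
    # sample.sum().backward()     # raises a RuntimeError: the result does not require grad

    with torch.no_grad():         # inside a no_grad block nothing is recorded, so there is nothing to error on
        sample = RandomSample.apply(p)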