Split up documentation into subpages and clean up some warnings (#37419)

Summary:
xref gh-32838, gh-34032

This is a major refactor of parts of the documentation to split it up using sphinx's `autosummary` feature which will build out `autofuction` and `autoclass` stub files and link to them. The end result is that the top module pages like torch.nn.rst and torch.rst are now more like table-of-contents to the actual single-class or single-function documentations pages.

Along the way, I modified many of the docstrings to eliminate sphinx warnings when building. I think the only thing I changed from a non-documentation perspective is to add names to `__all__` when adding them to `globals()` in `torch.__init__.py`

I do not know the CI system: are the documentation build artifacts available after the build, so reviewers can preview before merging?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37419

Differential Revision: D21337640

Pulled By: ezyang

fbshipit-source-id: d4ad198780c3ae7a96a9f22651e00ff2d31a0c0f
This commit is contained in:
mattip
2020-05-04 09:35:17 -07:00
committed by Facebook GitHub Bot
parent b1e4e4d470
commit f10fbcc820
51 changed files with 1694 additions and 1980 deletions

1
.gitignore vendored
View File

@ -30,6 +30,7 @@ dist/
docs/src/**/*
docs/cpp/build
docs/cpp/source/api
docs/source/generated/
log
test/.coverage
test/.hypothesis/

View File

@ -0,0 +1,12 @@
.. role:: hidden
:class: hidden-section
.. currentmodule:: {{ module }}
{{ name | underline}}
.. autoclass:: {{ name }}
:inherited-members:
:members:
.. autogenerated from source/_templates/autosummary/class.rst

View File

@ -0,0 +1,14 @@
.. role:: hidden
:class: hidden-section
.. currentmodule:: {{ module }}
{{ name | underline}}
.. autoclass:: {{ name }}
:members:
..
autogenerated from source/_templates/classtemplate.rst
note it does not have :inherited-members:

View File

@ -0,0 +1,14 @@
.. currentmodule:: {{ module }}
{{ name | underline}}
.. autoclass:: {{ name }}
:members:
:exclude-members: MAXBIT, MAXDIM
:undoc-members:
..
autogenerated from source/_templates/sobolengine.rst
note it has specific options

View File

@ -145,7 +145,7 @@ PowerPC
- Alfredo Mendoza (`avmgithub <https://github.com/avmgithub>`__)
Library-level maintainers
------------------------
-------------------------
XLA
~~~

View File

@ -23,6 +23,7 @@ import os
# sys.path.insert(0, os.path.abspath('../..'))
import torch
try:
import torchvision # noqa: F401
except ImportError:
@ -56,6 +57,10 @@ extensions = [
'javasphinx',
]
# build the templated autosummary files
autosummary_generate = True
numpydoc_show_class_members = False
# autosectionlabel throws warnings if section names are duplicated.
# The following tells autosectionlabel to not throw a warning for
# duplicated section names that are in different documents.
@ -72,7 +77,7 @@ napoleon_use_ivar = True
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
if RELEASE:
templates_path = ['_templates-stable']
templates_path = ['_templates-stable'] + templates_path
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
@ -240,8 +245,8 @@ texinfo_documents = [
# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {
'python': ('https://docs.python.org/', None),
'numpy': ('https://docs.scipy.org/doc/numpy/', None),
'python': ('https://docs.python.org/3', None),
'numpy': ('https://numpy.org/doc/stable', None),
}
# -- A patch that prevents Sphinx from cross-referencing ivar tags -------

View File

@ -1,5 +0,0 @@
.. note::
When using the CUDA backend, this operation may induce nondeterministic
behaviour that is not easily switched off.
Please see the notes on :doc:`/notes/randomness` for background.

View File

@ -1,5 +0,0 @@
.. note::
When using the CUDA backend, this operation may induce nondeterministic
behaviour in its backward pass that is not easily switched off.
Please see the notes on :doc:`/notes/randomness` for background.

View File

@ -1,8 +0,0 @@
.. note::
In some circumstances when using the CUDA backend with CuDNN, this operator
may select a nondeterministic algorithm to increase performance. If this is
undesirable, you can try to make the operation deterministic (potentially at
a performance cost) by setting ``torch.backends.cudnn.deterministic =
True``.
Please see the notes on :doc:`/notes/randomness` for background.

View File

@ -1,3 +1,6 @@
:orphan:
.. note::
If the following conditions are satisfied:

View File

@ -78,7 +78,7 @@ Probability distributions - torch.distributions
:show-inheritance:
:hidden:`ContinuousBernoulli`
~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. currentmodule:: torch.distributions.continuous_bernoulli
.. autoclass:: ContinuousBernoulli

View File

@ -0,0 +1,2 @@
[html writers]
table_style: colwidths-auto

View File

@ -45,7 +45,7 @@ PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.
onnx
optim
quantization
rpc/index.rst
rpc
torch.random <random>
sparse
storage

View File

@ -10,11 +10,11 @@ TorchScript
.. toctree::
:maxdepth: 1
:caption: Language Reference
:hidden:
:maxdepth: 1
:caption: Language Reference
:hidden:
language_reference <jit_language_reference>
jit_language_reference
.. contents:: :local:
:depth: 2
@ -40,25 +40,18 @@ For an end-to-end example of converting a PyTorch model to TorchScript and runni
Creating TorchScript Code
--------------------------
.. autofunction:: script(obj)
.. autofunction:: trace(func, example_inputs, optimize=None, check_trace=True, check_inputs=None, check_tolerance=1e-5)
.. autofunction:: trace_module(mod, inputs, optimize=None, check_trace=True, check_inputs=None, check_tolerance=1e-5)
.. autoclass:: ScriptModule()
:members:
.. autoclass:: ScriptFunction()
.. autofunction:: save
.. autofunction:: load
.. autofunction:: ignore
.. autofunction:: unused
.. autosummary::
:toctree: generated
script
trace
trace_module
ScriptModule
ScriptFunction
save
load
ignore
unused
Mixing Tracing and Scripting
----------------------------
@ -167,7 +160,7 @@ TorchScript is a statically typed subset of Python, so many Python features appl
directly to TorchScript. See the full :ref:`language-reference` for details.
.. _Builtin functions:
.. _builtin functions:
Built-in Functions and Modules
------------------------------
@ -207,39 +200,38 @@ Disable JIT for Debugging
~~~~~~~~~~~~~~~~~~~~~~~~~
.. envvar:: PYTORCH_JIT
Setting the environment variable ``PYTORCH_JIT=0`` will disable all script
and tracing annotations. If there is hard-to-debug error in one of your
TorchScript model, you can use this flag to force everything to run using native
Python. Since TorchScript (scripting and tracing) are disabled with this flag,
you can use tools like ``pdb`` to debug the model code.
Setting the environment variable ``PYTORCH_JIT=0`` will disable all script
and tracing annotations. If there is hard-to-debug error in one of your
TorchScript model, you can use this flag to force everything to run using native
Python. Since TorchScript (scripting and tracing) are disabled with this flag,
you can use tools like ``pdb`` to debug the model code. For example::
Given an example
@torch.jit.script
def scripted_fn(x : torch.Tensor):
for i in range(12):
x = x + x
return x
@torch.jit.script
def scripted_fn(x : torch.Tensor):
for i in range(12):
x = x + x
return x
def fn(x):
x = torch.neg(x)
import pdb; pdb.set_trace()
return scripted_fn(x)
traced_fn = torch.jit.trace(fn, (torch.rand(4, 5),))
traced_fn(torch.rand(3, 4))
def fn(x):
x = torch.neg(x)
import pdb; pdb.set_trace()
return scripted_fn(x)
Debugging this script with ``pdb`` works except for when we invoke the
:func:`@torch.jit.script <torch.jit.script>` function. We can globally disable
JIT, so that we can call the :func:`@torch.jit.script <torch.jit.script>`
function as a normal Python function and not compile it. If the above script
is called ``disable_jit_example.py``, we can invoke it like so::
traced_fn = torch.jit.trace(fn, (torch.rand(4, 5),))
traced_fn(torch.rand(3, 4))
$ PYTORCH_JIT=0 python disable_jit_example.py
Debugging this script with ``pdb`` works except for when we invoke the :func:`@torch.jit.script <torch.jit.script>`
function. We can globally disable JIT, so that we can call the :func:`@torch.jit.script <torch.jit.script>`
function as a normal Python function and not compile it. If the above script
is called ``disable_jit_example.py``, we can invoke it like so::
$ PYTORCH_JIT=0 python disable_jit_example.py
and we will be able to step into the :func:`@torch.jit.script <torch.jit.script>` function as a normal Python
function. To disable the TorchScript compiler for a specific function, see
:func:`@torch.jit.ignore <torch.jit.ignore>`.
and we will be able to step into the :func:`@torch.jit.script
<torch.jit.script>` function as a normal Python function. To disable the
TorchScript compiler for a specific function, see
:func:`@torch.jit.ignore <torch.jit.ignore>`.
Inspecting Code
@ -537,14 +529,6 @@ rather build up the result tensor out-of-place with ``torch.cat``:
...
.. _Builtin functions:
Built-in Functions and Modules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
See :ref:`builtin-functions` for a full reference of supported functions.
Frequently Asked Questions
--------------------------
@ -608,13 +592,11 @@ Q: How do I store attributes on a :class:`ScriptModule`?
3. Constants - Annotating a class member as ``Final`` (or adding it to a list called
``__constants__`` at the class definition level) will mark the contained names
as constants. Constants are saved directly in the code of the model. See
`Python-defined Constants`_ for details.
`builtin-constants` for details.
4. Attributes - Values that are a `supported type`_ can be added as mutable
4. Attributes - Values that are a `supported type` can be added as mutable
attributes. Most types can be inferred but some may need to be specified, see
`Module Attributes`_ for details.
`module attributes` for details.
Q: I would like to trace module's method but I keep getting this error:
@ -741,12 +723,13 @@ TorchScript Classes
for simple record-like types (think a ``NamedTuple`` with methods
attached).
Everything in a user defined `TorchScript Class`_ is exported by default, functions
can be decorated with :func:`@torch.jit.ignore <torch.jit.ignore>` if needed.
Everything in a user defined `TorchScript Class <torchscript-class>`_ is
exported by default, functions can be decorated with :func:`@torch.jit.ignore
<torch.jit.ignore>` if needed.
Attributes
^^^^^^^^^^
The TorchScript compiler needs to know the types of `module attributes`_. Most types
The TorchScript compiler needs to know the types of `module attributes`. Most types
can be inferred from the value of the member. Empty lists and dicts cannot have their
types inferred and must have their types annotated with `PEP 526-style <https://www.python.org/dev/peps/pep-0526/#class-and-instance-variable-annotations>`_ class annotations.
If a type cannot be inferred and is not explicitly annotated, it will not be added as an attribute
@ -793,7 +776,7 @@ New API:
Constants
^^^^^^^^^
The ``Final`` type constructor can be used to mark members as `constant`_. If members are not marked constant, they will be copied to the resulting :class:`ScriptModule` as an attribute. Using ``Final`` opens opportunities for optimization if the value is known to be fixed and gives additional type safety.
The ``Final`` type constructor can be used to mark members as `constant`. If members are not marked constant, they will be copied to the resulting :class:`ScriptModule` as an attribute. Using ``Final`` opens opportunities for optimization if the value is known to be fixed and gives additional type safety.
Old API:
@ -839,7 +822,7 @@ New API:
Variables
^^^^^^^^^
Containers are assumed to have type ``Tensor`` and be non-optional (see
`Default Types`_ for more information). Previously, ``torch.jit.annotate`` was used to
`Default Types` for more information). Previously, ``torch.jit.annotate`` was used to
tell the TorchScript compiler what the type should be. Python 3 style type hints are
now supported.
@ -856,3 +839,11 @@ now supported.
if flag:
b = 2
return x, b
References
~~~~~~~~~~
.. toctree::
:maxdepth: 1
jit_python_reference
jit_unsupported

View File

@ -39,11 +39,11 @@ When writing TorchScript directly using ``@torch.jit.script`` decorator, the pro
only use the subset of Python supported in TorchScript. This section documents
what is supported in TorchScript as if it were a language reference for a stand
alone language. Any features of Python not mentioned in this reference are not
part of TorchScript. See `Builtin Functions`_ for a complete reference of available
part of TorchScript. See `Builtin Functions` for a complete reference of available
Pytorch tensor methods, modules, and functions.
As a subset of Python, any valid TorchScript function is also a valid Python
function. This makes it possible to `disable TorchScript`_ and debug the
function. This makes it possible to `disable TorchScript` and debug the
function using standard Python tools like ``pdb``. The reverse is not true: there
are many valid Python programs that are not valid TorchScript programs.
Instead, TorchScript focuses specifically on the features of Python that are
@ -188,7 +188,7 @@ MyPy-style type annotations using the types listed above.
An empty list is assumed to be ``List[Tensor]`` and empty dicts
``Dict[str, Tensor]``. To instantiate an empty list or dict of other types,
use `Python 3 type hints`_.
use `Python 3 type hints`.
Example (type annotations for Python 3):
@ -487,7 +487,7 @@ Subscripts and Slicing
Function Calls
^^^^^^^^^^^^^^
Calls to `builtin functions`_
Calls to `builtin functions`
::
@ -768,12 +768,12 @@ to TorchScript, leaving calls to Python functions in place. This way you can inc
check the correctness of the model as you go.
.. autofunction:: is_scripting
.. autofunction:: torch.jit.is_scripting
Attribute Lookup On Python Modules
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TorchScript can lookup attributes on modules. `Builtin functions`_ like ``torch.add``
TorchScript can lookup attributes on modules. `Builtin functions` like ``torch.add``
are accessed this way. This allows TorchScript to call functions defined in
other modules.

File diff suppressed because it is too large Load Diff

View File

@ -83,7 +83,7 @@ this way (and remember that you will need at least twice the size of the
weights, since you also need to store the gradients.)
My GPU memory isn't freed properly
-------------------------------------------------------
----------------------------------
PyTorch uses a caching memory allocator to speed up memory allocations. As a
result, the values shown in ``nvidia-smi`` usually don't reflect the true
memory usage. See :ref:`cuda-memory-management` for more details about GPU
@ -94,7 +94,7 @@ some Python subprocesses are still alive. You may find them via
``ps -elf | grep python`` and manually kill them with ``kill -9 [pid]``.
My out of memory exception handler can't allocate memory
-------------------------------------------------------
--------------------------------------------------------
You may have some code that tries to recover from out of memory errors.
.. code-block:: python

View File

@ -731,5 +731,5 @@ Functions
.. autofunction:: export
.. autofunction:: register_custom_op_symbolic
.. autofunction:: torch.onnx.operators.shape_as_tensor
.. autofunction:: set_training
.. autofunction:: select_model_mode_for_export
.. autofunction:: is_in_onnx_export

View File

@ -38,40 +38,40 @@ dictStringKeyFrom
Creates a new \ ``IValue``\ of type \ ``Dict[str, V]``\ .
from
^^^^
from(Tensor)
^^^^^^^^^^^^
.. java:method:: public static IValue from(Tensor tensor)
:outertype: IValue
Creates a new \ ``IValue``\ of type \ ``Tensor``\ .
from
^^^^
from(boolean)
^^^^^^^^^^^^^
.. java:method:: public static IValue from(boolean value)
:outertype: IValue
Creates a new \ ``IValue``\ of type \ ``bool``\ .
from
^^^^
from(long)
^^^^^^^^^^
.. java:method:: public static IValue from(long value)
:outertype: IValue
Creates a new \ ``IValue``\ of type \ ``int``\ .
from
^^^^
from(double)
^^^^^^^^^^^^
.. java:method:: public static IValue from(double value)
:outertype: IValue
Creates a new \ ``IValue``\ of type \ ``float``\ .
from
^^^^
from(String)
^^^^^^^^^^^^
.. java:method:: public static IValue from(String value)
:outertype: IValue
@ -162,40 +162,40 @@ isTuple
.. java:method:: public boolean isTuple()
:outertype: IValue
listFrom
^^^^^^^^
listFrom(boolean...)
^^^^^^^^^^^^^^^^^^^^
.. java:method:: public static IValue listFrom(boolean... list)
:outertype: IValue
Creates a new \ ``IValue``\ of type \ ``List[bool]``\ .
listFrom
^^^^^^^^
listFrom(long...)
^^^^^^^^^^^^^^^^^
.. java:method:: public static IValue listFrom(long... list)
:outertype: IValue
Creates a new \ ``IValue``\ of type \ ``List[int]``\ .
listFrom
^^^^^^^^
listFrom(double...)
^^^^^^^^^^^^^^^^^^^
.. java:method:: public static IValue listFrom(double... list)
:outertype: IValue
Creates a new \ ``IValue``\ of type \ ``List[float]``\ .
listFrom
^^^^^^^^
listFrom(Tensor...)
^^^^^^^^^^^^^^^^^^^
.. java:method:: public static IValue listFrom(Tensor... list)
:outertype: IValue
Creates a new \ ``IValue``\ of type \ ``List[Tensor]``\ .
listFrom
^^^^^^^^
listFrom(IValue...)
^^^^^^^^^^^^^^^^^^^
.. java:method:: public static IValue listFrom(IValue... array)
:outertype: IValue

View File

@ -98,8 +98,8 @@ dtypeJniCode
.. java:method:: int dtypeJniCode()
:outertype: Tensor
fromBlob
^^^^^^^^
fromBlob(byte[], long[])
^^^^^^^^^^^^^^^^^^^^^^^^
.. java:method:: public static Tensor fromBlob(byte[] data, long[] shape)
:outertype: Tensor
@ -109,8 +109,8 @@ fromBlob
:param data: Tensor elements
:param shape: Tensor shape
fromBlob
^^^^^^^^
fromBlob(int[], long[])
^^^^^^^^^^^^^^^^^^^^^^^
.. java:method:: public static Tensor fromBlob(int[] data, long[] shape)
:outertype: Tensor
@ -120,8 +120,8 @@ fromBlob
:param data: Tensor elements
:param shape: Tensor shape
fromBlob
^^^^^^^^
fromBlob(float[], long[])
^^^^^^^^^^^^^^^^^^^^^^^^^
.. java:method:: public static Tensor fromBlob(float[] data, long[] shape)
:outertype: Tensor
@ -131,8 +131,8 @@ fromBlob
:param data: Tensor elements
:param shape: Tensor shape
fromBlob
^^^^^^^^
fromBlob(long[], long[])
^^^^^^^^^^^^^^^^^^^^^^^^
.. java:method:: public static Tensor fromBlob(long[] data, long[] shape)
:outertype: Tensor
@ -142,8 +142,8 @@ fromBlob
:param data: Tensor elements
:param shape: Tensor shape
fromBlob
^^^^^^^^
fromBlob(long[], double[])
^^^^^^^^^^^^^^^^^^^^^^^^^^
.. java:method:: public static Tensor fromBlob(long[] shape, double[] data)
:outertype: Tensor
@ -153,8 +153,8 @@ fromBlob
:param shape: Tensor shape
:param data: Tensor elements
fromBlob
^^^^^^^^
fromBlob(ByteBuffer, long[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. java:method:: public static Tensor fromBlob(ByteBuffer data, long[] shape)
:outertype: Tensor
@ -164,8 +164,8 @@ fromBlob
:param data: Direct buffer with native byte order that contains \ ``Tensor.numel(shape)``\ elements. The buffer is used directly without copying, and changes to its content will change the tensor.
:param shape: Tensor shape
fromBlob
^^^^^^^^
fromBlob(IntBuffer, long[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. java:method:: public static Tensor fromBlob(IntBuffer data, long[] shape)
:outertype: Tensor
@ -175,8 +175,8 @@ fromBlob
:param data: Direct buffer with native byte order that contains \ ``Tensor.numel(shape)``\ elements. The buffer is used directly without copying, and changes to its content will change the tensor.
:param shape: Tensor shape
fromBlob
^^^^^^^^
fromBlob(FloatBuffer, long[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. java:method:: public static Tensor fromBlob(FloatBuffer data, long[] shape)
:outertype: Tensor
@ -186,8 +186,8 @@ fromBlob
:param data: Direct buffer with native byte order that contains \ ``Tensor.numel(shape)``\ elements. The buffer is used directly without copying, and changes to its content will change the tensor.
:param shape: Tensor shape
fromBlob
^^^^^^^^
fromBlob(LongBuffer, long[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. java:method:: public static Tensor fromBlob(LongBuffer data, long[] shape)
:outertype: Tensor
@ -197,8 +197,8 @@ fromBlob
:param data: Direct buffer with native byte order that contains \ ``Tensor.numel(shape)``\ elements. The buffer is used directly without copying, and changes to its content will change the tensor.
:param shape: Tensor shape
fromBlob
^^^^^^^^
fromBlob(DoubleBuffer, long[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. java:method:: public static Tensor fromBlob(DoubleBuffer data, long[] shape)
:outertype: Tensor
@ -208,8 +208,8 @@ fromBlob
:param data: Direct buffer with native byte order that contains \ ``Tensor.numel(shape)``\ elements. The buffer is used directly without copying, and changes to its content will change the tensor.
:param shape: Tensor shape
fromBlobUnsigned
^^^^^^^^^^^^^^^^
fromBlobUnsigned(byte[], long[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. java:method:: public static Tensor fromBlobUnsigned(byte[] data, long[] shape)
:outertype: Tensor
@ -219,8 +219,8 @@ fromBlobUnsigned
:param data: Tensor elements
:param shape: Tensor shape
fromBlobUnsigned
^^^^^^^^^^^^^^^^
fromBlobUnsigned(ByteBuffer, long[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. java:method:: public static Tensor fromBlobUnsigned(ByteBuffer data, long[] shape)
:outertype: Tensor
@ -290,16 +290,16 @@ getRawDataBuffer
.. java:method:: Buffer getRawDataBuffer()
:outertype: Tensor
numel
^^^^^
numel()
^^^^^^^
.. java:method:: public long numel()
:outertype: Tensor
Returns the number of elements in this tensor.
numel
^^^^^
numel(long[])
^^^^^^^^^^^^^
.. java:method:: public static long numel(long[] shape)
:outertype: Tensor
@ -313,3 +313,6 @@ shape
:outertype: Tensor
Returns the shape of this tensor. (The array is a fresh copy.)
.. toctree::
TensorImageUtils

View File

@ -38,8 +38,8 @@ TORCHVISION_NORM_STD_RGB
Methods
-------
bitmapToFloat32Tensor
^^^^^^^^^^^^^^^^^^^^^
bitmapToFloat32Tensor(Bitmap, float[], float[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. java:method:: public static Tensor bitmapToFloat32Tensor(Bitmap bitmap, float[] normMeanRGB, float[] normStdRGB)
:outertype: TensorImageUtils
@ -49,8 +49,8 @@ bitmapToFloat32Tensor
:param normMeanRGB: means for RGB channels normalization, length must equal 3, RGB order
:param normStdRGB: standard deviation for RGB channels normalization, length must equal 3, RGB order
bitmapToFloat32Tensor
^^^^^^^^^^^^^^^^^^^^^
bitmapToFloat32Tensor(Bitmap, int, int, int, int, float[], float[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. java:method:: public static Tensor bitmapToFloat32Tensor(Bitmap bitmap, int x, int y, int width, int height, float[] normMeanRGB, float[] normStdRGB)
:outertype: TensorImageUtils

View File

@ -1,9 +0,0 @@
rg.pytorch.torchvision
=======================
.. java:package:: org.pytorch.torchvision
.. toctree::
:maxdepth: 1
TensorImageUtils

View File

@ -1,67 +1,78 @@
.. _quantization-doc:
Quantization
===========================
============
Introduction to Quantization
----------------------------
Quantization refers to techniques for performing computations and storing tensors at lower bitwidths than
floating point precision. A quantized model executes some or all of the operations on tensors with
integers rather than floating point values. This allows for a more
compact model representation and the use of high performance vectorized
operations on many hardware platforms. PyTorch supports INT8
quantization compared to typical FP32 models allowing for a 4x reduction in the model size and
a 4x reduction in memory bandwidth requirements. Hardware support for INT8 computations
is typically 2 to 4 times faster compared to FP32 compute. Quantization is primarily a technique
to speed up inference and only the forward pass is supported for quantized operators.
Quantization refers to techniques for performing computations and storing
tensors at lower bitwidths than floating point precision. A quantized model
executes some or all of the operations on tensors with integers rather than
floating point values. This allows for a more compact model representation and
the use of high performance vectorized operations on many hardware platforms.
PyTorch supports INT8 quantization compared to typical FP32 models allowing for
a 4x reduction in the model size and a 4x reduction in memory bandwidth
requirements. Hardware support for INT8 computations is typically 2 to 4
times faster compared to FP32 compute. Quantization is primarily a technique to
speed up inference and only the forward pass is supported for quantized
operators.
PyTorch supports multiple approaches to quantizing a deep learning model. In most cases the model is trained
in FP32 and then the model is converted to INT8. In addition, PyTorch also supports quantization aware
training, which models quantization errors in both the forward and backward passes using fake-quantization
modules. Note that the entire computation is carried out in floating point. At the end of quantization aware
training, PyTorch provides conversion functions to convert the trained model into lower precision.
PyTorch supports multiple approaches to quantizing a deep learning model. In
most cases the model is trained in FP32 and then the model is converted to
INT8. In addition, PyTorch also supports quantization aware training, which
models quantization errors in both the forward and backward passes using
fake-quantization modules. Note that the entire computation is carried out in
floating point. At the end of quantization aware training, PyTorch provides
conversion functions to convert the trained model into lower precision.
At lower level, PyTorch provides a way to represent quantized tensors and
perform operations with them. They can be used to directly construct models that
perform all or part of the computation in lower precision. Higher-level APIs are
provided that incorporate typical workflows of converting FP32 model to lower
precision with minimal accuracy loss.
perform operations with them. They can be used to directly construct models
that perform all or part of the computation in lower precision. Higher-level
APIs are provided that incorporate typical workflows of converting FP32 model
to lower precision with minimal accuracy loss.
Today, PyTorch supports the following backends for running quantized operators efficiently:
* x86 CPUs with AVX2 support or higher (without AVX2 some operations have inefficient implementations)
* x86 CPUs with AVX2 support or higher (without AVX2 some operations have
inefficient implementations)
* ARM CPUs (typically found in mobile/embedded devices)
The corresponding implementation is chosen automatically based on the PyTorch build mode.
.. note::
PyTorch 1.3 doesn't provide quantized operator implementations on CUDA yet - this is direction of future work.
Move the model to CPU in order to test the quantized functionality.
PyTorch 1.3 doesn't provide quantized operator implementations on CUDA yet -
this is direction of future work. Move the model to CPU in order to test the
quantized functionality.
Quantization-aware training (through :class:`~torch.quantization.FakeQuantize`) supports both CPU and CUDA.
Quantization-aware training (through :class:`~torch.quantization.FakeQuantize`)
supports both CPU and CUDA.
.. note::
When preparing a quantized model, it is necessary to ensure that qconfig and the engine used for quantized computations match
the backend on which the model will be executed. Quantization currently supports two backends: fbgemm (for use on x86,
`<https://github.com/pytorch/FBGEMM>`_) and qnnpack (for use on the ARM QNNPACK library `<https://github.com/pytorch/QNNPACK>`_).
For example, if you are interested in quantizing a model to run on ARM, it is recommended to set the qconfig by calling:
When preparing a quantized model, it is necessary to ensure that qconfig
and the engine used for quantized computations match the backend on which
the model will be executed. Quantization currently supports two backends:
fbgemm (for use on x86, `<https://github.com/pytorch/FBGEMM>`_) and qnnpack
(for use on the ARM QNNPACK library `<https://github.com/pytorch/QNNPACK>`_).
For example, if you are interested in quantizing a model to run on ARM, it
is recommended to set the qconfig by calling:
``qconfig = torch.quantization.get_default_qconfig('qnnpack')``
``qconfig = torch.quantization.get_default_qconfig('qnnpack')``
for post training quantization and
for post training quantization and
``qconfig = torch.quantization.get_default_qat_qconfig('qnnpack')``
``qconfig = torch.quantization.get_default_qat_qconfig('qnnpack')``
for quantization aware training.
for quantization aware training.
In addition, the torch.backends.quantized.engine parameter should be set to match the backend. For using qnnpack for inference, the
backend is set to qnnpack as follows
In addition, the torch.backends.quantized.engine parameter should be set to
match the backend. For using qnnpack for inference, the backend is set to
qnnpack as follows
``torch.backends.quantized.engine = 'qnnpack'``
``torch.backends.quantized.engine = 'qnnpack'``
Quantized Tensors
---------------------------------------
@ -79,8 +90,9 @@ The mapping is performed by converting the floating point tensors using
.. image:: math-quantizer-equation.png
:width: 40%
Note that, we ensure that zero in floating point is represented with no error after quantization,
thereby ensuring that operations like padding do not cause additional quantization error.
Note that, we ensure that zero in floating point is represented with no error
after quantization, thereby ensuring that operations like padding do not cause
additional quantization error.
In order to do quantization in PyTorch, we need to be able to represent
quantized data in Tensors. A Quantized Tensor allows for storing
@ -92,8 +104,8 @@ allowing for serialization of data in a quantized format.
Operation coverage
------------------
Quantized Tensors support a limited subset of data manipulation methods of the regular
full-precision tensor. (see list below)
Quantized Tensors support a limited subset of data manipulation methods of the
regular full-precision tensor. (see list below)
For NN operators included in PyTorch, we restrict support to:
@ -115,11 +127,11 @@ perform re-quantization are available in ``torch.nn.quantized``. Those
operations explicitly take output quantization parameters (scale and zero\_point) in
the operation signature.
In addition, we also support fused versions corresponding to common fusion patterns that impact quantization at:
torch.nn.intrinsic.quantized.
In addition, we also support fused versions corresponding to common fusion
patterns that impact quantization at: `torch.nn.intrinsic.quantized`.
For quantization aware training, we support modules prepared for quantization aware training at
torch.nn.qat and torch.nn.intrinsic.qat
For quantization aware training, we support modules prepared for quantization
aware training at `torch.nn.qat` and `torch.nn.intrinsic.qat`
Current quantized operation list is sufficient to cover typical CNN and RNN
models:
@ -128,11 +140,18 @@ models:
Quantized ``torch.Tensor`` operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Operations that are available from the ``torch`` namespace or as methods on Tensor for quantized tensors:
Operations that are available from the ``torch`` namespace or as methods on
Tensor for quantized tensors:
* :func:`~torch.quantize_per_tensor` - Convert float tensor to quantized tensor with per-tensor scale and zero point
* :func:`~torch.quantize_per_channel` - Convert float tensor to quantized tensor with per-channel scale and zero point
* View-based operations like :meth:`~torch.Tensor.view`, :meth:`~torch.Tensor.as_strided`, :meth:`~torch.Tensor.expand`, :meth:`~torch.Tensor.flatten`, :meth:`~torch.Tensor.select`, python-style indexing, etc - work as on regular tensor (if quantization is not per-channel)
* :func:`~torch.quantize_per_tensor` - Convert float tensor to quantized tensor
with per-tensor scale and zero point
* :func:`~torch.quantize_per_channel` - Convert float tensor to quantized
tensor with per-channel scale and zero point
* View-based operations like :meth:`~torch.Tensor.view`,
:meth:`~torch.Tensor.as_strided`, :meth:`~torch.Tensor.expand`,
:meth:`~torch.Tensor.flatten`, :meth:`~torch.Tensor.select`, python-style
indexing, etc - work as on regular tensor (if quantization is not
per-channel)
* Comparators
* :meth:`~torch.Tensor.ne` — Not equal
* :meth:`~torch.Tensor.eq` — Equal
@ -143,22 +162,28 @@ Operations that are available from the ``torch`` namespace or as methods on Tens
* :meth:`~torch.Tensor.copy_` — Copies src to self in-place
* :meth:`~torch.Tensor.clone` — Returns a deep copy of the passed-in tensor
* :meth:`~torch.Tensor.dequantize` — Convert quantized tensor to float tensor
* :meth:`~torch.Tensor.equal` — Compares two tensors, returns true if quantization parameters and all integer elements are the same
* :meth:`~torch.Tensor.int_repr` — Prints the underlying integer representation of the quantized tensor
* :meth:`~torch.Tensor.equal` — Compares two tensors, returns true if
quantization parameters and all integer elements are the same
* :meth:`~torch.Tensor.int_repr` — Prints the underlying integer representation
of the quantized tensor
* :meth:`~torch.Tensor.max` — Returns the maximum value of the tensor (reduction only)
* :meth:`~torch.Tensor.mean` — Mean function. Supported variants: reduction, dim, out
* :meth:`~torch.Tensor.min` — Returns the minimum value of the tensor (reduction only)
* :meth:`~torch.Tensor.q_scale` — Returns the scale of the per-tensor quantized tensor
* :meth:`~torch.Tensor.q_zero_point` — Returns the zero_point of the per-tensor quantized zero point
* :meth:`~torch.Tensor.q_per_channel_scales` — Returns the scales of the per-channel quantized tensor
* :meth:`~torch.Tensor.q_per_channel_zero_points` — Returns the zero points of the per-channel quantized tensor
* :meth:`~torch.Tensor.q_per_channel_axis` — Returns the channel axis of the per-channel quantized tensor
* :meth:`~torch.Tensor.q_zero_point` — Returns the zero_point of the per-tensor
quantized zero point
* :meth:`~torch.Tensor.q_per_channel_scales` — Returns the scales of the
per-channel quantized tensor
* :meth:`~torch.Tensor.q_per_channel_zero_points` — Returns the zero points of
the per-channel quantized tensor
* :meth:`~torch.Tensor.q_per_channel_axis` — Returns the channel axis of the
per-channel quantized tensor
* :meth:`~torch.Tensor.resize_` — In-place resize
* :meth:`~torch.Tensor.sort` — Sorts the tensor
* :meth:`~torch.Tensor.topk` — Returns k largest values of a tensor
``torch.nn.functional``
~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~
Basic activations are supported.
@ -175,23 +200,31 @@ Basic activations are supported.
``torch.nn.intrinsic``
~~~~~~~~~~~~~~~~~~~~~~
Fused modules are provided for common patterns in CNNs. Combining several operations together (like convolution and relu) allows for better quantization accuracy
Fused modules are provided for common patterns in CNNs. Combining several
operations together (like convolution and relu) allows for better quantization
accuracy
* ``torch.nn.intrinsic`` — float versions of the modules, can be swapped with quantized version 1 to 1
* :class:`~torch.nn.intrinsic.ConvBn2d` — Conv2d + BatchNorm
* :class:`~torch.nn.intrinsic.ConvBnReLU2d` — Conv2d + BatchNorm + ReLU
* :class:`~torch.nn.intrinsic.ConvReLU2d` — Conv2d + ReLU
* :class:`~torch.nn.intrinsic.ConvReLU3d` — Conv3d + ReLU
* :class:`~torch.nn.intrinsic.LinearReLU`Linear + ReLU
* ``torch.nn.intrinsic.qat`` — versions of layers for quantization-aware training
* :class:`~torch.nn.intrinsic.qat.ConvBn2d` — Conv2d + BatchNorm
* :class:`~torch.nn.intrinsic.qat.ConvBnReLU2d` — Conv2d + BatchNorm + ReLU
* :class:`~torch.nn.intrinsic.qat.ConvReLU2d` — Conv2d + ReLU
* :class:`~torch.nn.intrinsic.qat.LinearReLU` — Linear + ReLU
* ``torch.nn.intrinsic.quantized`` — quantized version of fused layers for inference (no BatchNorm variants as it's usually folded into convolution for inference)
* :class:`~torch.nn.intrinsic.quantized.LinearReLU`Linear + ReLU
* :class:`~torch.nn.intrinsic.quantized.ConvReLU2d`2D Convolution + ReLU
* :class:`~torch.nn.intrinsic.quantized.ConvReLU3d` — 3D Convolution + ReLU
* ``torch.nn.intrinsic`` — float versions of the modules, can be swapped with
quantized version 1 to 1:
* :class:`~torch.nn.intrinsic.ConvBn2d` — Conv2d + BatchNorm
* :class:`~torch.nn.intrinsic.ConvBnReLU2d` — Conv2d + BatchNorm + ReLU
* :class:`~torch.nn.intrinsic.ConvReLU2d`Conv2d + ReLU
* :class:`~torch.nn.intrinsic.ConvReLU3d` — Conv3d + ReLU
* :class:`~torch.nn.intrinsic.LinearReLU` — Linear + ReLU
* ``torch.nn.intrinsic.qat`` — versions of layers for quantization-aware training:
* :class:`~torch.nn.intrinsic.qat.ConvBn2d` — Conv2d + BatchNorm
* :class:`~torch.nn.intrinsic.qat.ConvBnReLU2d` — Conv2d + BatchNorm + ReLU
* :class:`~torch.nn.intrinsic.qat.ConvReLU2d`Conv2d + ReLU
* :class:`~torch.nn.intrinsic.qat.LinearReLU`Linear + ReLU
* ``torch.nn.intrinsic.quantized`` — quantized version of fused layers for
inference (no BatchNorm variants as it's usually folded into convolution for
inference):
* :class:`~torch.nn.intrinsic.quantized.LinearReLU` — Linear + ReLU
* :class:`~torch.nn.intrinsic.quantized.ConvReLU2d` — 2D Convolution + ReLU
* :class:`~torch.nn.intrinsic.quantized.ConvReLU3d` — 3D Convolution + ReLU
``torch.nn.qat``
~~~~~~~~~~~~~~~~
@ -204,62 +237,111 @@ Layers for the quantization-aware training
``torch.quantization``
~~~~~~~~~~~~~~~~~~~~~~
* Functions for quantization
* :func:`~torch.quantization.add_observer_` — Adds observer for the leaf modules (if quantization configuration is provided)
* :func:`~torch.quantization.add_quant_dequant`Wraps the leaf child module using :class:`~torch.quantization.QuantWrapper`
* :func:`~torch.quantization.convert` — Converts float module with observers into its quantized counterpart. Must have quantization configuration
* :func:`~torch.quantization.get_observer_dict` Traverses the module children and collects all observers into a ``dict``
* :func:`~torch.quantization.prepare` — Prepares a copy of a model for quantization
* :func:`~torch.quantization.prepare_qat` — Prepares a copy of a model for quantization aware training
* :func:`~torch.quantization.propagate_qconfig_` — Propagates quantization configurations through the module hierarchy and assign them to each leaf module
* :func:`~torch.quantization.quantize` — Converts a float module to quantized version
* :func:`~torch.quantization.quantize_dynamic` — Converts a float module to dynamically quantized version
* :func:`~torch.quantization.quantize_qat`Converts a float module to quantized version used in quantization aware training
* :func:`~torch.quantization.swap_module` — Swaps the module with its quantized counterpart (if quantizable and if it has an observer)
* :func:`~torch.quantization.default_eval_fn` — Default evaluation function used by the :func:`torch.quantization.quantize`
* Functions for quantization:
* :func:`~torch.quantization.add_observer_` Adds observer for the leaf
modules (if quantization configuration is provided)
* :func:`~torch.quantization.add_quant_dequant`Wraps the leaf child module using :class:`~torch.quantization.QuantWrapper`
* :func:`~torch.quantization.convert` — Converts float module with
observers into its quantized counterpart. Must have quantization
configuration
* :func:`~torch.quantization.get_observer_dict` — Traverses the module
children and collects all observers into a ``dict``
* :func:`~torch.quantization.prepare` Prepares a copy of a model for
quantization
* :func:`~torch.quantization.prepare_qat` — Prepares a copy of a model for
quantization aware training
* :func:`~torch.quantization.propagate_qconfig_` — Propagates quantization
configurations through the module hierarchy and assign them to each leaf
module
* :func:`~torch.quantization.quantize` — Converts a float module to quantized version
* :func:`~torch.quantization.quantize_dynamic` — Converts a float module to
dynamically quantized version
* :func:`~torch.quantization.quantize_qat` — Converts a float module to
quantized version used in quantization aware training
* :func:`~torch.quantization.swap_module` — Swaps the module with its
quantized counterpart (if quantizable and if it has an observer)
* :func:`~torch.quantization.default_eval_fn` — Default evaluation function
used by the :func:`torch.quantization.quantize`
* :func:`~torch.quantization.fuse_modules`
* :class:`~torch.quantization.FakeQuantize` — Module for simulating the quantization/dequantization at training time
* Default Observers. The rest of observers are available from ``torch.quantization.observer``
* :attr:`~torch.quantization.default_observer` — Same as ``MinMaxObserver.with_args(reduce_range=True)``
* :attr:`~torch.quantization.default_weight_observer` — Same as ``MinMaxObserver.with_args(dtype=torch.qint8, qscheme=torch.per_tensor_symmetric)``
* :class:`~torch.quantization.Observer`Abstract base class for observers
* :class:`~torch.quantization.FakeQuantize` — Module for simulating the
quantization/dequantization at training time
* Default Observers. The rest of observers are available from
``torch.quantization.observer``:
* :attr:`~torch.quantization.default_observer`Same as ``MinMaxObserver.with_args(reduce_range=True)``
* :attr:`~torch.quantization.default_weight_observer` — Same as ``MinMaxObserver.with_args(dtype=torch.qint8, qscheme=torch.per_tensor_symmetric)``
* :class:`~torch.quantization.Observer` — Abstract base class for observers
* Quantization configurations
* :class:`~torch.quantization.QConfig` — Quantization configuration class
* :attr:`~torch.quantization.default_qconfig` — Same as ``QConfig(activation=default_observer, weight=default_weight_observer)`` (See :class:`~torch.quantization.qconfig.QConfig`)
* :attr:`~torch.quantization.default_qat_qconfig` — Same as ``QConfig(activation=default_fake_quant, weight=default_weight_fake_quant)`` (See :class:`~torch.quantization.qconfig.QConfig`)
* :attr:`~torch.quantization.default_dynamic_qconfig` — Same as ``QConfigDynamic(weight=default_weight_observer)`` (See :class:`~torch.quantization.qconfig.QConfigDynamic`)
* :attr:`~torch.quantization.float16_dynamic_qconfig` — Same as ``QConfigDynamic(weight=NoopObserver.with_args(dtype=torch.float16))`` (See :class:`~torch.quantization.qconfig.QConfigDynamic`)
* :attr:`~torch.quantization.default_qconfig` — Same as
``QConfig(activation=default_observer, weight=default_weight_observer)``
(See :class:`~torch.quantization.qconfig.QConfig`)
* :attr:`~torch.quantization.default_qat_qconfig` — Same as
``QConfig(activation=default_fake_quant,
weight=default_weight_fake_quant)`` (See
:class:`~torch.quantization.qconfig.QConfig`)
* :attr:`~torch.quantization.default_dynamic_qconfig` — Same as
``QConfigDynamic(weight=default_weight_observer)`` (See
:class:`~torch.quantization.qconfig.QConfigDynamic`)
* :attr:`~torch.quantization.float16_dynamic_qconfig` — Same as
``QConfigDynamic(weight=NoopObserver.with_args(dtype=torch.float16))``
(See :class:`~torch.quantization.qconfig.QConfigDynamic`)
* Stubs
* :class:`~torch.quantization.DeQuantStub` - placeholder module for dequantize() operation in float-valued models
* :class:`~torch.quantization.QuantStub` - placeholder module for quantize() operation in float-valued models
* :class:`~torch.quantization.QuantWrapper` — wraps the module to be quantized. Inserts the :class:`~torch.quantization.QuantStub` and :class:`~torch.quantization.DeQuantStub`
* :class:`~torch.quantization.DeQuantStub` - placeholder module for
dequantize() operation in float-valued models
* :class:`~torch.quantization.QuantStub` - placeholder module for
quantize() operation in float-valued models
* :class:`~torch.quantization.QuantWrapper` — wraps the module to be
quantized. Inserts the :class:`~torch.quantization.QuantStub` and
* :class:`~torch.quantization.DeQuantStub`
Observers for computing the quantization parameters
* :class:`~torch.quantization.MinMaxObserver` — Derives the quantization parameters from the running minimum and maximum of the observed tensor inputs (per tensor variant)
* :class:`~torch.quantization.MovingAverageMinMaxObserver` — Derives the quantization parameters from the running averages of the minimums and maximums of the observed tensor inputs (per tensor variant)
* :class:`~torch.quantization.PerChannelMinMaxObserver`— Derives the quantization parameters from the running minimum and maximum of the observed tensor inputs (per channel variant)
* :class:`~torch.quantization.MovingAveragePerChannelMinMaxObserver` — Derives the quantization parameters from the running averages of the minimums and maximums of the observed tensor inputs (per channel variant)
* :class:`~torch.quantization.HistogramObserver` — Derives the quantization parameters by creating a histogram of running minimums and maximums.
* :class:`~torch.quantization.MinMaxObserver` — Derives the quantization
parameters from the running minimum and maximum of the observed tensor inputs
(per tensor variant)
* :class:`~torch.quantization.MovingAverageMinMaxObserver` — Derives the
quantization parameters from the running averages of the minimums and
maximums of the observed tensor inputs (per tensor variant)
* :class:`~torch.quantization.PerChannelMinMaxObserver` — Derives the
quantization parameters from the running minimum and maximum of the observed
tensor inputs (per channel variant)
* :class:`~torch.quantization.MovingAveragePerChannelMinMaxObserver` — Derives
the quantization parameters from the running averages of the minimums and
maximums of the observed tensor inputs (per channel variant)
* :class:`~torch.quantization.HistogramObserver` — Derives the quantization
parameters by creating a histogram of running minimums and maximums.
* Observers that do not compute the quantization parameters:
* :class:`~torch.quantization.RecordingObserver` — Records all incoming tensors. Used for debugging only.
* :class:`~torch.quantization.NoopObserver` — Pass-through observer. Used for situation when there are no quantization parameters (i.e. quantization to ``float16``)
* :class:`~torch.quantization.RecordingObserver` — Records all incoming
tensors. Used for debugging only.
* :class:`~torch.quantization.NoopObserver` — Pass-through observer. Used
for situation when there are no quantization parameters (i.e.
quantization to ``float16``)
``torch.nn.quantized``
~~~~~~~~~~~~~~~~~~~~~~
Quantized version of standard NN layers.
* :class:`~torch.nn.quantized.Quantize` — Quantization layer, used to automatically replace :class:`~torch.quantization.QuantStub`
* :class:`~torch.nn.quantized.DeQuantize` — Dequantization layer, used to replace :class:`~torch.quantization.DeQuantStub`
* :class:`~torch.nn.quantized.FloatFunctional` — Wrapper class to make stateless float operations stateful so that they can be replaced with quantized versions
* :class:`~torch.nn.quantized.QFunctional` — Wrapper class for quantized versions of stateless operations like ``torch.add``
* :class:`~torch.nn.quantized.Quantize` — Quantization layer, used to
automatically replace :class:`~torch.quantization.QuantStub`
* :class:`~torch.nn.quantized.DeQuantize` — Dequantization layer, used to
replace :class:`~torch.quantization.DeQuantStub`
* :class:`~torch.nn.quantized.FloatFunctional` — Wrapper class to make
stateless float operations stateful so that they can be replaced with
quantized versions
* :class:`~torch.nn.quantized.QFunctional` — Wrapper class for quantized
versions of stateless operations like ``torch.add``
* :class:`~torch.nn.quantized.Conv2d` — 2D convolution
* :class:`~torch.nn.quantized.Conv3d` — 3D convolution
* :class:`~torch.nn.quantized.Linear` — Linear (fully-connected) layer
* :class:`~torch.nn.MaxPool2d` — 2D max pooling
* :class:`~torch.nn.quantized.ReLU` — Rectified linear unit
* :class:`~torch.nn.quantized.ReLU6` — Rectified linear unit with cut-off at quantized representation of 6
* :class:`~torch.nn.quantized.ReLU6` — Rectified linear unit with cut-off at
quantized representation of 6
``torch.nn.quantized.dynamic``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -272,7 +354,8 @@ Layers used in dynamically quantized models (i.e. quantized only on weights)
``torch.nn.quantized.functional``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Functional versions of quantized NN layers (many of them accept explicit quantization output parameters)
Functional versions of quantized NN layers (many of them accept explicit
quantization output parameters)
* :func:`~torch.nn.quantized.functional.adaptive_avg_pool2d` — 2D adaptive average pooling
* :func:`~torch.nn.quantized.functional.avg_pool2d` — 2D average pooling
@ -282,22 +365,31 @@ Functional versions of quantized NN layers (many of them accept explicit quantiz
* :func:`~torch.nn.quantized.functional.linear` — Linear (fully-connected) op
* :func:`~torch.nn.quantized.functional.max_pool2d` — 2D max pooling
* :func:`~torch.nn.quantized.functional.relu` — Rectified linear unit
* :func:`~torch.nn.quantized.functional.upsample` — Upsampler. Will be deprecated in favor of :func:`~torch.nn.quantized.functional.interpolate`
* :func:`~torch.nn.quantized.functional.upsample_bilinear` — Bilenear upsampler. Will be deprecated in favor of :func:`~torch.nn.quantized.functional.interpolate`
* :func:`~torch.nn.quantized.functional.upsample_nearest`Nearest neighbor upsampler. Will be deprecated in favor of :func:`~torch.nn.quantized.functional.interpolate`
* :func:`~torch.nn.quantized.functional.upsample` — Upsampler. Will be
deprecated in favor of :func:`~torch.nn.quantized.functional.interpolate`
* :func:`~torch.nn.quantized.functional.upsample_bilinear`Bilenear
upsampler. Will be deprecated in favor of
* :func:`~torch.nn.quantized.functional.interpolate`
* :func:`~torch.nn.quantized.functional.upsample_nearest` — Nearest neighbor
upsampler. Will be deprecated in favor of
* :func:`~torch.nn.quantized.functional.interpolate`
Quantized dtypes and quantization schemes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* :attr:`torch.qscheme` — Type to describe the quantization scheme of a tensor. Supported types:
* :attr:`torch.per_tensor_affine` — per tensor, asymmetric
* :attr:`torch.per_channel_affine` — per channel, asymmetric
* :attr:`torch.per_tensor_symmetric` — per tensor, symmetric
* :attr:`torch.per_channel_symmetric` — per tensor, symmetric
* :attr:`torch.qscheme` — Type to describe the quantization scheme of a tensor.
Supported types:
* :attr:`torch.per_tensor_affine` — per tensor, asymmetric
* :attr:`torch.per_channel_affine` — per channel, asymmetric
* :attr:`torch.per_tensor_symmetric` — per tensor, symmetric
* :attr:`torch.per_channel_symmetric` — per tensor, symmetric
* ``torch.dtype`` — Type to describe the data. Supported types:
* :attr:`torch.quint8` — 8-bit unsigned integer
* :attr:`torch.qint8` — 8-bit signed integer
* :attr:`torch.qint32`32-bit signed integer
* :attr:`torch.quint8` — 8-bit unsigned integer
* :attr:`torch.qint8`8-bit signed integer
* :attr:`torch.qint32` — 32-bit signed integer
@ -322,17 +414,21 @@ PyTorch provides three approaches to quantize models.
quantization where the weights are quantized ahead of time and the
scale factor and bias for the activation tensors is pre-computed
based on observing the behavior of the model during a calibration
process. Post Training Quantization is typically when both memory bandwidth and compute
savings are important with CNNs being a typical use case.
process. Post Training Quantization is typically when both memory bandwidth
and compute savings are important with CNNs being a typical use case.
The general process for doing post training quantization is:
1. Prepare the model:
a. Specify where the activations are quantized and dequantized explicitly by adding QuantStub and DeQuantStub modules.
a. Specify where the activations are quantized and dequantized explicitly
by adding QuantStub and DeQuantStub modules.
b. Ensure that modules are not reused.
c. Convert any operations that require requantization into modules
2. Fuse operations like conv + relu or conv+batchnorm + relu together to improve both model accuracy and performance.
2. Fuse operations like conv + relu or conv+batchnorm + relu together to
improve both model accuracy and performance.
3. Specify the configuration of the quantization methods \'97 such as
selecting symmetric or asymmetric quantization and MinMax or
@ -352,10 +448,10 @@ PyTorch provides three approaches to quantize models.
3. Quantization Aware Training: In the rare cases where post training
quantization does not provide adequate accuracy training can be done
with simulated quantization using the :class:`torch.quantization.FakeQuantize`. Computations
will take place in FP32 but with values clamped and rounded to
simulate the effects of INT8 quantization. The sequence of steps is
very similar.
with simulated quantization using the
:class:`torch.quantization.FakeQuantize`. Computations will take place in
FP32 but with values clamped and rounded to simulate the effects of INT8
quantization. The sequence of steps is very similar.
1. Steps (1) and (2) are identical.
@ -393,16 +489,25 @@ It is necessary to currently make some modifications to the model definition
prior to quantization. This is because currently quantization works on a module
by module basis. Specifically, for all quantization techniques, the user needs to:
1. Convert any operations that require output requantization (and thus have additional parameters) from functionals to module form.
2. Specify which parts of the model need to be quantized either by assigning ```.qconfig`` attributes on submodules or by specifying ``qconfig_dict``
1. Convert any operations that require output requantization (and thus have
additional parameters) from functionals to module form.
2. Specify which parts of the model need to be quantized either by assigning
```.qconfig`` attributes on submodules or by specifying ``qconfig_dict``
For static quantization techniques which quantize activations, the user needs to do the following in addition:
For static quantization techniques which quantize activations, the user needs
to do the following in addition:
1. Specify where activations are quantized and de-quantized. This is done using :class:`~torch.quantization.QuantStub` and :class:`~torch.quantization.DeQuantStub` modules.
2. Use :class:`torch.nn.quantized.FloatFunctional` to wrap tensor operations that require special handling for quantization into modules. Examples
are operations like ``add`` and ``cat`` which require special handling to determine output quantization parameters.
3. Fuse modules: combine operations/modules into a single module to obtain higher accuracy and performance. This is done using the
:func:`torch.quantization.fuse_modules` API, which takes in lists of modules to be fused. We currently support the following fusions:
1. Specify where activations are quantized and de-quantized. This is done using
:class:`~torch.quantization.QuantStub` and
:class:`~torch.quantization.DeQuantStub` modules.
2. Use :class:`torch.nn.quantized.FloatFunctional` to wrap tensor operations
that require special handling for quantization into modules. Examples
are operations like ``add`` and ``cat`` which require special handling to
determine output quantization parameters.
3. Fuse modules: combine operations/modules into a single module to obtain
higher accuracy and performance. This is done using the
:func:`torch.quantization.fuse_modules` API, which takes in lists of modules
to be fused. We currently support the following fusions:
[Conv, Relu], [Conv, BatchNorm], [Conv, BatchNorm, Relu], [Linear, Relu]
@ -450,7 +555,7 @@ Utility functions
Observers
~~~~~~~~~~~~~~~
.. autoclass:: Observer
.. autoclass:: ObserverBase
:members:
.. autoclass:: MinMaxObserver
.. autoclass:: MovingAverageMinMaxObserver
@ -468,7 +573,8 @@ Debugging utilities
torch.nn.intrinsic
--------------------------------
This module implements the combined (fused) modules conv + relu which can be then quantized.
This module implements the combined (fused) modules conv + relu which can be
then quantized.
.. automodule:: torch.nn.intrinsic
@ -500,7 +606,8 @@ LinearReLU
torch.nn.instrinsic.qat
--------------------------------
This module implements the versions of those fused operations needed for quantization aware training.
This module implements the versions of those fused operations needed for
quantization aware training.
.. automodule:: torch.nn.intrinsic.qat
@ -549,8 +656,9 @@ LinearReLU
torch.nn.qat
---------------------------
This module implements versions of the key nn modules **Conv2d()** and **Linear()** which
run in FP32 but with rounding applied to simulate the effect of INT8 quantization.
This module implements versions of the key nn modules **Conv2d()** and
**Linear()** which run in FP32 but with rounding applied to simulate the effect
of INT8 quantization.
.. automodule:: torch.nn.qat
@ -568,7 +676,8 @@ Linear
torch.nn.quantized
----------------------------
This module implements the quantized versions of the nn layers such as **Conv2d** and **ReLU**.
This module implements the quantized versions of the nn layers such as
~`torch.nn.Conv2d` and `torch.nn.ReLU`.
Functional interface
~~~~~~~~~~~~~~~~~~~~

View File

@ -1,5 +1,8 @@
:orphan:
.. contents:: :local:
:depth: 2
.. _distributed-rpc-framework:
Distributed RPC Framework
@ -130,6 +133,12 @@ details.
:members:
.. toctree::
:caption: More Information about RRef
rpc/rref
Distributed Autograd Framework
------------------------------
@ -143,8 +152,30 @@ using RPC. For more details see :ref:`distributed-autograd-design`.
.. automodule:: torch.distributed.autograd
:members: context, backward, get_gradients
.. toctree::
:caption: More Information about RPC Autograd
rpc/distributed_autograd
Distributed Optimizer
---------------------
.. automodule:: torch.distributed.optim
:members: DistributedOptimizer
Design Notes
------------
The distributed autograd design note covers the design of the RPC-based distributed autograd framework that is useful for applications such as model parallel training.
- :ref:`distributed-autograd-design`
The RRef design note covers the design of the :ref:`rref` (Remote REFerence) protocol used to refer to values on remote workers by the framework.
- :ref:`remote-reference-protocol`
Tutorials
---------
The RPC tutorial introduces users to the RPC framework and provides two example applications using :ref:`torch.distributed.rpc<distributed-rpc-framework>` APIs.
- `Getting started with Distributed RPC Framework <https://pytorch.org/tutorials/intermediate/rpc_tutorial.html>`__

View File

@ -1,25 +0,0 @@
.. _rpc-index:
Distributed RPC Framework
=========================
The distributed RPC framework provides mechanisms for multi-machine model training through a set of primitives to allow for remote communication, and a higher-level API to automatically differentiate models split across several machines.
- :ref:`distributed-rpc-framework`
Design Notes
------------
The distributed autograd design note covers the design of the RPC-based distributed autograd framework that is useful for applications such as model parallel training.
- :ref:`distributed-autograd-design`
The RRef design note covers the design of the :ref:`rref` (Remote REFerence) protocol used to refer to values on remote workers by the framework.
- :ref:`remote-reference-protocol`
Tutorials
---------
The RPC tutorial introduces users to the RPC framework and provides two example applications using :ref:`torch.distributed.rpc<distributed-rpc-framework>` APIs.
- `Getting started with Distributed RPC Framework <https://pytorch.org/tutorials/intermediate/rpc_tutorial.html>`__
- `Implementing a Parameter Server using Distributed RPC Framework <https://pytorch.org/tutorials/intermediate/rpc_param_server_tutorial.html>`__

View File

@ -223,7 +223,7 @@ Example::
For more information on ``torch.sparse_coo`` tensors, see :ref:`sparse-docs`.
torch.memory_format
------------
-------------------
.. class:: torch.memory_format

View File

@ -1,19 +1,31 @@
torch
===================================
.. automodule:: torch
=====
The torch package contains data structures for multi-dimensional
tensors and mathematical operations over these are defined.
Additionally, it provides many utilities for efficient serializing of
Tensors and arbitrary types, and other useful utilities.
It has a CUDA counterpart, that enables you to run your tensor computations
on an NVIDIA GPU with compute capability >= 3.0
.. currentmodule:: torch
Tensors
----------------------------------
.. autofunction:: is_tensor
.. autofunction:: is_storage
.. autofunction:: is_complex
.. autofunction:: is_floating_point
.. autofunction:: set_default_dtype
.. autofunction:: get_default_dtype
.. autofunction:: set_default_tensor_type
.. autofunction:: numel
.. autofunction:: set_printoptions
.. autofunction:: set_flush_denormal
-------
.. autosummary::
:toctree: generated
:nosignatures:
is_tensor
is_storage
is_complex
is_floating_point
set_default_dtype
get_default_dtype
set_default_tensor_type
numel
set_printoptions
set_flush_denormal
.. _tensor-creation-ops:
@ -34,65 +46,81 @@ Creation Ops
methods to create :class:`torch.Tensor` s with values sampled from a broader
range of distributions.
.. autofunction:: tensor
.. autofunction:: sparse_coo_tensor
.. autofunction:: as_tensor
.. autofunction:: as_strided
.. autofunction:: from_numpy
.. autofunction:: zeros
.. autofunction:: zeros_like
.. autofunction:: ones
.. autofunction:: ones_like
.. autofunction:: arange
.. autofunction:: range
.. autofunction:: linspace
.. autofunction:: logspace
.. autofunction:: eye
.. autofunction:: empty
.. autofunction:: empty_like
.. autofunction:: empty_strided
.. autofunction:: full
.. autofunction:: full_like
.. autofunction:: quantize_per_tensor
.. autofunction:: quantize_per_channel
.. autofunction:: dequantize
.. autosummary::
:toctree: generated
:nosignatures:
tensor
sparse_coo_tensor
as_tensor
as_strided
from_numpy
zeros
zeros_like
ones
ones_like
arange
range
linspace
logspace
eye
empty
empty_like
empty_strided
full
full_like
quantize_per_tensor
quantize_per_channel
dequantize
Indexing, Slicing, Joining, Mutating Ops
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: cat
.. autofunction:: chunk
.. autofunction:: gather
.. autofunction:: index_select
.. autofunction:: masked_select
.. autofunction:: narrow
.. autofunction:: nonzero
.. autofunction:: reshape
.. autofunction:: split
.. autofunction:: squeeze
.. autofunction:: stack
.. autofunction:: t
.. autofunction:: take
.. autofunction:: transpose
.. autofunction:: unbind
.. autofunction:: unsqueeze
.. autofunction:: where
.. autosummary::
:toctree: generated
:nosignatures:
cat
chunk
gather
index_select
masked_select
narrow
nonzero
reshape
split
squeeze
stack
t
take
transpose
unbind
unsqueeze
where
.. _generators:
Generators
----------------------------------
.. autoclass:: torch._C.Generator
:members:
.. autosummary::
:toctree: generated
:nosignatures:
_C.Generator
.. _random-sampling:
Random sampling
----------------------------------
.. autofunction:: seed
.. autofunction:: manual_seed
.. autofunction:: initial_seed
.. autofunction:: get_rng_state
.. autofunction:: set_rng_state
.. autosummary::
:toctree: generated
:nosignatures:
seed
manual_seed
initial_seed
get_rng_state
set_rng_state
.. autoattribute:: torch.default_generator
:annotation: Returns the default CPU torch.Generator
@ -102,17 +130,21 @@ Random sampling
:annotation: If cuda is available, returns a tuple of default CUDA torch.Generator-s.
The number of CUDA torch.Generator-s returned is equal to the number of
GPUs available in the system.
.. autofunction:: bernoulli
.. autofunction:: multinomial
.. autofunction:: normal
.. autofunction:: poisson
.. autofunction:: rand
.. autofunction:: rand_like
.. autofunction:: randint
.. autofunction:: randint_like
.. autofunction:: randn
.. autofunction:: randn_like
.. autofunction:: randperm
.. autosummary::
:toctree: generated
:nosignatures:
bernoulli
multinomial
normal
poisson
rand
rand_like
randint
randint_like
randn
randn_like
randperm
.. _inplace-random-sampling:
@ -132,24 +164,32 @@ There are a few more in-place random sampling functions defined on Tensors as we
Quasi-random sampling
~~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: generated
:nosignatures:
:template: sobolengine.rst
.. autoclass:: torch.quasirandom.SobolEngine
:members:
:exclude-members: MAXBIT, MAXDIM
:undoc-members:
quasirandom.SobolEngine
Serialization
----------------------------------
.. autofunction:: save
.. autofunction:: load
.. autosummary::
:toctree: generated
:nosignatures:
save
load
Parallelism
----------------------------------
.. autofunction:: get_num_threads
.. autofunction:: set_num_threads
.. autofunction:: get_num_interop_threads
.. autofunction:: set_num_interop_threads
.. autosummary::
:toctree: generated
:nosignatures:
get_num_threads
set_num_threads
get_num_interop_threads
set_num_interop_threads
Locally disabling gradient computation
--------------------------------------
@ -183,217 +223,247 @@ Examples::
>>> y.requires_grad
False
.. autofunction:: no_grad
.. autofunction:: enable_grad
.. autofunction:: set_grad_enabled
.. autosummary::
:toctree: generated
:nosignatures:
no_grad
enable_grad
set_grad_enabled
Math operations
----------------------------------
---------------
Pointwise Ops
~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: abs
.. autofunction:: absolute
.. autofunction:: acos
.. autofunction:: add
.. autofunction:: addcdiv
.. autofunction:: addcmul
.. autofunction:: angle
.. autofunction:: asin
.. autofunction:: atan
.. autofunction:: atan2
.. autofunction:: bitwise_not
.. autofunction:: bitwise_and
.. autofunction:: bitwise_or
.. autofunction:: bitwise_xor
.. autofunction:: ceil
.. autofunction:: clamp
.. autofunction:: conj
.. autofunction:: cos
.. autofunction:: cosh
.. autofunction:: div
.. autofunction:: digamma
.. autofunction:: erf
.. autofunction:: erfc
.. autofunction:: erfinv
.. autofunction:: exp
.. autofunction:: expm1
.. autofunction:: floor
.. autofunction:: floor_divide
.. autofunction:: fmod
.. autofunction:: frac
.. autofunction:: imag
.. autofunction:: lerp
.. autofunction:: lgamma
.. autofunction:: log
.. autofunction:: log10
.. autofunction:: log1p
.. autofunction:: log2
.. autofunction:: logical_and
.. autofunction:: logical_not
.. autofunction:: logical_or
.. autofunction:: logical_xor
.. autofunction:: mul
.. autofunction:: mvlgamma
.. autofunction:: neg
.. autofunction:: polygamma
.. autofunction:: pow
.. autofunction:: real
.. autofunction:: reciprocal
.. autofunction:: remainder
.. autofunction:: round
.. autofunction:: rsqrt
.. autofunction:: sigmoid
.. autofunction:: sign
.. autofunction:: sin
.. autofunction:: sinh
.. autofunction:: sqrt
.. autofunction:: square
.. autofunction:: tan
.. autofunction:: tanh
.. autofunction:: true_divide
.. autofunction:: trunc
.. autosummary::
:toctree: generated
:nosignatures:
abs
absolute
acos
add
addcdiv
addcmul
angle
asin
atan
atan2
bitwise_not
bitwise_and
bitwise_or
bitwise_xor
ceil
clamp
conj
cos
cosh
div
digamma
erf
erfc
erfinv
exp
expm1
floor
floor_divide
fmod
frac
imag
lerp
lgamma
log
log10
log1p
log2
logical_and
logical_not
logical_or
logical_xor
mul
mvlgamma
neg
polygamma
pow
real
reciprocal
remainder
round
rsqrt
sigmoid
sign
sin
sinh
sqrt
square
tan
tanh
true_divide
trunc
Reduction Ops
~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: argmax
.. autofunction:: argmin
.. autofunction:: dist
.. autofunction:: logsumexp
.. autofunction:: mean
.. autofunction:: median
.. autofunction:: mode
.. autofunction:: norm
.. autofunction:: prod
.. autofunction:: std
.. autofunction:: std_mean
.. autofunction:: sum
.. autofunction:: unique
.. autofunction:: unique_consecutive
.. autofunction:: var
.. autofunction:: var_mean
.. autosummary::
:toctree: generated
:nosignatures:
argmax
argmin
dist
logsumexp
mean
median
mode
norm
prod
std
std_mean
sum
unique
unique_consecutive
var
var_mean
Comparison Ops
~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: allclose
.. autofunction:: argsort
.. autofunction:: eq
.. autofunction:: equal
.. autofunction:: ge
.. autofunction:: gt
.. autofunction:: isclose
.. autofunction:: isfinite
.. autofunction:: isinf
.. autofunction:: isnan
.. autofunction:: kthvalue
.. autofunction:: le
.. autofunction:: lt
.. autofunction:: max
.. autofunction:: min
.. autofunction:: ne
.. autofunction:: sort
.. autofunction:: topk
.. autosummary::
:toctree: generated
:nosignatures:
allclose
argsort
eq
equal
ge
gt
isclose
isfinite
isinf
isnan
kthvalue
le
lt
max
min
ne
sort
topk
Spectral Ops
~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: fft
.. autofunction:: ifft
.. autofunction:: rfft
.. autofunction:: irfft
.. autofunction:: stft
.. autofunction:: istft
.. autofunction:: bartlett_window
.. autofunction:: blackman_window
.. autofunction:: hamming_window
.. autofunction:: hann_window
.. autosummary::
:toctree: generated
:nosignatures:
fft
ifft
rfft
irfft
stft
istft
bartlett_window
blackman_window
hamming_window
hann_window
Other Operations
~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: bincount
.. autofunction:: block_diag
.. autofunction:: broadcast_tensors
.. autofunction:: cartesian_prod
.. autofunction:: cdist
.. autofunction:: combinations
.. autofunction:: cross
.. autofunction:: cummax
.. autofunction:: cummin
.. autofunction:: cumprod
.. autofunction:: cumsum
.. autofunction:: diag
.. autofunction:: diag_embed
.. autofunction:: diagflat
.. autofunction:: diagonal
.. autofunction:: einsum
.. autofunction:: flatten
.. autofunction:: flip
.. autofunction:: rot90
.. autofunction:: histc
.. autofunction:: meshgrid
.. autofunction:: renorm
.. autofunction:: repeat_interleave
.. autofunction:: roll
.. autofunction:: tensordot
.. autofunction:: trace
.. autofunction:: tril
.. autofunction:: tril_indices
.. autofunction:: triu
.. autofunction:: triu_indices
.. autofunction:: vander
.. autosummary::
:toctree: generated
:nosignatures:
bincount
block_diag
broadcast_tensors
cartesian_prod
cdist
combinations
cross
cummax
cummin
cumprod
cumsum
diag
diag_embed
diagflat
diagonal
einsum
flatten
flip
rot90
histc
meshgrid
renorm
repeat_interleave
roll
tensordot
trace
tril
tril_indices
triu
triu_indices
vander
BLAS and LAPACK Operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: generated
:nosignatures:
.. autofunction:: addbmm
.. autofunction:: addmm
.. autofunction:: addmv
.. autofunction:: addr
.. autofunction:: baddbmm
.. autofunction:: bmm
.. autofunction:: chain_matmul
.. autofunction:: cholesky
.. autofunction:: cholesky_inverse
.. autofunction:: cholesky_solve
.. autofunction:: dot
.. autofunction:: eig
.. autofunction:: geqrf
.. autofunction:: ger
.. autofunction:: inverse
.. autofunction:: det
.. autofunction:: logdet
.. autofunction:: slogdet
.. autofunction:: lstsq
.. autofunction:: lu
.. autofunction:: lu_solve
.. autofunction:: lu_unpack
.. autofunction:: matmul
.. autofunction:: matrix_power
.. autofunction:: matrix_rank
.. autofunction:: mm
.. autofunction:: mv
.. autofunction:: orgqr
.. autofunction:: ormqr
.. autofunction:: pinverse
.. autofunction:: qr
.. autofunction:: solve
.. autofunction:: svd
.. autofunction:: svd_lowrank
.. autofunction:: pca_lowrank
.. autofunction:: symeig
.. autofunction:: lobpcg
.. autofunction:: trapz
.. autofunction:: triangular_solve
addbmm
addmm
addmv
addr
baddbmm
bmm
chain_matmul
cholesky
cholesky_inverse
cholesky_solve
dot
eig
geqrf
ger
inverse
det
logdet
slogdet
lstsq
lu
lu_solve
lu_unpack
matmul
matrix_power
matrix_rank
mm
mv
orgqr
ormqr
pinverse
qr
solve
svd
svd_lowrank
pca_lowrank
symeig
lobpcg
trapz
triangular_solve
Utilities
----------------------------------
.. autofunction:: compiled_with_cxx11_abi
.. autofunction:: result_type
.. autofunction:: can_cast
.. autofunction:: promote_types
.. autosummary::
:toctree: generated
:nosignatures:
compiled_with_cxx11_abi
result_type
can_cast
promote_types

View File

@ -31,7 +31,11 @@ class TestDocCoverage(unittest.TestCase):
ret.add(name[0])
return ret
def test_torch(self):
def _test_torch(self):
# TODO: this test was disabled as part of PR gh-37419, since the
# simplistic test no longer works. It should be replaced, perhaps
# with a test based on sphinx.ext.coverage
# TODO: The algorithm here is kind of unsound; we don't assume
# every identifier in torch.rst lives in torch by virtue of
# where it lives; instead, it lives in torch because at the
@ -86,7 +90,11 @@ class TestDocCoverage(unittest.TestCase):
don't want to document?''')
)
def test_tensor(self):
def _test_tensor(self):
# TODO: this test was disabled as part of PR gh-37419, since the
# simplistic test no longer works. It should be replaced, perhaps
# with a test based on sphinx.ext.coverage
in_rst = self.parse_rst('tensors.rst', r2)
whitelist = {
'names', 'unflatten', 'align_as', 'rename_', 'refine_names', 'align_to',

View File

@ -331,6 +331,7 @@ for name in dir(_C._VariableFunctions):
if name.startswith('__'):
continue
globals()[name] = getattr(_C._VariableFunctions, name)
__all__.append(name)
################################################################################
# Import interface functions defined in Python

View File

@ -149,18 +149,18 @@ def lobpcg(A, # type: Tensor
Preconditioned Eigensolver: Locally Optimal Block Preconditioned
Conjugate Gradient Method. SIAM J. Sci. Comput., 23(2),
517-541. (25 pages)
`https://epubs.siam.org/doi/abs/10.1137/S1064827500366124`_
https://epubs.siam.org/doi/abs/10.1137/S1064827500366124
[StathopoulosEtal2002] Andreas Stathopoulos and Kesheng
Wu. (2002) A Block Orthogonalization Procedure with Constant
Synchronization Requirements. SIAM J. Sci. Comput., 23(6),
2165-2182. (18 pages)
`https://epubs.siam.org/doi/10.1137/S1064827500370883`_
https://epubs.siam.org/doi/10.1137/S1064827500370883
[DuerschEtal2018] Jed A. Duersch, Meiyue Shao, Chao Yang, Ming
Gu. (2018) A Robust and Efficient Implementation of LOBPCG.
SIAM J. Sci. Comput., 40(5), C655-C676. (22 pages)
`https://epubs.siam.org/doi/abs/10.1137/17M1129830`_
https://epubs.siam.org/doi/abs/10.1137/17M1129830
"""

View File

@ -84,7 +84,7 @@ def get_approximate_basis(A, # type: Tensor
def svd_lowrank(A, q=6, niter=2, M=None):
# type: (Tensor, Optional[int], Optional[int], Optional[Tensor]) -> Tuple[Tensor, Tensor, Tensor]
"""Return the singular value decomposition ``(U, S, V)`` of a matrix,
r"""Return the singular value decomposition ``(U, S, V)`` of a matrix,
batches of matrices, or a sparse matrix :math:`A` such that
:math:`A \approx U diag(S) V^T`. In case :math:`M` is given, then
SVD is computed for the matrix :math:`A - M`.

View File

@ -1384,7 +1384,13 @@ The :attr:`dim`\ th dimension of :attr:`tensor` must have the same size as the
length of :attr:`index` (which must be a vector), and all other dimensions must
match :attr:`self`, or an error will be raised.
.. include:: cuda_deterministic.rst
Note:
In some circumstances when using the CUDA backend with CuDNN, this operator
may select a nondeterministic algorithm to increase performance. If this is
undesirable, you can try to make the operation deterministic (potentially at
a performance cost) by setting ``torch.backends.cudnn.deterministic =
True``.
Please see the notes on :doc:`/notes/randomness` for background.
Args:
dim (int): dimension along which to index
@ -2511,7 +2517,13 @@ dimensions. It is also required that ``index.size(d) <= src.size(d)`` for all
dimensions ``d``, and that ``index.size(d) <= self.size(d)`` for all dimensions
``d != dim``.
.. include:: cuda_deterministic.rst
Note:
In some circumstances when using the CUDA backend with CuDNN, this operator
may select a nondeterministic algorithm to increase performance. If this is
undesirable, you can try to make the operation deterministic (potentially at
a performance cost) by setting ``torch.backends.cudnn.deterministic =
True``.
Please see the notes on :doc:`/notes/randomness` for background.
Args:
dim (int): the axis along which to index

View File

@ -154,7 +154,7 @@ Example::
add_docstr(torch.add,
r"""
.. function:: add(input, other, out=None)
add(input, other, out=None)
Adds the scalar :attr:`other` to each element of the input :attr:`input`
and returns a new resulting tensor.
@ -770,7 +770,13 @@ tensor of size 0. If :attr:`minlength` is specified, the number of bins is at le
``out[n] += weights[i]`` if :attr:`weights` is specified else
``out[n] += 1``.
.. include:: cuda_deterministic.rst
Note:
In some circumstances when using the CUDA backend with CuDNN, this operator
may select a nondeterministic algorithm to increase performance. If this is
undesirable, you can try to make the operation deterministic (potentially at
a performance cost) by setting ``torch.backends.cudnn.deterministic =
True``.
Please see the notes on :doc:`/notes/randomness` for background.
Arguments:
input (Tensor): 1-d int tensor
@ -1529,7 +1535,7 @@ Example::
add_docstr(torch.dequantize,
r"""
.. function:: dequantize(tensor) -> Tensor
dequantize(tensor) -> Tensor
Given a quantized Tensor, dequantize it and return an fp32 Tensor
@ -1819,7 +1825,7 @@ Example::
add_docstr(torch.div,
r"""
.. function:: div(input, other, out=None) -> Tensor
div(input, other, out=None) -> Tensor
Divides each element of the input ``input`` with the scalar ``other`` and
returns a new resulting tensor.
@ -3337,7 +3343,7 @@ Example::
add_docstr(torch.max,
r"""
.. function:: max(input) -> Tensor
max(input) -> Tensor
Returns the maximum value of all elements in the ``input`` tensor.
@ -3424,7 +3430,7 @@ Example::
add_docstr(torch.argmax,
r"""
.. function:: argmax(input) -> LongTensor
argmax(input) -> LongTensor
Returns the indices of the maximum value of all elements in the :attr:`input` tensor.
@ -3471,7 +3477,7 @@ Example::
add_docstr(torch.mean,
r"""
.. function:: mean(input) -> Tensor
mean(input) -> Tensor
Returns the mean value of all elements in the :attr:`input` tensor.
@ -3519,7 +3525,7 @@ Example::
add_docstr(torch.median,
r"""
.. function:: median(input) -> Tensor
median(input) -> Tensor
Returns the median value of all elements in the :attr:`input` tensor.
@ -3578,7 +3584,7 @@ Example::
add_docstr(torch.min,
r"""
.. function:: min(input) -> Tensor
min(input) -> Tensor
Returns the minimum value of all elements in the :attr:`input` tensor.
@ -3666,7 +3672,7 @@ Example::
add_docstr(torch.argmin,
r"""
.. function:: argmin(input) -> LongTensor
argmin(input) -> LongTensor
Returns the indices of the minimum value of all elements in the :attr:`input` tensor.
@ -3837,7 +3843,7 @@ Example::
add_docstr(torch.mul,
r"""
.. function:: mul(input, other, out=None)
mul(input, other, out=None)
Multiplies each element of the input :attr:`input` with the scalar
:attr:`other` and returns a new resulting tensor.
@ -4149,7 +4155,7 @@ Example::
add_docstr(torch.normal,
r"""
.. function:: normal(mean, std, *, generator=None, out=None) -> Tensor
normal(mean, std, *, generator=None, out=None) -> Tensor
Returns a tensor of random numbers drawn from separate normal distributions
whose mean and standard deviation are given.
@ -4391,7 +4397,7 @@ Example::
add_docstr(torch.pow,
r"""
.. function:: pow(input, exponent, out=None) -> Tensor
pow(input, exponent, out=None) -> Tensor
Takes the power of each element in :attr:`input` with :attr:`exponent` and
returns a tensor with the result.
@ -4459,7 +4465,7 @@ Example::
add_docstr(torch.prod,
r"""
.. function:: prod(input, dtype=None) -> Tensor
prod(input, dtype=None) -> Tensor
Returns the product of all elements in the :attr:`input` tensor.
@ -4627,7 +4633,7 @@ Args:
add_docstr(torch.randint,
r"""
randint(low=0, high, size, *, generator=None, out=None, \
randint(low=0, high, size, \*, generator=None, out=None, \
dtype=None, layout=torch.strided, device=None, requires_grad=False) -> Tensor
Returns a tensor filled with random integers generated uniformly
@ -5424,7 +5430,7 @@ Example::
add_docstr(torch.std,
r"""
.. function:: std(input, unbiased=True) -> Tensor
std(input, unbiased=True) -> Tensor
Returns the standard-deviation of all elements in the :attr:`input` tensor.
@ -5475,7 +5481,7 @@ Example::
add_docstr(torch.std_mean,
r"""
.. function:: std_mean(input, unbiased=True) -> (Tensor, Tensor)
std_mean(input, unbiased=True) -> (Tensor, Tensor)
Returns the standard-deviation and mean of all elements in the :attr:`input` tensor.
@ -5525,7 +5531,7 @@ Example::
add_docstr(torch.sum,
r"""
.. function:: sum(input, dtype=None) -> Tensor
sum(input, dtype=None) -> Tensor
Returns the sum of all elements in the :attr:`input` tensor.
@ -6358,7 +6364,7 @@ Example::
add_docstr(torch.var,
r"""
.. function:: var(input, unbiased=True) -> Tensor
var(input, unbiased=True) -> Tensor
Returns the variance of all elements in the :attr:`input` tensor.
@ -6409,7 +6415,7 @@ Example::
add_docstr(torch.var_mean,
r"""
.. function:: var_mean(input, unbiased=True) -> (Tensor, Tensor)
var_mean(input, unbiased=True) -> (Tensor, Tensor)
Returns the variance and mean of all elements in the :attr:`input` tensor.
@ -6685,7 +6691,7 @@ Example::
add_docstr(torch.where,
r"""
.. function:: where(condition, x, y) -> Tensor
where(condition, x, y) -> Tensor
Return a tensor of elements selected from either :attr:`x` or :attr:`y`, depending on :attr:`condition`.
@ -7485,7 +7491,7 @@ Example::
add_docstr(torch.trapz,
r"""
.. function:: trapz(y, x, *, dim=-1) -> Tensor
trapz(y, x, *, dim=-1) -> Tensor
Estimate :math:`\int y\,dx` along `dim`, using the trapezoid rule.
@ -7531,7 +7537,7 @@ Returns:
add_docstr(torch.repeat_interleave,
r"""
.. function:: repeat_interleave(input, repeats, dim=None) -> Tensor
repeat_interleave(input, repeats, dim=None) -> Tensor
Repeat elements of a tensor.

View File

@ -190,32 +190,33 @@ def _fill_in_zeros(grads, refs, strict, create_graph, stage):
# Public API
def vjp(func, inputs, v=None, create_graph=False, strict=False):
r"""Function that computes the dot product between a vector ``v`` and the Jacobian of
the given function at the point given by the inputs.
r"""Function that computes the dot product between a vector ``v`` and the
Jacobian of the given function at the point given by the inputs.
Args:
func (function): a Python function that takes Tensor inputs and returns
a tuple of Tensors or a Tensor.
inputs (tuple of Tensors or Tensor): inputs to the function ``func``.
v (tuple of Tensors or Tensor): The vector for which the vector Jacobian product is computed.
Must be the same size as the output of ``func``. This argument is optional when
``func``'s output contains a single element and (if it is not provided) will be set as a Tensor
containing a single ``1``.
create_graph (bool, optional): If ``True``, both the output and result will be
computed in a differentiable way. Note that when ``strict`` is ``False``, the result can not
require gradients or be disconnected from the inputs.
Defaults to ``False``.
strict (bool, optional): If ``True``, an error will be raised when we detect that there exists an input
such that all the outputs are independent of it. If ``False``, we return a Tensor of zeros as the
v (tuple of Tensors or Tensor): The vector for which the vector
Jacobian product is computed. Must be the same size as the output
of ``func``. This argument is optional when the output of ``func``
contains a single element and (if it is not provided) will be set
as a Tensor containing a single ``1``.
create_graph (bool, optional): If ``True``, both the output and result
will be computed in a differentiable way. Note that when ``strict``
is ``False``, the result can not require gradients or be
disconnected from the inputs. Defaults to ``False``.
strict (bool, optional): If ``True``, an error will be raised when we
detect that there exists an input such that all the outputs are
independent of it. If ``False``, we return a Tensor of zeros as the
vjp for said inputs, which is the expected mathematical value.
Defaults to ``False``.
Returns:
func_output (tuple of Tensors or Tensor): output of ``func(inputs)``
vjp (tuple of Tensors or Tensor): result of the dot product with the same shape
as the inputs.
vjp (tuple of Tensors or Tensor): result of the dot product with
the same shape as the inputs.
Example::
Example:
>>> def exp_reducer(x):
... return x.exp().sum(dim=1)
@ -257,7 +258,8 @@ def vjp(func, inputs, v=None, create_graph=False, strict=False):
_validate_v(v, outputs, is_outputs_tuple)
else:
if len(outputs) != 1 or outputs[0].nelement() != 1:
raise RuntimeError("The vector v can only be None if the user-provided function returns "
raise RuntimeError("The vector v can only be None if the "
"user-provided function returns "
"a single Tensor with a single element.")
grad_res = _autograd_grad(outputs, inputs, v, create_graph=create_graph)
@ -279,25 +281,26 @@ def jvp(func, inputs, v=None, create_graph=False, strict=False):
func (function): a Python function that takes Tensor inputs and returns
a tuple of Tensors or a Tensor.
inputs (tuple of Tensors or Tensor): inputs to the function ``func``.
v (tuple of Tensors or Tensor): The vector for which the Jacobian vector product is computed. Must be the
same size as the input of ``func``. This argument is optional when
``func``'s input contains a single element and (if it is not provided) will be set as a Tensor
containing a single ``1``.
create_graph (bool, optional): If ``True``, both the output and result will be
computed in a differentiable way. Note that when ``strict`` is ``False``, the result can not
require gradients or be disconnected from the inputs.
Defaults to ``False``.
strict (bool, optional): If ``True``, an error will be raised when we detect that there exists an input
such that all the outputs are independent of it. If ``False``, we return a Tensor of zeros as the
v (tuple of Tensors or Tensor): The vector for which the Jacobian
vector product is computed. Must be the same size as the input of
``func``. This argument is optional when the input to ``func``
contains a single element and (if it is not provided) will be set
as a Tensor containing a single ``1``.
create_graph (bool, optional): If ``True``, both the output and result
will be computed in a differentiable way. Note that when ``strict``
is ``False``, the result can not require gradients or be
disconnected from the inputs. Defaults to ``False``.
strict (bool, optional): If ``True``, an error will be raised when we
detect that there exists an input such that all the outputs are
independent of it. If ``False``, we return a Tensor of zeros as the
jvp for said inputs, which is the expected mathematical value.
Defaults to ``False``.
Returns:
func_output (tuple of Tensors or Tensor): output of ``func(inputs)``
jvp (tuple of Tensors or Tensor): result of the dot product with the same shape
as the output.
jvp (tuple of Tensors or Tensor): result of the dot product with
the same shape as the output.
Example::
Example:
>>> def exp_reducer(x):
... return x.exp().sum(dim=1)
@ -319,10 +322,10 @@ def jvp(func, inputs, v=None, create_graph=False, strict=False):
(tensor([2.2399, 2.5005]),
tensor([5., 5.]))
Note::
The jvp is currently computed by using the backward of the backward (sometimes called the double
backwards trick) as we don't have support for forward mode AD in PyTorch at the moment.
Note:
The jvp is currently computed by using the backward of the backward
(sometimes called the double backwards trick) as we don't have support
for forward mode AD in PyTorch at the moment.
"""
is_inputs_tuple, inputs = _as_tuple(inputs, "inputs", "jvp")
@ -334,14 +337,16 @@ def jvp(func, inputs, v=None, create_graph=False, strict=False):
_validate_v(v, inputs, is_inputs_tuple)
else:
if len(inputs) != 1 or inputs[0].nelement() != 1:
raise RuntimeError("The vector v can only be None if the input to the user-provided function "
"is a single Tensor with a single element.")
raise RuntimeError("The vector v can only be None if the input to "
"the user-provided function is a single Tensor "
"with a single element.")
outputs = func(*inputs)
is_outputs_tuple, outputs = _as_tuple(outputs, "outputs of the user-provided function", "jvp")
_check_requires_grad(outputs, "outputs", strict=strict)
# The backward is linear so the value of grad_outputs is not important as it won't appear in the double
# backward graph. We only need to ensure that it does not contain inf or nan.
# The backward is linear so the value of grad_outputs is not important as
# it won't appear in the double backward graph. We only need to ensure that
# it does not contain inf or nan.
grad_outputs = tuple(torch.zeros_like(out, requires_grad=True) for out in outputs)
grad_inputs = _autograd_grad(outputs, inputs, grad_outputs, create_graph=True)
@ -365,25 +370,28 @@ def jacobian(func, inputs, create_graph=False, strict=False):
func (function): a Python function that takes Tensor inputs and returns
a tuple of Tensors or a Tensor.
inputs (tuple of Tensors or Tensor): inputs to the function ``func``.
create_graph (bool, optional): If ``True``, the Jacobian will be computed in
a differentiable manner. Note that when ``strict`` is ``False``, the result can not
require gradients or be disconnected from the inputs.
Defaults to ``False``.
strict (bool, optional): If ``True``, an error will be raised when we detect that there exists an input
such that all the outputs are independent of it. If ``False``, we return a Tensor of zeros as the
create_graph (bool, optional): If ``True``, the Jacobian will be
computed in a differentiable manner. Note that when ``strict`` is
``False``, the result can not require gradients or be disconnected
from the inputs. Defaults to ``False``.
strict (bool, optional): If ``True``, an error will be raised when we
detect that there exists an input such that all the outputs are
independent of it. If ``False``, we return a Tensor of zeros as the
jacobian for said inputs, which is the expected mathematical value.
Defaults to ``False``.
Returns:
Jacobian (Tensor or nested tuple of Tensors) if there are a single input
and output, this will be a single Tensor containing the Jacobian for the
linearized inputs and output. If one of the two is a tuple, then the Jacobian
will be a tuple of Tensors. If both of them are tuples, then the Jacobian will
be a tuple of tuple of Tensors where ``Jacobian[i][j]`` will contain the Jacobian
of the ``i``th output and ``j``th input and will have as size the concatenation of the
sizes of the corresponding output and the corresponding input.
Jacobian (Tensor or nested tuple of Tensors): if there are a single
input and output, this will be a single Tensor containing the
Jacobian for the linearized inputs and output. If one of the two is
a tuple, then the Jacobian will be a tuple of Tensors. If both of
them are tuples, then the Jacobian will be a tuple of tuple of
Tensors where ``Jacobian[i][j]`` will contain the Jacobian of the
``i``\th output and ``j``\th input and will have as size the
concatenation of the sizes of the corresponding output and the
corresponding input.
Example::
Example:
>>> def exp_reducer(x):
... return x.exp().sum(dim=1)
@ -416,7 +424,9 @@ def jacobian(func, inputs, create_graph=False, strict=False):
inputs = _grad_preprocess(inputs, create_graph=create_graph, need_graph=True)
outputs = func(*inputs)
is_outputs_tuple, outputs = _as_tuple(outputs, "outputs of the user-provided function", "jacobian")
is_outputs_tuple, outputs = _as_tuple(outputs,
"outputs of the user-provided function",
"jacobian")
_check_requires_grad(outputs, "outputs", strict=strict)
jacobian = tuple()
@ -424,18 +434,23 @@ def jacobian(func, inputs, create_graph=False, strict=False):
jac_i = tuple([] for _ in range(len(inputs)))
for j in range(out.nelement()):
vj = _autograd_grad((out.reshape(-1)[j],), inputs, retain_graph=True, create_graph=create_graph)
vj = _autograd_grad((out.reshape(-1)[j],), inputs,
retain_graph=True, create_graph=create_graph)
for el_idx, (jac_i_el, vj_el, inp_el) in enumerate(zip(jac_i, vj, inputs)):
if vj_el is not None:
if strict and create_graph and not vj_el.requires_grad:
raise RuntimeError("The jacobian of the user-provided function is independent of "
"input {}. This is not allowed in strict mode when create_graph=True.".format(i))
msg = ("The jacobian of the user-provided function is "
"independent of input {}. This is not allowed in "
"strict mode when create_graph=True.".format(i))
raise RuntimeError(msg)
jac_i_el.append(vj_el)
else:
if strict:
raise RuntimeError("Output {} of the user-provided function is independent of "
"input {}. This is not allowed in strict mode.".format(i, el_idx))
msg = ("Output {} of the user-provided function is "
"independent of input {}. This is not allowed in "
"strict mode.".format(i, el_idx))
raise RuntimeError(msg)
jac_i_el.append(torch.zeros_like(inp_el))
jacobian += (tuple(torch.stack(jac_i_el, dim=0).view(out.size()
@ -466,11 +481,11 @@ def hessian(func, inputs, create_graph=False, strict=False):
Hessian (Tensor or a tuple of tuple of Tensors) if there are a single input,
this will be a single Tensor containing the Hessian for the input.
If it is a tuple, then the Hessian will be a tuple of tuples where
``Hessian[i][j]`` will contain the Hessian of the ``i``th input
and ``j``th input with size the sum of the size of the ``i``th input plus
the size of the ``j``th input.
``Hessian[i][j]`` will contain the Hessian of the ``i``\th input
and ``j``\th input with size the sum of the size of the ``i``\th input plus
the size of the ``j``\th input.
Example::
Example:
>>> def pow_reducer(x):
... return x.pow(3).sum()
@ -567,9 +582,7 @@ def vhp(func, inputs, v=None, create_graph=False, strict=False):
func_output (tuple of Tensors or Tensor): output of ``func(inputs)``
vhp (tuple of Tensors or Tensor): result of the dot product with the same shape
as the inputs.
Example::
>>> def pow_reducer(x):
... return x.pow(3).sum()
>>> inputs = torch.rand(2, 2)
@ -578,13 +591,10 @@ def vhp(func, inputs, v=None, create_graph=False, strict=False):
(tensor(0.5591),
tensor([[1.0689, 1.2431],
[3.0989, 4.4456]]))
>>> vhp(pow_reducer, inputs, v, create_graph=True)
(tensor(0.5591, grad_fn=<SumBackward0>),
tensor([[1.0689, 1.2431],
[3.0989, 4.4456]], grad_fn=<MulBackward0>))
>>> def pow_adder_reducer(x, y):
... return (2 * x.pow(2) + 3 * y.pow(2)).sum()
>>> inputs = (torch.rand(2), torch.rand(2))
@ -593,7 +603,6 @@ def vhp(func, inputs, v=None, create_graph=False, strict=False):
(tensor(4.8053),
(tensor([0., 0.]),
tensor([6., 6.])))
"""
is_inputs_tuple, inputs = _as_tuple(inputs, "inputs", "vhp")
@ -639,25 +648,26 @@ def hvp(func, inputs, v=None, create_graph=False, strict=False):
func (function): a Python function that takes Tensor inputs and returns
a Tensor with a single element.
inputs (tuple of Tensors or Tensor): inputs to the function ``func``.
v (tuple of Tensors or Tensor): The vector for which the Hessian vector product is computed. Must be the
same size as the input of ``func``. This argument is optional when
``func``'s input contains a single element and (if it is not provided) will be set as a Tensor
containing a single ``1``.
v (tuple of Tensors or Tensor): The vector for which the Hessian vector
product is computed. Must be the same size as the input of
``func``. This argument is optional when ``func``'s input contains
a single element and (if it is not provided) will be set as a
Tensor containing a single ``1``.
create_graph (bool, optional): If ``True``, both the output and result will be
computed in a differentiable way. Note that when ``strict`` is ``False``, the result can not
require gradients or be disconnected from the inputs.
Defaults to ``False``.
strict (bool, optional): If ``True``, an error will be raised when we detect that there exists an input
such that all the outputs are independent of it. If ``False``, we return a Tensor of zeros as the
computed in a differentiable way. Note that when ``strict`` is
``False``, the result can not require gradients or be disconnected
from the inputs. Defaults to ``False``.
strict (bool, optional): If ``True``, an error will be raised when we
detect that there exists an input such that all the outputs are
independent of it. If ``False``, we return a Tensor of zeros as the
hvp for said inputs, which is the expected mathematical value.
Defaults to ``False``.
Returns:
func_output (tuple of Tensors or Tensor): output of ``func(inputs)``
hvp (tuple of Tensors or Tensor): result of the dot product with the same shape
as the inputs.
hvp (tuple of Tensors or Tensor): result of the dot product with
the same shape as the inputs.
Example::
Example:
>>> def pow_reducer(x):
... return x.pow(3).sum()
@ -683,7 +693,7 @@ def hvp(func, inputs, v=None, create_graph=False, strict=False):
(tensor([0., 0.]),
tensor([6., 6.])))
Note::
Note:
This function is significantly slower than `vhp` due to backward mode AD constraints.
If your functions is twice continuously differentiable, then hvp = vhp.t(). So if you

View File

@ -88,58 +88,62 @@ DEFAULT_EXTRA_FILES_MAP = torch._C.ExtraFilesMap()
def save(m, f, _extra_files=DEFAULT_EXTRA_FILES_MAP):
"""
Save an offline version of this module for use in a separate process. The saved
module serializes all of the methods, submodules, parameters, and attributes of this
module. It can be loaded into the C++ API using ``torch::jit::load(filename)`` or into the Python
API with :func:`torch.jit.load <torch.jit.load>`.
r"""
Save an offline version of this module for use in a separate process. The
saved module serializes all of the methods, submodules, parameters, and
attributes of this module. It can be loaded into the C++ API using
``torch::jit::load(filename)`` or into the Python API with
:func:`torch.jit.load <torch.jit.load>`.
To be able to save a module, it must not make any calls to native Python functions.
This means that all submodules must be subclasses of :class:`ScriptModule` as well.
To be able to save a module, it must not make any calls to native Python
functions. This means that all submodules must be subclasses of
:class:`ScriptModule` as well.
.. DANGER::
All modules, no matter their device, are always loaded onto the CPU during loading.
This is different from :func:`torch.load`'s semantics and may change in the future.
.. DANGER::
All modules, no matter their device, are always loaded onto the CPU
during loading. This is different from :func:`torch.load`'s semantics
and may change in the future.
Arguments:
m: A :class:`ScriptModule` to save.
f: A file-like object (has to implement write and flush) or a string
containing a file name.
_extra_files: Map from filename to contents which will be stored as part of 'f'.
Arguments:
m: A :class:`ScriptModule` to save.
f: A file-like object (has to implement write and flush) or a string
containing a file name.
_extra_files: Map from filename to contents which will be stored as part of `f`.
.. warning::
If you are using Python 2, ``torch.jit.save`` does NOT support :any:`StringIO.StringIO`
as a valid file-like object. This is because the write method should return
the number of bytes written; ``StringIO.write()`` does not do this.
.. warning::
If you are using Python 2, `save` does NOT support ``StringIO.StringIO``
as a valid file-like object. This is because the write method should
return the number of bytes written; ``StringIO.write()`` does not do
this.
Please use something like ``io.BytesIO`` instead.
Please use something like ``io.BytesIO`` instead.
Example:
Example:
.. testcode::
.. testcode::
import torch
import io
import torch
import io
class MyModule(torch.nn.Module):
def forward(self, x):
return x + 10
class MyModule(torch.nn.Module):
def forward(self, x):
return x + 10
m = torch.jit.script(MyModule())
m = torch.jit.script(MyModule())
# Save to file
torch.jit.save(m, 'scriptmodule.pt')
# This line is equivalent to the previous
m.save("scriptmodule.pt")
# Save to file
torch.jit.save(m, 'scriptmodule.pt')
# This line is equivalent to the previous
m.save("scriptmodule.pt")
# Save to io.BytesIO buffer
buffer = io.BytesIO()
torch.jit.save(m, buffer)
# Save to io.BytesIO buffer
buffer = io.BytesIO()
torch.jit.save(m, buffer)
# Save with extra files
extra_files = torch._C.ExtraFilesMap()
extra_files['foo.txt'] = 'bar'
torch.jit.save(m, 'scriptmodule.pt', _extra_files=extra_files)
# Save with extra files
extra_files = torch._C.ExtraFilesMap()
extra_files['foo.txt'] = 'bar'
torch.jit.save(m, 'scriptmodule.pt', _extra_files=extra_files)
"""
if isinstance(f, str) or isinstance(f, pathlib.Path):
m.save(f, _extra_files=_extra_files)
@ -149,64 +153,66 @@ def save(m, f, _extra_files=DEFAULT_EXTRA_FILES_MAP):
def load(f, map_location=None, _extra_files=DEFAULT_EXTRA_FILES_MAP):
r"""
Load a :class:`ScriptModule` or :class:`ScriptFunction` previously
saved with :func:`torch.jit.save <torch.jit.save>`
Load a :class:`ScriptModule` or :class:`ScriptFunction` previously
saved with :func:`torch.jit.save <torch.jit.save>`
All previously saved modules, no matter their device, are first loaded onto CPU,
and then are moved to the devices they were saved from. If this fails (e.g. because
the run time system doesn't have certain devices), an exception is raised.
All previously saved modules, no matter their device, are first loaded onto CPU,
and then are moved to the devices they were saved from. If this fails (e.g.
because the run time system doesn't have certain devices), an exception is
raised.
Arguments:
f: a file-like object (has to implement read, readline, tell, and seek),
or a string containing a file name
map_location (string or torch.device): A simplified version of ``map_location`` in
``torch.save`` used to dynamically remap storages to an alternative set of devices.
_extra_files (dictionary of filename to content): The extra
filenames given in the map would be loaded and their content
would be stored in the provided map.
Arguments:
f: a file-like object (has to implement read, readline, tell, and seek),
or a string containing a file name
map_location (string or torch.device): A simplified version of
``map_location`` in `torch.jit.save` used to dynamically remap
storages to an alternative set of devices.
_extra_files (dictionary of filename to content): The extra
filenames given in the map would be loaded and their content
would be stored in the provided map.
Returns:
A :class:`ScriptModule` object.
Returns:
A :class:`ScriptModule` object.
Example:
Example:
.. testcode::
.. testcode::
import torch
import io
import torch
import io
torch.jit.load('scriptmodule.pt')
torch.jit.load('scriptmodule.pt')
# Load ScriptModule from io.BytesIO object
with open('scriptmodule.pt', 'rb') as f:
buffer = io.BytesIO(f.read())
# Load ScriptModule from io.BytesIO object
with open('scriptmodule.pt', 'rb') as f:
buffer = io.BytesIO(f.read())
# Load all tensors to the original device
torch.jit.load(buffer)
# Load all tensors to the original device
torch.jit.load(buffer)
# Load all tensors onto CPU, using a device
buffer.seek(0)
torch.jit.load(buffer, map_location=torch.device('cpu'))
# Load all tensors onto CPU, using a device
buffer.seek(0)
torch.jit.load(buffer, map_location=torch.device('cpu'))
# Load all tensors onto CPU, using a string
buffer.seek(0)
torch.jit.load(buffer, map_location='cpu')
# Load all tensors onto CPU, using a string
buffer.seek(0)
torch.jit.load(buffer, map_location='cpu')
# Load with extra files.
extra_files = torch._C.ExtraFilesMap()
extra_files['foo.txt'] = 'bar'
torch.jit.load('scriptmodule.pt', _extra_files=extra_files)
print(extra_files['foo.txt'])
# Load with extra files.
extra_files = torch._C.ExtraFilesMap()
extra_files['foo.txt'] = 'bar'
torch.jit.load('scriptmodule.pt', _extra_files=extra_files)
print(extra_files['foo.txt'])
.. testoutput::
:hide:
.. testoutput::
:hide:
...
...
.. testcleanup::
.. testcleanup::
import os
os.remove("scriptmodule.pt")
import os
os.remove("scriptmodule.pt")
"""
if isinstance(f, string_classes):
if not os.path.exists(f):
@ -747,88 +753,95 @@ def trace(func,
"""
Trace a function and return an executable or :class:`ScriptFunction`
that will be optimized using just-in-time compilation. Tracing is ideal for
code that operates only on ``Tensor``\\s and lists, dictionaries, and tuples of ``Tensor``\\s.
code that operates only on ``Tensor``\\s and lists, dictionaries, and
tuples of ``Tensor``\\s.
Using ``torch.jit.trace`` and :func:`torch.jit.trace_module<torch.jit.trace_module>`, you can turn an existing module or Python
function into a TorchScript :class:`ScriptFunction` or :class:`ScriptModule`. You must provide example inputs,
and we run the function, recording the operations performed on all the tensors.
Using `torch.jit.trace` and `torch.jit.trace_module`, you can turn an
existing module or Python function into a TorchScript
:class:`ScriptFunction` or :class:`ScriptModule`. You must provide example
inputs, and we run the function, recording the operations performed on all
the tensors.
* The resulting recording of a standalone function produces :class:`ScriptFunction`.
* The resulting recording of ``forward`` function of ``nn.Module`` or ``nn.Module`` produces :class:`ScriptModule`.
* The resulting recording of a standalone function produces `ScriptFunction`.
* The resulting recording of `nn.Module.forward` or `nn.Module` produces
`ScriptModule`.
This module also contains any parameters that the original
module had as well.
.. warning::
Warning:
Tracing only correctly records functions and modules which are not data
dependent (e.g., do not have conditionals on data in tensors) and do not have
any untracked external dependencies (e.g., perform input/output or
access global variables). Tracing only records operations done when the given
function is run on the given
tensors. Therefore, the returned :class:`ScriptModule` will always run the same traced
graph on any input. This has some important implications when your module is
expected to run different sets of operations, depending on the input and/or the
module state. For example,
function is run on the given tensors. Therefore, the returned
`ScriptModule` will always run the same traced graph on any input. This
has some important implications when your module is expected to run
different sets of operations, depending on the input and/or the module
state. For example,
* Tracing will not record any control-flow like if-statements or loops.
When this control-flow is constant across your module, this is fine and it often
inlines the control-flow decisions. But sometimes the control-flow is actually part
of the model itself. For instance, a recurrent network is a loop over
the (possibly dynamic) length of an input sequence.
When this control-flow is constant across your module, this is fine
and it often inlines the control-flow decisions. But sometimes the
control-flow is actually part of the model itself. For instance, a
recurrent network is a loop over the (possibly dynamic) length of an
input sequence.
* In the returned :class:`ScriptModule`, operations that have different
behaviors in ``training`` and ``eval`` modes will always behave as if it
is in the mode it was in during tracing, no matter which mode the
:class:`ScriptModule` is in.
behaviors in ``training`` and ``eval`` modes will always behave as if
it is in the mode it was in during tracing, no matter which mode the
`ScriptModule` is in.
In cases like these, tracing would not be appropriate and :func:`scripting <torch.jit.script>` is a better
choice. If you trace such models, you may silently get
incorrect results on subsequent invocations of the model. The tracer
will try to emit warnings when doing something that may cause an
incorrect trace to be produced.
In cases like these, tracing would not be appropriate and
:func:`scripting <torch.jit.script>` is a better choice. If you trace
such models, you may silently get incorrect results on subsequent
invocations of the model. The tracer will try to emit warnings when
doing something that may cause an incorrect trace to be produced.
Arguments:
func (callable or torch.nn.Module): A Python function or ``torch.nn.Module``
that will be run with ``example_inputs``.
arguments and returns to ``func`` must be tensors
or (possibly nested) tuples that
contain tensors. When a module is passed to
:func:`torch.jit.trace <torch.jit.trace>`, only the
``forward`` method is run and traced
(see :func:`torch.jit.trace <torch.jit.trace_module>` for details).
example_inputs (tuple): A tuple of example inputs that will be passed to the function
while tracing. The resulting trace can be run with
inputs of different types and shapes assuming the traced operations
support those types and shapes. ``example_inputs`` may also be a single
Tensor in which case it is automatically wrapped in a tuple.
func (callable or torch.nn.Module): A Python function or `torch.nn.Module`
that will be run with `example_inputs`. `func` arguments and return
values must be tensors or (possibly nested) tuples that contain
tensors. When a module is passed `torch.jit.trace`, only the
``forward`` method is run and traced (see :func:`torch.jit.trace
<torch.jit.trace_module>` for details).
example_inputs (tuple): A tuple of example inputs that will be passed
to the function while tracing. The resulting trace can be run with
inputs of different types and shapes assuming the traced operations
support those types and shapes. `example_inputs` may also be a
single Tensor in which case it is automatically wrapped in a tuple.
Keyword arguments:
check_trace (bool, optional): Check if the same inputs run through
traced code produce the same outputs. Default: ``True``. You might want
to disable this if, for example, your network contains non-
deterministic ops or if you are sure that the network is correct despite
a checker failure.
traced code produce the same outputs. Default: ``True``. You might want
to disable this if, for example, your network contains non-
deterministic ops or if you are sure that the network is correct despite
a checker failure.
check_inputs (list of tuples, optional): A list of tuples of input arguments that should be used
to check the trace against what is expected. Each tuple
is equivalent to a set of input arguments that would
be specified in ``example_inputs``. For best results, pass in a
set of checking inputs representative of the space of
shapes and types of inputs you expect the network to see.
If not specified, the original ``example_inputs`` are used for checking
check_tolerance (float, optional): Floating-point comparison tolerance to use in the checker procedure.
This can be used to relax the checker strictness in the event that
results diverge numerically for a known reason, such as operator fusion.
strict (bool, optional): run the tracer in a strict mode or not (default: True). Only turn this off when you
want the tracer to record your mutable container types (currently list/dict) and you
are sure that the list/dict that you are using in your problem is a `constant` structure
and does not get used as control flow (if, for) conditions.
check_inputs (list of tuples, optional): A list of tuples of input
arguments that should be used to check the trace against what is
expected. Each tuple is equivalent to a set of input arguments that
would be specified in ``example_inputs``. For best results, pass in
a set of checking inputs representative of the space of shapes and
types of inputs you expect the network to see. If not specified,
the original ``example_inputs`` are used for checking
check_tolerance (float, optional): Floating-point comparison tolerance
to use in the checker procedure. This can be used to relax the
checker strictness in the event that results diverge numerically
for a known reason, such as operator fusion.
strict (bool, optional): run the tracer in a strict mode or not
(default: ``True``). Only turn this off when you want the tracer to
record your mutable container types (currently ``list``/``dict``)
and you are sure that the containuer you are using in your
problem is a ``constant`` structure and does not get used as
control flow (if, for) conditions.
Returns:
If ``callable`` is ``nn.Module`` or ``forward`` of ``nn.Module``, ``trace`` returns
a :class:`ScriptModule` object with a single ``forward`` method containing the traced code.
The returned :class:`ScriptModule` will have the same set of sub-modules and parameters as the
original ``nn.Module``.
If ``callable`` is a standalone function, ``trace`` returns :class:`ScriptFunction`
If `func` is `nn.Module` or ``forward`` of `nn.Module`, `trace` returns
a :class:`ScriptModule` object with a single ``forward`` method
containing the traced code. The returned `ScriptModule` will
have the same set of sub-modules and parameters as the original
``nn.Module``. If ``func`` is a standalone function, ``trace``
returns `ScriptFunction`.
Example (tracing a function):

View File

@ -45,6 +45,7 @@ Unsupported Tensor Methods
"""
methods, properties = _gen_unsupported_methods_properties()
return header + "\n" + methods + """
Unsupported Tensor Properties
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
""" + "\n" + properties

View File

@ -25,7 +25,13 @@ planes.
See :class:`~torch.nn.Conv1d` for details and output shape.
.. include:: cudnn_deterministic.rst
Note:
In some circumstances when using the CUDA backend with CuDNN, this operator
may select a nondeterministic algorithm to increase performance. If this is
undesirable, you can try to make the operation deterministic (potentially at
a performance cost) by setting ``torch.backends.cudnn.deterministic =
True``.
Please see the notes on :doc:`/notes/randomness` for background.
Args:
input: input tensor of shape :math:`(\text{minibatch} , \text{in\_channels} , iW)`
@ -55,7 +61,14 @@ planes.
See :class:`~torch.nn.Conv2d` for details and output shape.
.. include:: cudnn_deterministic.rst
Note:
In some circumstances when using the CUDA backend with CuDNN, this operator
may select a nondeterministic algorithm to increase performance. If this is
undesirable, you can try to make the operation deterministic (potentially at
a performance cost) by setting ``torch.backends.cudnn.deterministic =
True``.
Please see the notes on :doc:`/notes/randomness` for background.
Args:
input: input tensor of shape :math:`(\text{minibatch} , \text{in\_channels} , iH , iW)`
@ -86,7 +99,13 @@ planes.
See :class:`~torch.nn.Conv3d` for details and output shape.
.. include:: cudnn_deterministic.rst
Note:
In some circumstances when using the CUDA backend with CuDNN, this operator
may select a nondeterministic algorithm to increase performance. If this is
undesirable, you can try to make the operation deterministic (potentially at
a performance cost) by setting ``torch.backends.cudnn.deterministic =
True``.
Please see the notes on :doc:`/notes/randomness` for background.
Args:
input: input tensor of shape :math:`(\text{minibatch} , \text{in\_channels} , iT , iH , iW)`
@ -116,7 +135,13 @@ composed of several input planes, sometimes also called "deconvolution".
See :class:`~torch.nn.ConvTranspose1d` for details and output shape.
.. include:: cudnn_deterministic.rst
Note:
In some circumstances when using the CUDA backend with CuDNN, this operator
may select a nondeterministic algorithm to increase performance. If this is
undesirable, you can try to make the operation deterministic (potentially at
a performance cost) by setting ``torch.backends.cudnn.deterministic =
True``.
Please see the notes on :doc:`/notes/randomness` for background.
Args:
input: input tensor of shape :math:`(\text{minibatch} , \text{in\_channels} , iW)`
@ -149,7 +174,13 @@ composed of several input planes, sometimes also called "deconvolution".
See :class:`~torch.nn.ConvTranspose2d` for details and output shape.
.. include:: cudnn_deterministic.rst
Note:
In some circumstances when using the CUDA backend with CuDNN, this operator
may select a nondeterministic algorithm to increase performance. If this is
undesirable, you can try to make the operation deterministic (potentially at
a performance cost) by setting ``torch.backends.cudnn.deterministic =
True``.
Please see the notes on :doc:`/notes/randomness` for background.
Args:
input: input tensor of shape :math:`(\text{minibatch} , \text{in\_channels} , iH , iW)`
@ -184,7 +215,13 @@ composed of several input planes, sometimes also called "deconvolution"
See :class:`~torch.nn.ConvTranspose3d` for details and output shape.
.. include:: cudnn_deterministic.rst
Note:
In some circumstances when using the CUDA backend with CuDNN, this operator
may select a nondeterministic algorithm to increase performance. If this is
undesirable, you can try to make the operation deterministic (potentially at
a performance cost) by setting ``torch.backends.cudnn.deterministic =
True``.
Please see the notes on :doc:`/notes/randomness` for background.
Args:
input: input tensor of shape :math:`(\text{minibatch} , \text{in\_channels} , iT , iH , iW)`
@ -1767,7 +1804,10 @@ def embedding_bag(input, weight, offsets=None, max_norm=None, norm_type=2,
See :class:`torch.nn.EmbeddingBag` for more details.
.. include:: cuda_deterministic_backward.rst
Note:
When using the CUDA backend, this operation may induce nondeterministic
behaviour in its backward pass that is not easily switched off.
Please see the notes on :doc:`/notes/randomness` for background.
Args:
input (LongTensor): Tensor containing bags of indices into the embedding matrix
@ -2046,8 +2086,18 @@ def ctc_loss(log_probs, targets, input_lengths, target_lengths, blank=0,
See :class:`~torch.nn.CTCLoss` for details.
.. include:: cudnn_deterministic.rst
.. include:: cuda_deterministic_backward.rst
Note:
In some circumstances when using the CUDA backend with CuDNN, this operator
may select a nondeterministic algorithm to increase performance. If this is
undesirable, you can try to make the operation deterministic (potentially at
a performance cost) by setting ``torch.backends.cudnn.deterministic =
True``.
Please see the notes on :doc:`/notes/randomness` for background.
Note:
When using the CUDA backend, this operation may induce nondeterministic
behaviour in its backward pass that is not easily switched off.
Please see the notes on :doc:`/notes/randomness` for background.
Args:
log_probs: :math:`(T, N, C)` where `C = number of characters in alphabet including blank`,
@ -2232,7 +2282,8 @@ def poisson_nll_loss(input, target, log_input=True, full=False, size_average=Non
def kl_div(input, target, size_average=None, reduce=None, reduction='mean', log_target=False):
# type: (Tensor, Tensor, Optional[bool], Optional[bool], str, bool) -> Tensor
r"""The `Kullback-Leibler divergence`_ Loss.
r"""The `Kullback-Leibler divergence Loss
<https://en.wikipedia.org/wiki/Kullback-Leibler_divergence>`__
See :class:`~torch.nn.KLDivLoss` for details.
@ -2268,9 +2319,6 @@ def kl_div(input, target, size_average=None, reduce=None, reduction='mean', log_
:attr:``reduction`` = ``'mean'`` doesn't return the true kl divergence value, please use
:attr:``reduction`` = ``'batchmean'`` which aligns with KL math definition.
In the next major release, ``'mean'`` will be changed to be the same as 'batchmean'.
.. _Kullback-Leibler divergence:
https://en.wikipedia.org/wiki/Kullback-Leibler_divergence
"""
if not torch.jit.is_scripting():
tens_ops = (input, target)
@ -2820,7 +2868,10 @@ def upsample(input, size=None, scale_factor=None, mode='nearest', align_corners=
This function is deprecated in favor of :func:`torch.nn.functional.interpolate`.
This is equivalent with ``nn.functional.interpolate(...)``.
.. include:: cuda_deterministic_backward.rst
Note:
When using the CUDA backend, this operation may induce nondeterministic
behaviour in its backward pass that is not easily switched off.
Please see the notes on :doc:`/notes/randomness` for background.
The algorithm used for upsampling is determined by :attr:`mode`.
@ -3027,7 +3078,10 @@ def interpolate(input, size=None, scale_factor=None, mode='nearest', align_corne
in 1.6.0, and scale_factor will be used in the interpolation
calculation.
.. include:: cuda_deterministic_backward.rst
Note:
When using the CUDA backend, this operation may induce nondeterministic
behaviour in its backward pass that is not easily switched off.
Please see the notes on :doc:`/notes/randomness` for background.
"""
if not torch.jit.is_scripting():
if type(input) is not Tensor and has_torch_function((input,)):
@ -3131,7 +3185,10 @@ def upsample_nearest(input, size=None, scale_factor=None): # noqa: F811
size.
scale_factor (int): multiplier for spatial size. Has to be an integer.
.. include:: cuda_deterministic_backward.rst
Note:
When using the CUDA backend, this operation may induce nondeterministic
behaviour in its backward pass that is not easily switched off.
Please see the notes on :doc:`/notes/randomness` for background.
"""
# DeprecationWarning is ignored by default
warnings.warn("nn.functional.upsample_nearest is deprecated. Use nn.functional.interpolate instead.")
@ -3173,7 +3230,10 @@ def upsample_bilinear(input, size=None, scale_factor=None): # noqa: F811
size (int or Tuple[int, int]): output spatial size.
scale_factor (int or Tuple[int, int]): multiplier for spatial size
.. include:: cuda_deterministic_backward.rst
Note:
When using the CUDA backend, this operation may induce nondeterministic
behaviour in its backward pass that is not easily switched off.
Please see the notes on :doc:`/notes/randomness` for background.
"""
# DeprecationWarning is ignored by default
warnings.warn("nn.functional.upsample_bilinear is deprecated. Use nn.functional.interpolate instead.")
@ -3231,10 +3291,14 @@ def grid_sample(input, grid, mode='bilinear', padding_mode='zeros', align_corner
and becomes ``x' = 1.5``, then reflects by border ``1`` and becomes
``x'' = -0.5``.
.. note::
Note:
This function is often used in conjunction with :func:`affine_grid`
to build `Spatial Transformer Networks`_ .
.. include:: cuda_deterministic_backward.rst
Note:
When using the CUDA backend, this operation may induce nondeterministic
behaviour in its backward pass that is not easily switched off.
Please see the notes on :doc:`/notes/randomness` for background.
Args:
input (Tensor): input of shape :math:`(N, C, H_\text{in}, W_\text{in})` (4-D case)
@ -3428,7 +3492,10 @@ def _pad(input, pad, mode='constant', value=0):
3D input tensor. Reflect padding is only implemented for padding the last 2
dimensions of 4D input tensor, or the last dimension of 3D input tensor.
.. include:: cuda_deterministic_backward.rst
Note:
When using the CUDA backend, this operation may induce nondeterministic
behaviour in its backward pass that is not easily switched off.
Please see the notes on :doc:`/notes/randomness` for background.
Args:
input (Tensor): N-dimensional tensor

View File

@ -55,7 +55,6 @@ class ConvReLU3d(nnq.Conv3d):
We adopt the same interface as :class:`torch.nn.quantized.Conv3d`.
.. note::
Attributes: Same as torch.nn.quantized.Conv3d
"""

View File

@ -72,7 +72,7 @@ class ReLU(Module):
dimensions
- Output: :math:`(N, *)`, same shape as the input
.. image:: scripts/activation_images/ReLU.png
.. image:: ../scripts/activation_images/ReLU.png
Examples::
@ -184,7 +184,7 @@ class Hardtanh(Module):
dimensions
- Output: :math:`(N, *)`, same shape as the input
.. image:: scripts/activation_images/Hardtanh.png
.. image:: ../scripts/activation_images/Hardtanh.png
Examples::
@ -232,7 +232,7 @@ class ReLU6(Hardtanh):
dimensions
- Output: :math:`(N, *)`, same shape as the input
.. image:: scripts/activation_images/ReLU6.png
.. image:: ../scripts/activation_images/ReLU6.png
Examples::
@ -261,7 +261,7 @@ class Sigmoid(Module):
dimensions
- Output: :math:`(N, *)`, same shape as the input
.. image:: scripts/activation_images/Sigmoid.png
.. image:: ../scripts/activation_images/Sigmoid.png
Examples::
@ -312,7 +312,7 @@ class Tanh(Module):
dimensions
- Output: :math:`(N, *)`, same shape as the input
.. image:: scripts/activation_images/Tanh.png
.. image:: ../scripts/activation_images/Tanh.png
Examples::
@ -371,7 +371,7 @@ class ELU(Module):
dimensions
- Output: :math:`(N, *)`, same shape as the input
.. image:: scripts/activation_images/ELU.png
.. image:: ../scripts/activation_images/ELU.png
Examples::
@ -411,7 +411,7 @@ class CELU(Module):
dimensions
- Output: :math:`(N, *)`, same shape as the input
.. image:: scripts/activation_images/CELU.png
.. image:: ../scripts/activation_images/CELU.png
Examples::
@ -456,7 +456,7 @@ class SELU(Module):
dimensions
- Output: :math:`(N, *)`, same shape as the input
.. image:: scripts/activation_images/SELU.png
.. image:: ../scripts/activation_images/SELU.png
Examples::
@ -515,8 +515,8 @@ class GLU(Module):
class GELU(Module):
r"""Applies the Gaussian Error Linear Units function:
.. math::
\text{GELU}(x) = x * \Phi(x)
.. math:: \text{GELU}(x) = x * \Phi(x)
where :math:`\Phi(x)` is the Cumulative Distribution Function for Gaussian Distribution.
Shape:
@ -524,7 +524,7 @@ class GELU(Module):
dimensions
- Output: :math:`(N, *)`, same shape as the input
.. image:: scripts/activation_images/GELU.png
.. image:: ../scripts/activation_images/GELU.png
Examples::
@ -555,7 +555,7 @@ class Hardshrink(Module):
dimensions
- Output: :math:`(N, *)`, same shape as the input
.. image:: scripts/activation_images/Hardshrink.png
.. image:: ../scripts/activation_images/Hardshrink.png
Examples::
@ -601,7 +601,7 @@ class LeakyReLU(Module):
dimensions
- Output: :math:`(N, *)`, same shape as the input
.. image:: scripts/activation_images/LeakyReLU.png
.. image:: ../scripts/activation_images/LeakyReLU.png
Examples::
@ -635,7 +635,7 @@ class LogSigmoid(Module):
dimensions
- Output: :math:`(N, *)`, same shape as the input
.. image:: scripts/activation_images/LogSigmoid.png
.. image:: ../scripts/activation_images/LogSigmoid.png
Examples::
@ -669,7 +669,7 @@ class Softplus(Module):
dimensions
- Output: :math:`(N, *)`, same shape as the input
.. image:: scripts/activation_images/Softplus.png
.. image:: ../scripts/activation_images/Softplus.png
Examples::
@ -710,7 +710,7 @@ class Softshrink(Module):
dimensions
- Output: :math:`(N, *)`, same shape as the input
.. image:: scripts/activation_images/Softshrink.png
.. image:: ../scripts/activation_images/Softshrink.png
Examples::
@ -930,7 +930,7 @@ class PReLU(Module):
Attributes:
weight (Tensor): the learnable weights of shape (:attr:`num_parameters`).
.. image:: scripts/activation_images/PReLU.png
.. image:: ../scripts/activation_images/PReLU.png
Examples::
@ -963,7 +963,7 @@ class Softsign(Module):
dimensions
- Output: :math:`(N, *)`, same shape as the input
.. image:: scripts/activation_images/Softsign.png
.. image:: ../scripts/activation_images/Softsign.png
Examples::
@ -987,7 +987,7 @@ class Tanhshrink(Module):
dimensions
- Output: :math:`(N, *)`, same shape as the input
.. image:: scripts/activation_images/Tanhshrink.png
.. image:: ../scripts/activation_images/Tanhshrink.png
Examples::

View File

@ -14,8 +14,9 @@ _ASMoutput = namedtuple('ASMoutput', ['output', 'loss'])
class AdaptiveLogSoftmaxWithLoss(Module):
r"""Efficient softmax approximation as described in
`Efficient softmax approximation for GPUs`_ by Edouard Grave, Armand Joulin,
Moustapha Cissé, David Grangier, and Hervé Jégou.
`Efficient softmax approximation for GPUs by Edouard Grave, Armand Joulin,
Moustapha Cissé, David Grangier, and Hervé Jégou
<https://arxiv.org/abs/1609.04309>`__.
Adaptive softmax is an approximate strategy for training models with large
output spaces. It is most effective when the label distribution is highly
@ -94,12 +95,7 @@ class AdaptiveLogSoftmaxWithLoss(Module):
- output1: :math:`(N)`
- output2: ``Scalar``
.. _Efficient softmax approximation for GPUs:
https://arxiv.org/abs/1609.04309
.. _Zipf's law:
https://en.wikipedia.org/wiki/Zipf%27s_law
.. _Zipf's law: https://en.wikipedia.org/wiki/Zipf%27s_law
"""
def __init__(self, in_features, n_classes, cutoffs, div_value=4., head_bias=False):

View File

@ -109,7 +109,8 @@ class _BatchNorm(_NormBase):
class BatchNorm1d(_BatchNorm):
r"""Applies Batch Normalization over a 2D or 3D input (a mini-batch of 1D
inputs with optional additional channel dimension) as described in the paper
`Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift`_ .
`Batch Normalization: Accelerating Deep Network Training by Reducing
Internal Covariate Shift <https://arxiv.org/abs/1502.03167>`__ .
.. math::
@ -167,9 +168,6 @@ class BatchNorm1d(_BatchNorm):
>>> m = nn.BatchNorm1d(100, affine=False)
>>> input = torch.randn(20, 100)
>>> output = m(input)
.. _`Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift`:
https://arxiv.org/abs/1502.03167
"""
def _check_input_dim(self, input):
@ -181,7 +179,8 @@ class BatchNorm1d(_BatchNorm):
class BatchNorm2d(_BatchNorm):
r"""Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs
with additional channel dimension) as described in the paper
`Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift`_ .
`Batch Normalization: Accelerating Deep Network Training by Reducing
Internal Covariate Shift <https://arxiv.org/abs/1502.03167>`__ .
.. math::
@ -239,9 +238,6 @@ class BatchNorm2d(_BatchNorm):
>>> m = nn.BatchNorm2d(100, affine=False)
>>> input = torch.randn(20, 100, 35, 45)
>>> output = m(input)
.. _`Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift`:
https://arxiv.org/abs/1502.03167
"""
def _check_input_dim(self, input):
@ -253,7 +249,8 @@ class BatchNorm2d(_BatchNorm):
class BatchNorm3d(_BatchNorm):
r"""Applies Batch Normalization over a 5D input (a mini-batch of 3D inputs
with additional channel dimension) as described in the paper
`Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift`_ .
`Batch Normalization: Accelerating Deep Network Training by Reducing
Internal Covariate Shift <https://arxiv.org/abs/1502.03167>`__ .
.. math::
@ -312,9 +309,6 @@ class BatchNorm3d(_BatchNorm):
>>> m = nn.BatchNorm3d(100, affine=False)
>>> input = torch.randn(20, 100, 35, 45, 10)
>>> output = m(input)
.. _`Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift`:
https://arxiv.org/abs/1502.03167
"""
def _check_input_dim(self, input):
@ -326,7 +320,8 @@ class BatchNorm3d(_BatchNorm):
class SyncBatchNorm(_BatchNorm):
r"""Applies Batch Normalization over a N-Dimensional input (a mini-batch of [N-2]D inputs
with additional channel dimension) as described in the paper
`Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift`_ .
`Batch Normalization: Accelerating Deep Network Training by Reducing
Internal Covariate Shift <https://arxiv.org/abs/1502.03167>`__ .
.. math::
@ -404,9 +399,6 @@ class SyncBatchNorm(_BatchNorm):
>>> sync_bn_network,
>>> device_ids=[args.local_rank],
>>> output_device=args.local_rank)
.. _`Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift`:
https://arxiv.org/abs/1502.03167
"""
def __init__(self, num_features, eps=1e-5, momentum=0.1, affine=True,

View File

@ -125,14 +125,14 @@ class Conv1d(_ConvNd):
of size
:math:`\left\lfloor\frac{out\_channels}{in\_channels}\right\rfloor`.
.. note::
Note:
Depending of the size of your kernel, several (of the last)
columns of the input might be lost, because it is a valid
`cross-correlation`_, and not a full `cross-correlation`_.
It is up to the user to add proper padding.
.. note::
Note:
When `groups == in_channels` and `out_channels == K * in_channels`,
where `K` is a positive integer, this operation is also termed in
@ -142,7 +142,14 @@ class Conv1d(_ConvNd):
a depthwise convolution with a depthwise multiplier `K`, can be constructed by arguments
:math:`(C_\text{in}=C_{in}, C_\text{out}=C_{in} \times K, ..., \text{groups}=C_{in})`.
.. include:: cudnn_deterministic.rst
Note:
In some circumstances when using the CUDA backend with CuDNN, this operator
may select a nondeterministic algorithm to increase performance. If this is
undesirable, you can try to make the operation deterministic (potentially at
a performance cost) by setting ``torch.backends.cudnn.deterministic =
True``.
Please see the notes on :doc:`/notes/randomness` for background.
Args:
in_channels (int): Number of channels in the input image
@ -151,12 +158,14 @@ class Conv1d(_ConvNd):
stride (int or tuple, optional): Stride of the convolution. Default: 1
padding (int or tuple, optional): Zero-padding added to both sides of
the input. Default: 0
padding_mode (string, optional): ``'zeros'``, ``'reflect'``, ``'replicate'`` or ``'circular'``. Default: ``'zeros'``
padding_mode (string, optional): ``'zeros'``, ``'reflect'``,
``'replicate'`` or ``'circular'``. Default: ``'zeros'``
dilation (int or tuple, optional): Spacing between kernel
elements. Default: 1
groups (int, optional): Number of blocked connections from input
channels to output channels. Default: 1
bias (bool, optional): If ``True``, adds a learnable bias to the output. Default: ``True``
bias (bool, optional): If ``True``, adds a learnable bias to the
output. Default: ``True``
Shape:
- Input: :math:`(N, C_{in}, L_{in})`
@ -168,7 +177,8 @@ class Conv1d(_ConvNd):
Attributes:
weight (Tensor): the learnable weights of the module of shape
:math:`(\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}}, \text{kernel\_size})`.
:math:`(\text{out\_channels},
\frac{\text{in\_channels}}{\text{groups}}, \text{kernel\_size})`.
The values of these weights are sampled from
:math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
:math:`k = \frac{groups}{C_\text{in} * \text{kernel\_size}}`
@ -257,14 +267,14 @@ class Conv2d(_ConvNd):
- a ``tuple`` of two ints -- in which case, the first `int` is used for the height dimension,
and the second `int` for the width dimension
.. note::
Note:
Depending of the size of your kernel, several (of the last)
columns of the input might be lost, because it is a valid `cross-correlation`_,
and not a full `cross-correlation`_.
It is up to the user to add proper padding.
.. note::
Note:
When `groups == in_channels` and `out_channels == K * in_channels`,
where `K` is a positive integer, this operation is also termed in
@ -274,18 +284,29 @@ class Conv2d(_ConvNd):
a depthwise convolution with a depthwise multiplier `K`, can be constructed by arguments
:math:`(in\_channels=C_{in}, out\_channels=C_{in} \times K, ..., groups=C_{in})`.
.. include:: cudnn_deterministic.rst
Note:
In some circumstances when using the CUDA backend with CuDNN, this operator
may select a nondeterministic algorithm to increase performance. If this is
undesirable, you can try to make the operation deterministic (potentially at
a performance cost) by setting ``torch.backends.cudnn.deterministic =
True``.
Please see the notes on :doc:`/notes/randomness` for background.
Args:
in_channels (int): Number of channels in the input image
out_channels (int): Number of channels produced by the convolution
kernel_size (int or tuple): Size of the convolving kernel
stride (int or tuple, optional): Stride of the convolution. Default: 1
padding (int or tuple, optional): Zero-padding added to both sides of the input. Default: 0
padding_mode (string, optional): ``'zeros'``, ``'reflect'``, ``'replicate'`` or ``'circular'``. Default: ``'zeros'``
padding (int or tuple, optional): Zero-padding added to both sides of
the input. Default: 0
padding_mode (string, optional): ``'zeros'``, ``'reflect'``,
``'replicate'`` or ``'circular'``. Default: ``'zeros'``
dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
groups (int, optional): Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional): If ``True``, adds a learnable bias to the output. Default: ``True``
groups (int, optional): Number of blocked connections from input
channels to output channels. Default: 1
bias (bool, optional): If ``True``, adds a learnable bias to the
output. Default: ``True``
Shape:
- Input: :math:`(N, C_{in}, H_{in}, W_{in})`
@ -301,17 +322,18 @@ class Conv2d(_ConvNd):
Attributes:
weight (Tensor): the learnable weights of the module of shape
:math:`(\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}},`
:math:`\text{kernel\_size[0]}, \text{kernel\_size[1]})`.
The values of these weights are sampled from
:math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
:math:`k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`
bias (Tensor): the learnable bias of the module of shape (out_channels). If :attr:`bias` is ``True``,
then the values of these weights are
sampled from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
:math:`k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`
:math:`(\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}},`
:math:`\text{kernel\_size[0]}, \text{kernel\_size[1]})`.
The values of these weights are sampled from
:math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
:math:`k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`
bias (Tensor): the learnable bias of the module of shape
(out_channels). If :attr:`bias` is ``True``,
then the values of these weights are
sampled from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
:math:`k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`
Examples::
Examples:
>>> # With square kernels and equal stride
>>> m = nn.Conv2d(16, 33, 3, stride=2)
@ -390,14 +412,14 @@ class Conv3d(_ConvNd):
- a ``tuple`` of three ints -- in which case, the first `int` is used for the depth dimension,
the second `int` for the height dimension and the third `int` for the width dimension
.. note::
Note:
Depending of the size of your kernel, several (of the last)
columns of the input might be lost, because it is a valid `cross-correlation`_,
and not a full `cross-correlation`_.
It is up to the user to add proper padding.
.. note::
Note:
When `groups == in_channels` and `out_channels == K * in_channels`,
where `K` is a positive integer, this operation is also termed in
@ -407,7 +429,14 @@ class Conv3d(_ConvNd):
a depthwise convolution with a depthwise multiplier `K`, can be constructed by arguments
:math:`(in\_channels=C_{in}, out\_channels=C_{in} \times K, ..., groups=C_{in})`.
.. include:: cudnn_deterministic.rst
Note:
In some circumstances when using the CUDA backend with CuDNN, this operator
may select a nondeterministic algorithm to increase performance. If this is
undesirable, you can try to make the operation deterministic (potentially at
a performance cost) by setting ``torch.backends.cudnn.deterministic =
True``.
Please see the notes on :doc:`/notes/randomness` for background.
Args:
in_channels (int): Number of channels in the input image
@ -567,14 +596,14 @@ class ConvTranspose1d(_ConvTransposeNd):
its own set of filters (of size
:math:`\left\lfloor\frac{out\_channels}{in\_channels}\right\rfloor`).
.. note::
Note:
Depending of the size of your kernel, several (of the last)
columns of the input might be lost, because it is a valid `cross-correlation`_,
and not a full `cross-correlation`_.
It is up to the user to add proper padding.
.. note::
Note:
The :attr:`padding` argument effectively adds ``dilation * (kernel_size - 1) - padding``
amount of zero padding to both sizes of the input. This is set so that
when a :class:`~torch.nn.Conv1d` and a :class:`~torch.nn.ConvTranspose1d`
@ -586,7 +615,13 @@ class ConvTranspose1d(_ConvTransposeNd):
that :attr:`output_padding` is only used to find output shape, but does
not actually add zero-padding to output.
.. include:: cudnn_deterministic.rst
Note:
In some circumstances when using the CUDA backend with CuDNN, this operator
may select a nondeterministic algorithm to increase performance. If this is
undesirable, you can try to make the operation deterministic (potentially at
a performance cost) by setting ``torch.backends.cudnn.deterministic =
True``.
Please see the notes on :doc:`/notes/randomness` for background.
Args:
in_channels (int): Number of channels in the input image
@ -692,7 +727,7 @@ class ConvTranspose2d(_ConvTransposeNd):
and not a full `cross-correlation`_.
It is up to the user to add proper padding.
.. note::
Note:
The :attr:`padding` argument effectively adds ``dilation * (kernel_size - 1) - padding``
amount of zero padding to both sizes of the input. This is set so that
when a :class:`~torch.nn.Conv2d` and a :class:`~torch.nn.ConvTranspose2d`
@ -704,7 +739,14 @@ class ConvTranspose2d(_ConvTransposeNd):
that :attr:`output_padding` is only used to find output shape, but does
not actually add zero-padding to output.
.. include:: cudnn_deterministic.rst
Note:
In some circumstances when using the CUDA backend with CuDNN, this operator
may select a nondeterministic algorithm to increase performance. If this is
undesirable, you can try to make the operation deterministic (potentially at
a performance cost) by setting ``torch.backends.cudnn.deterministic =
True``.
Please see the notes on :doc:`/notes/randomness` for background.
Args:
in_channels (int): Number of channels in the input image
@ -834,14 +876,14 @@ class ConvTranspose3d(_ConvTransposeNd):
- a ``tuple`` of three ints -- in which case, the first `int` is used for the depth dimension,
the second `int` for the height dimension and the third `int` for the width dimension
.. note::
Note:
Depending of the size of your kernel, several (of the last)
columns of the input might be lost, because it is a valid `cross-correlation`_,
and not a full `cross-correlation`_.
It is up to the user to add proper padding.
.. note::
Note:
The :attr:`padding` argument effectively adds ``dilation * (kernel_size - 1) - padding``
amount of zero padding to both sizes of the input. This is set so that
when a :class:`~torch.nn.Conv3d` and a :class:`~torch.nn.ConvTranspose3d`
@ -853,7 +895,14 @@ class ConvTranspose3d(_ConvTransposeNd):
that :attr:`output_padding` is only used to find output shape, but does
not actually add zero-padding to output.
.. include:: cudnn_deterministic.rst
Note:
In some circumstances when using the CUDA backend with CuDNN, this operator
may select a nondeterministic algorithm to increase performance. If this is
undesirable, you can try to make the operation deterministic (potentially at
a performance cost) by setting ``torch.backends.cudnn.deterministic =
True``.
Please see the notes on :doc:`/notes/randomness` for background.
Args:
in_channels (int): Number of channels in the input image

View File

@ -52,7 +52,8 @@ class _InstanceNorm(_NormBase):
class InstanceNorm1d(_InstanceNorm):
r"""Applies Instance Normalization over a 3D input (a mini-batch of 1D
inputs with optional additional channel dimension) as described in the paper
`Instance Normalization: The Missing Ingredient for Fast Stylization`_ .
`Instance Normalization: The Missing Ingredient for Fast Stylization
<https://arxiv.org/abs/1607.08022>`__.
.. math::
@ -112,9 +113,6 @@ class InstanceNorm1d(_InstanceNorm):
>>> m = nn.InstanceNorm1d(100, affine=True)
>>> input = torch.randn(20, 100, 40)
>>> output = m(input)
.. _`Instance Normalization: The Missing Ingredient for Fast Stylization`:
https://arxiv.org/abs/1607.08022
"""
def _check_input_dim(self, input):
@ -133,7 +131,8 @@ class InstanceNorm1d(_InstanceNorm):
class InstanceNorm2d(_InstanceNorm):
r"""Applies Instance Normalization over a 4D input (a mini-batch of 2D inputs
with additional channel dimension) as described in the paper
`Instance Normalization: The Missing Ingredient for Fast Stylization`_ .
`Instance Normalization: The Missing Ingredient for Fast Stylization
<https://arxiv.org/abs/1607.08022>`__.
.. math::
@ -193,9 +192,6 @@ class InstanceNorm2d(_InstanceNorm):
>>> m = nn.InstanceNorm2d(100, affine=True)
>>> input = torch.randn(20, 100, 35, 45)
>>> output = m(input)
.. _`Instance Normalization: The Missing Ingredient for Fast Stylization`:
https://arxiv.org/abs/1607.08022
"""
def _check_input_dim(self, input):
@ -207,7 +203,8 @@ class InstanceNorm2d(_InstanceNorm):
class InstanceNorm3d(_InstanceNorm):
r"""Applies Instance Normalization over a 5D input (a mini-batch of 3D inputs
with additional channel dimension) as described in the paper
`Instance Normalization: The Missing Ingredient for Fast Stylization`_ .
`Instance Normalization: The Missing Ingredient for Fast Stylization
<https://arxiv.org/abs/1607.08022>`__.
.. math::
@ -267,9 +264,6 @@ class InstanceNorm3d(_InstanceNorm):
>>> m = nn.InstanceNorm3d(100, affine=True)
>>> input = torch.randn(20, 100, 35, 45, 10)
>>> output = m(input)
.. _`Instance Normalization: The Missing Ingredient for Fast Stylization`:
https://arxiv.org/abs/1607.08022
"""
def _check_input_dim(self, input):

View File

@ -1291,7 +1291,7 @@ class CTCLoss(_Loss):
Labelling Unsegmented Sequence Data with Recurrent Neural Networks:
https://www.cs.toronto.edu/~graves/icml_2006.pdf
.. Note::
Note:
In order to use CuDNN, the following must be satisfied: :attr:`targets` must be
in concatenated format, all :attr:`input_lengths` must be `T`. :math:`blank=0`,
:attr:`target_lengths` :math:`\leq 256`, the integer arguments must be of
@ -1300,8 +1300,13 @@ class CTCLoss(_Loss):
The regular implementation uses the (more common in PyTorch) `torch.long` dtype.
.. include:: cudnn_deterministic.rst
Note:
In some circumstances when using the CUDA backend with CuDNN, this operator
may select a nondeterministic algorithm to increase performance. If this is
undesirable, you can try to make the operation deterministic (potentially at
a performance cost) by setting ``torch.backends.cudnn.deterministic =
True``.
Please see the notes on :doc:`/notes/randomness` for background.
"""
__constants__ = ['blank', 'reduction']

View File

@ -71,7 +71,7 @@ class CrossMapLRN2d(Module):
class LayerNorm(Module):
r"""Applies Layer Normalization over a mini-batch of inputs as described in
the paper `Layer Normalization`_ .
the paper `Layer Normalization <https://arxiv.org/abs/1607.06450>`__
.. math::
y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta
@ -123,8 +123,6 @@ class LayerNorm(Module):
>>> m = nn.LayerNorm(10)
>>> # Activating the module
>>> output = m(input)
.. _`Layer Normalization`: https://arxiv.org/abs/1607.06450
"""
__constants__ = ['normalized_shape', 'eps', 'elementwise_affine']
@ -159,7 +157,7 @@ class LayerNorm(Module):
class GroupNorm(Module):
r"""Applies Group Normalization over a mini-batch of inputs as described in
the paper `Group Normalization`_ .
the paper `Group Normalization <https://arxiv.org/abs/1803.08494>`__
.. math::
y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta
@ -196,8 +194,6 @@ class GroupNorm(Module):
>>> m = nn.GroupNorm(1, 6)
>>> # Activating the module
>>> output = m(input)
.. _`Group Normalization`: https://arxiv.org/abs/1803.08494
"""
__constants__ = ['num_groups', 'num_channels', 'eps', 'affine']

View File

@ -376,7 +376,7 @@ class RNN(RNNBase):
All the weights and biases are initialized from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})`
where :math:`k = \frac{1}{\text{hidden\_size}}`
.. include:: cudnn_persistent_rnn.rst
.. include:: ../cudnn_persistent_rnn.rst
Examples::
@ -503,7 +503,7 @@ class LSTM(RNNBase):
All the weights and biases are initialized from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})`
where :math:`k = \frac{1}{\text{hidden\_size}}`
.. include:: cudnn_persistent_rnn.rst
.. include:: ../cudnn_persistent_rnn.rst
Examples::
@ -680,7 +680,7 @@ class GRU(RNNBase):
All the weights and biases are initialized from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})`
where :math:`k = \frac{1}{\text{hidden\_size}}`
.. include:: cudnn_persistent_rnn.rst
.. include:: ../cudnn_persistent_rnn.rst
Examples::

View File

@ -10,7 +10,8 @@ import torch.nn.intrinsic as nni
class BatchNorm2d(torch.nn.BatchNorm2d):
r"""Applies Quantized Batch Normalization over a 4D input (a mini-batch of 2D inputs
with additional channel dimension) as described in the paper
`Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift`_ .
`Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
<https://arxiv.org/abs/1502.03167>`__ .
.. math::
@ -73,7 +74,9 @@ class BatchNorm2d(torch.nn.BatchNorm2d):
class BatchNorm3d(torch.nn.BatchNorm3d):
r"""Applies Quantized Batch Normalization over a 5D input (a mini-batch of 3D inputs
with additional channel dimension) as described in the paper
`Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift`_ .
`Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
<https://arxiv.org/abs/1502.03167>`__ .
.. math::
y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta
Because the Batch Normalization is done over the `C` dimension, computing statistics

View File

@ -8,7 +8,7 @@ import torch.nn.quantized.functional
class LayerNorm(torch.nn.LayerNorm):
r"""Applies Layer Normalization over a mini-batch of inputs as described in
the paper `Layer Normalization`_ .
the paper `Layer Normalization <https://arxiv.org/abs/1607.06450>`__
.. math::
y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta
@ -60,8 +60,6 @@ class LayerNorm(torch.nn.LayerNorm):
>>> m = nn.LayerNorm(10)
>>> # Activating the module
>>> output = m(input)
.. _`Layer Normalization`: https://arxiv.org/abs/1607.06450
"""
def __init__(self, normalized_shape, weight, bias, scale, zero_point, eps=1e-5,

View File

@ -44,9 +44,9 @@ class BasePruningMethod(ABC):
Args:
t (torch.Tensor): tensor representing the parameter to prune
default_mask (torch.Tensor): Base mask from previous pruning
iterations, that need to be respected after the new mask is
applied. Same dims as ``t``.
default_mask (torch.Tensor): Base mask from previous pruning
iterations, that need to be respected after the new mask is
applied. Same dims as ``t``.
Returns:
mask (torch.Tensor): mask to apply to ``t``, of same dims as ``t``
@ -89,7 +89,7 @@ class BasePruningMethod(ABC):
will act.
args: arguments passed on to a subclass of
:class:`BasePruningMethod`
kwargs: keyword arguments passed on to a subclass of a
kwargs: keyword arguments passed on to a subclass of a
:class:`BasePruningMethod`
"""
@ -156,7 +156,7 @@ class BasePruningMethod(ABC):
# pruning
orig = getattr(module, name)
# If this is the first time pruning is applied, take care of moving
# If this is the first time pruning is applied, take care of moving
# the original tensor to a new parameter called name + '_orig' and
# and deleting the original parameter
if not isinstance(method, PruningContainer):
@ -171,7 +171,7 @@ class BasePruningMethod(ABC):
else:
default_mask = getattr(module, name + "_mask").detach().clone(memory_format=torch.contiguous_format)
# Use try/except because if anything goes wrong with the mask
# Use try/except because if anything goes wrong with the mask
# computation etc., you'd want to roll back.
try:
# get the final mask, computed according to the specific method
@ -198,7 +198,7 @@ class BasePruningMethod(ABC):
according to the pruning rule specified in :meth:`compute_mask`.
Args:
t (torch.Tensor): tensor to prune (of same dimensions as
t (torch.Tensor): tensor to prune (of same dimensions as
``default_mask``).
default_mask (torch.Tensor, optional): mask from previous pruning
iteration, if any. To be considered when determining what
@ -218,7 +218,7 @@ class BasePruningMethod(ABC):
named ``name+'_orig'`` is removed from the parameter list. Similarly,
the buffer named ``name+'_mask'`` is removed from the buffers.
Note:
Note:
Pruning itself is NOT undone or reversed!
"""
# before removing pruning from a tensor, it has to have been applied
@ -246,8 +246,8 @@ class PruningContainer(BasePruningMethod):
Keeps track of the order in which pruning methods are applied and handles
combining successive pruning calls.
Accepts as argument an instance of a BasePruningMethod or an iterable of
them.
Accepts as argument an instance of a BasePruningMethod or an iterable of
them.
"""
def __init__(self, *args):
@ -295,13 +295,13 @@ class PruningContainer(BasePruningMethod):
return self._pruning_methods[idx]
def compute_mask(self, t, default_mask):
r"""Applies the latest ``method`` by computing the new partial masks
r"""Applies the latest ``method`` by computing the new partial masks
and returning its combination with the ``default_mask``.
The new partial mask should be computed on the entries or channels
that were not zeroed out by the ``default_mask``.
Which portions of the tensor ``t`` the new mask will be calculated from
that were not zeroed out by the ``default_mask``.
Which portions of the tensor ``t`` the new mask will be calculated from
depends on the ``PRUNING_TYPE`` (handled by the type handler):
* for 'unstructured', the mask will be computed from the raveled
* for 'unstructured', the mask will be computed from the raveled
list of nonmasked entries;
* for 'structured', the mask will be computed from the nonmasked
@ -331,7 +331,7 @@ class PruningContainer(BasePruningMethod):
Returns:
new_mask (torch.Tensor): new mask that combines the effects
of the old mask and the new mask from the current
of the old mask and the new mask from the current
pruning method (of same dimensions as mask and t).
"""
new_mask = mask # start off from existing mask
@ -425,7 +425,7 @@ class RandomUnstructured(BasePruningMethod):
will act.
amount (int or float): quantity of parameters to prune.
If ``float``, should be between 0.0 and 1.0 and represent the
fraction of parameters to prune. If ``int``, it represents the
fraction of parameters to prune. If ``int``, it represents the
absolute number of parameters to prune.
"""
@ -468,7 +468,7 @@ class RandomUnstructured(BasePruningMethod):
will act.
amount (int or float): quantity of parameters to prune.
If ``float``, should be between 0.0 and 1.0 and represent the
fraction of parameters to prune. If ``int``, it represents the
fraction of parameters to prune. If ``int``, it represents the
absolute number of parameters to prune.
"""
return super(RandomUnstructured, cls).apply(
@ -477,13 +477,13 @@ class RandomUnstructured(BasePruningMethod):
class L1Unstructured(BasePruningMethod):
r"""Prune (currently unpruned) units in a tensor by zeroing out the ones
r"""Prune (currently unpruned) units in a tensor by zeroing out the ones
with the lowest L1-norm.
Args:
amount (int or float): quantity of parameters to prune.
If ``float``, should be between 0.0 and 1.0 and represent the
fraction of parameters to prune. If ``int``, it represents the
fraction of parameters to prune. If ``int``, it represents the
absolute number of parameters to prune.
"""
@ -530,7 +530,7 @@ class L1Unstructured(BasePruningMethod):
will act.
amount (int or float): quantity of parameters to prune.
If ``float``, should be between 0.0 and 1.0 and represent the
fraction of parameters to prune. If ``int``, it represents the
fraction of parameters to prune. If ``int``, it represents the
absolute number of parameters to prune.
"""
return super(L1Unstructured, cls).apply(module, name, amount=amount)
@ -542,7 +542,7 @@ class RandomStructured(BasePruningMethod):
Args:
amount (int or float): quantity of parameters to prune.
If ``float``, should be between 0.0 and 1.0 and represent the
fraction of parameters to prune. If ``int``, it represents the
fraction of parameters to prune. If ``int``, it represents the
absolute number of parameters to prune.
dim (int, optional): index of the dim along which we define
channels to prune. Default: -1.
@ -559,14 +559,14 @@ class RandomStructured(BasePruningMethod):
def compute_mask(self, t, default_mask):
r"""Computes and returns a mask for the input tensor ``t``.
Starting from a base ``default_mask`` (which should be a mask of ones
if the tensor has not been pruned yet), generate a random mask to
if the tensor has not been pruned yet), generate a random mask to
apply on top of the ``default_mask`` by randomly zeroing out channels
along the specified dim of the tensor.
Args:
t (torch.Tensor): tensor representing the parameter to prune
default_mask (torch.Tensor): Base mask from previous pruning
iterations, that need to be respected after the new mask is
default_mask (torch.Tensor): Base mask from previous pruning
iterations, that need to be respected after the new mask is
applied. Same dims as ``t``.
Returns:
@ -613,7 +613,7 @@ class RandomStructured(BasePruningMethod):
if nparams_toprune == 0: # k=0 not supported by torch.kthvalue
mask = default_mask
else:
# apply the new structured mask on top of prior (potentially
# apply the new structured mask on top of prior (potentially
# unstructured) mask
mask = make_mask(t, self.dim, tensor_size, nparams_toprune)
mask *= default_mask.to(dtype=mask.dtype)
@ -631,7 +631,7 @@ class RandomStructured(BasePruningMethod):
will act.
amount (int or float): quantity of parameters to prune.
If ``float``, should be between 0.0 and 1.0 and represent the
fraction of parameters to prune. If ``int``, it represents the
fraction of parameters to prune. If ``int``, it represents the
absolute number of parameters to prune.
dim (int, optional): index of the dim along which we define
channels to prune. Default: -1.
@ -648,7 +648,7 @@ class LnStructured(BasePruningMethod):
Args:
amount (int or float): quantity of channels to prune.
If ``float``, should be between 0.0 and 1.0 and represent the
fraction of parameters to prune. If ``int``, it represents the
fraction of parameters to prune. If ``int``, it represents the
absolute number of parameters to prune.
n (int, float, inf, -inf, 'fro', 'nuc'): See documentation of valid
entries for argument ``p`` in :func:`torch.norm`.
@ -674,8 +674,8 @@ class LnStructured(BasePruningMethod):
Args:
t (torch.Tensor): tensor representing the parameter to prune
default_mask (torch.Tensor): Base mask from previous pruning
iterations, that need to be respected after the new mask is
default_mask (torch.Tensor): Base mask from previous pruning
iterations, that need to be respected after the new mask is
applied. Same dims as ``t``.
Returns:
@ -750,7 +750,7 @@ class LnStructured(BasePruningMethod):
will act.
amount (int or float): quantity of parameters to prune.
If ``float``, should be between 0.0 and 1.0 and represent the
fraction of parameters to prune. If ``int``, it represents the
fraction of parameters to prune. If ``int``, it represents the
absolute number of parameters to prune.
n (int, float, inf, -inf, 'fro', 'nuc'): See documentation of valid
entries for argument ``p`` in :func:`torch.norm`.
@ -826,10 +826,10 @@ def random_unstructured(module, name, amount):
by removing the specified ``amount`` of (currently unpruned) units
selected at random.
Modifies module in place (and also return the modified module) by:
1) adding a named buffer called ``name+'_mask'`` corresponding to the
1) adding a named buffer called ``name+'_mask'`` corresponding to the
binary mask applied to the parameter `name` by the pruning method.
2) replacing the parameter ``name`` by its pruned version, while the
original (unpruned) parameter is stored in a new parameter named
original (unpruned) parameter is stored in a new parameter named
``name+'_orig'``.
Args:
@ -838,7 +838,7 @@ def random_unstructured(module, name, amount):
will act.
amount (int or float): quantity of parameters to prune.
If ``float``, should be between 0.0 and 1.0 and represent the
fraction of parameters to prune. If ``int``, it represents the
fraction of parameters to prune. If ``int``, it represents the
absolute number of parameters to prune.
Returns:
@ -858,12 +858,12 @@ def l1_unstructured(module, name, amount):
r"""Prunes tensor corresponding to parameter called ``name`` in ``module``
by removing the specified `amount` of (currently unpruned) units with the
lowest L1-norm.
Modifies module in place (and also return the modified module)
Modifies module in place (and also return the modified module)
by:
1) adding a named buffer called ``name+'_mask'`` corresponding to the
1) adding a named buffer called ``name+'_mask'`` corresponding to the
binary mask applied to the parameter ``name`` by the pruning method.
2) replacing the parameter ``name`` by its pruned version, while the
original (unpruned) parameter is stored in a new parameter named
2) replacing the parameter ``name`` by its pruned version, while the
original (unpruned) parameter is stored in a new parameter named
``name+'_orig'``.
Args:
@ -872,7 +872,7 @@ def l1_unstructured(module, name, amount):
will act.
amount (int or float): quantity of parameters to prune.
If ``float``, should be between 0.0 and 1.0 and represent the
fraction of parameters to prune. If ``int``, it represents the
fraction of parameters to prune. If ``int``, it represents the
absolute number of parameters to prune.
Returns:
@ -891,12 +891,12 @@ def random_structured(module, name, amount, dim):
r"""Prunes tensor corresponding to parameter called ``name`` in ``module``
by removing the specified ``amount`` of (currently unpruned) channels
along the specified ``dim`` selected at random.
Modifies module in place (and also return the modified module)
Modifies module in place (and also return the modified module)
by:
1) adding a named buffer called ``name+'_mask'`` corresponding to the
1) adding a named buffer called ``name+'_mask'`` corresponding to the
binary mask applied to the parameter ``name`` by the pruning method.
2) replacing the parameter ``name`` by its pruned version, while the
original (unpruned) parameter is stored in a new parameter named
2) replacing the parameter ``name`` by its pruned version, while the
original (unpruned) parameter is stored in a new parameter named
``name+'_orig'``.
Args:
@ -905,7 +905,7 @@ def random_structured(module, name, amount, dim):
will act.
amount (int or float): quantity of parameters to prune.
If ``float``, should be between 0.0 and 1.0 and represent the
fraction of parameters to prune. If ``int``, it represents the
fraction of parameters to prune. If ``int``, it represents the
absolute number of parameters to prune.
dim (int): index of the dim along which we define channels to prune.
@ -928,12 +928,12 @@ def ln_structured(module, name, amount, n, dim):
r"""Prunes tensor corresponding to parameter called ``name`` in ``module``
by removing the specified ``amount`` of (currently unpruned) channels
along the specified ``dim`` with the lowest L``n``-norm.
Modifies module in place (and also return the modified module)
Modifies module in place (and also return the modified module)
by:
1) adding a named buffer called ``name+'_mask'`` corresponding to the
1) adding a named buffer called ``name+'_mask'`` corresponding to the
binary mask applied to the parameter ``name`` by the pruning method.
2) replacing the parameter ``name`` by its pruned version, while the
original (unpruned) parameter is stored in a new parameter named
original (unpruned) parameter is stored in a new parameter named
``name+'_orig'``.
Args:
@ -942,7 +942,7 @@ def ln_structured(module, name, amount, n, dim):
will act.
amount (int or float): quantity of parameters to prune.
If ``float``, should be between 0.0 and 1.0 and represent the
fraction of parameters to prune. If ``int``, it represents the
fraction of parameters to prune. If ``int``, it represents the
absolute number of parameters to prune.
n (int, float, inf, -inf, 'fro', 'nuc'): See documentation of valid
entries for argument ``p`` in :func:`torch.norm`.
@ -965,10 +965,10 @@ def global_unstructured(parameters, pruning_method, **kwargs):
Globally prunes tensors corresponding to all parameters in ``parameters``
by applying the specified ``pruning_method``.
Modifies modules in place by:
1) adding a named buffer called ``name+'_mask'`` corresponding to the
1) adding a named buffer called ``name+'_mask'`` corresponding to the
binary mask applied to the parameter ``name`` by the pruning method.
2) replacing the parameter ``name`` by its pruned version, while the
original (unpruned) parameter is stored in a new parameter named
original (unpruned) parameter is stored in a new parameter named
``name+'_orig'``.
Args:
@ -976,22 +976,22 @@ def global_unstructured(parameters, pruning_method, **kwargs):
the model to prune in a global fashion, i.e. by aggregating all
weights prior to deciding which ones to prune. module must be of
type :class:`nn.Module`, and name must be a string.
pruning_method (function): a valid pruning function from this module,
or a custom one implemented by the user that satisfies the
pruning_method (function): a valid pruning function from this module,
or a custom one implemented by the user that satisfies the
implementation guidelines and has ``PRUNING_TYPE='unstructured'``.
kwargs: other keyword arguments such as:
amount (int or float): quantity of parameters to prune across the
amount (int or float): quantity of parameters to prune across the
specified parameters.
If ``float``, should be between 0.0 and 1.0 and represent the
fraction of parameters to prune. If ``int``, it represents the
fraction of parameters to prune. If ``int``, it represents the
absolute number of parameters to prune.
Raises:
TypeError: if ``PRUNING_TYPE != 'unstructured'``
Note:
Since global structured pruning doesn't make much sense unless the
norm is normalized by the size of the parameter, we now limit the
Since global structured pruning doesn't make much sense unless the
norm is normalized by the size of the parameter, we now limit the
scope of global pruning to unstructured methods.
Examples:
@ -1068,12 +1068,12 @@ def global_unstructured(parameters, pruning_method, **kwargs):
def custom_from_mask(module, name, mask):
r"""Prunes tensor corresponding to parameter called ``name`` in ``module``
by applying the pre-computed mask in ``mask``.
Modifies module in place (and also return the modified module)
Modifies module in place (and also return the modified module)
by:
1) adding a named buffer called ``name+'_mask'`` corresponding to the
1) adding a named buffer called ``name+'_mask'`` corresponding to the
binary mask applied to the parameter ``name`` by the pruning method.
2) replacing the parameter ``name`` by its pruned version, while the
original (unpruned) parameter is stored in a new parameter named
original (unpruned) parameter is stored in a new parameter named
``name+'_orig'``.
Args:
@ -1083,7 +1083,7 @@ def custom_from_mask(module, name, mask):
mask (Tensor): binary mask to be applied to the parameter.
Returns:
module (nn.Module): modified (i.e. pruned) version of the input module
module (nn.Module): modified (i.e. pruned) version of the input module
Examples:
>>> m = prune.custom_from_mask(
@ -1104,7 +1104,7 @@ def remove(module, name):
named ``name+'_orig'`` is removed from the parameter list. Similarly,
the buffer named ``name+'_mask'`` is removed from the buffers.
Note:
Note:
Pruning itself is NOT undone or reversed!
Args:
@ -1160,12 +1160,12 @@ def _validate_pruning_amount_init(amount):
Args:
amount (int or float): quantity of parameters to prune.
If float, should be between 0.0 and 1.0 and represent the
fraction of parameters to prune. If int, it represents the
fraction of parameters to prune. If int, it represents the
absolute number of parameters to prune.
Raises:
ValueError: if amount is a float not in [0, 1], or if it's a negative
integer.
integer.
TypeError: if amount is neither a float nor an integer.
Note:
@ -1196,7 +1196,7 @@ def _validate_pruning_amount(amount, tensor_size):
Args:
amount (int or float): quantity of parameters to prune.
If float, should be between 0.0 and 1.0 and represent the
fraction of parameters to prune. If int, it represents the
fraction of parameters to prune. If int, it represents the
absolute number of parameters to prune.
tensor_size (int): absolute number of parameters in the tensor
to prune.
@ -1232,7 +1232,7 @@ def _validate_structured_pruning(t):
def _compute_nparams_toprune(amount, tensor_size):
r"""Since amount can be expressed either in absolute value or as a
r"""Since amount can be expressed either in absolute value or as a
percentage of the number of units/channels in a tensor, this utility
function converts the percentage to absolute value to standardize
the handling of pruning.
@ -1240,7 +1240,7 @@ def _compute_nparams_toprune(amount, tensor_size):
Args:
amount (int or float): quantity of parameters to prune.
If float, should be between 0.0 and 1.0 and represent the
fraction of parameters to prune. If int, it represents the
fraction of parameters to prune. If int, it represents the
absolute number of parameters to prune.
tensor_size (int): absolute number of parameters in the tensor
to prune.
@ -1268,10 +1268,10 @@ def _validate_pruning_dim(t, dim):
def _compute_norm(t, n, dim):
r"""Compute the L_n-norm across all entries in tensor `t` along all dimension
r"""Compute the L_n-norm across all entries in tensor `t` along all dimension
except for the one identified by dim.
Example: if `t` is of shape, say, 3x2x4 and dim=2 (the last dim),
then norm will have Size [4], and each entry will represent the
then norm will have Size [4], and each entry will represent the
`L_n`-norm computed using the 3x2=6 entries for each of the 4 channels.
Args:

View File

@ -192,6 +192,8 @@ def select_model_mode_for_export(model, mode):
A context manager to temporarily set the training mode of 'model'
to 'mode', resetting it when we exit the with-block. A no-op if
mode is None.
In version 1.6 changed to this from set_training
"""
from torch.onnx import utils

View File

@ -15,7 +15,7 @@ def _with_args(cls_or_self, **kwargs):
This can be useful when there is a need to create classes with the same
constructor arguments, but different instances.
.. Example::
Example::
>>> Foo.with_args = classmethod(_with_args)
>>> foo_builder = Foo.with_args(a=3, b=4).with_args(answer=42)