mirror of
https://github.com/pytorch/pytorch.git
synced 2025-10-20 21:14:14 +08:00
revamp dtype documentation for 2025 (#156087)
The dtype documentation has not been updated in a while; let's do a revamp.

1. combine the duplicated docs for dtypes from `tensors.rst` and `tensor_attributes.rst` to live in `tensor_attributes.rst`, and link to that page from `tensors.rst`
2. split the dtype table into floating point and integer dtypes
3. add the definition of shell dtype
4. add the float8 and MX dtypes as shell dtypes to the dtype table
5. remove legacy quantized dtypes from the table
6. add the definition of various dtype suffixes ("fn", etc.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156087
Approved by: https://github.com/albanD
This commit is contained in:
committed by PyTorch MergeBot
parent 43523bf168
commit 414ad47045
@ -17,30 +17,65 @@ torch.dtype
A :class:`torch.dtype` is an object that represents the data type of a
:class:`torch.Tensor`. PyTorch has several different data types:

========================== =========================================== ===========================
Data type                  dtype                                       Legacy Constructors
========================== =========================================== ===========================
32-bit floating point      ``torch.float32`` or ``torch.float``        ``torch.*.FloatTensor``
64-bit floating point      ``torch.float64`` or ``torch.double``       ``torch.*.DoubleTensor``
32-bit complex             ``torch.complex32`` or ``torch.chalf``
64-bit complex             ``torch.complex64`` or ``torch.cfloat``
128-bit complex            ``torch.complex128`` or ``torch.cdouble``
16-bit floating point [1]_ ``torch.float16`` or ``torch.half``         ``torch.*.HalfTensor``
16-bit floating point [2]_ ``torch.bfloat16``                          ``torch.*.BFloat16Tensor``
8-bit integer (unsigned)   ``torch.uint8``                             ``torch.*.ByteTensor``
8-bit integer (signed)     ``torch.int8``                              ``torch.*.CharTensor``
16-bit integer (signed)    ``torch.int16`` or ``torch.short``          ``torch.*.ShortTensor``
32-bit integer (signed)    ``torch.int32`` or ``torch.int``            ``torch.*.IntTensor``
64-bit integer (signed)    ``torch.int64`` or ``torch.long``           ``torch.*.LongTensor``
Boolean                    ``torch.bool``                              ``torch.*.BoolTensor``
========================== =========================================== ===========================
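The dtypes listed above are carried by every tensor as its ``.dtype`` attribute. As a minimal illustration (the default floating point dtype is assumed unchanged):

```python
import torch

# Every tensor carries a dtype; float32 is the default for floating point data.
t = torch.zeros(2, 3)
print(t.dtype)                     # torch.float32
print(torch.tensor([1, 2]).dtype)  # torch.int64 (inferred from Python ints)
```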
**Floating point dtypes**

.. [1] Sometimes referred to as binary16: uses 1 sign, 5 exponent, and 10
   significand bits. Useful when precision is important.

========================================= ===============================
dtype                                     description
========================================= ===============================
``torch.float32`` or ``torch.float``      32-bit floating point, as defined in https://en.wikipedia.org/wiki/IEEE_754
``torch.float64`` or ``torch.double``     64-bit floating point, as defined in https://en.wikipedia.org/wiki/IEEE_754
``torch.float16`` or ``torch.half``       16-bit floating point, as defined in https://en.wikipedia.org/wiki/IEEE_754, S-E-M 1-5-10
``torch.bfloat16``                        16-bit floating point, sometimes referred to as Brain floating point, S-E-M 1-8-7
``torch.complex32`` or ``torch.chalf``    32-bit complex with two `float16` components
``torch.complex64`` or ``torch.cfloat``   64-bit complex with two `float32` components
``torch.complex128`` or ``torch.cdouble`` 128-bit complex with two `float64` components
``torch.float8_e4m3fn`` [shell]_, [1]_    8-bit floating point, S-E-M 1-4-3, from https://arxiv.org/abs/2209.05433
``torch.float8_e5m2`` [shell]_            8-bit floating point, S-E-M 1-5-2, from https://arxiv.org/abs/2209.05433
``torch.float8_e4m3fnuz`` [shell]_, [1]_  8-bit floating point, S-E-M 1-4-3, from https://arxiv.org/pdf/2206.02915
``torch.float8_e5m2fnuz`` [shell]_, [1]_  8-bit floating point, S-E-M 1-5-2, from https://arxiv.org/pdf/2206.02915
``torch.float8_e8m0fnu`` [shell]_, [1]_   8-bit floating point, S-E-M 0-8-0, from https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf
``torch.float4_e2m1fn_x2`` [shell]_, [1]_ packed 4-bit floating point, S-E-M 1-2-1, from https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf
========================================= ===============================

.. [2] Sometimes referred to as Brain Floating Point: uses 1 sign, 8 exponent, and 7
   significand bits. Useful when range is important, since it has the same
   number of exponent bits as ``float32``.

**Integer dtypes**

========================================= ===============================
dtype                                     description
========================================= ===============================
``torch.uint8``                           8-bit integer (unsigned)
``torch.int8``                            8-bit integer (signed)
``torch.uint16`` [shell]_, [2]_           16-bit integer (unsigned)
``torch.int16`` or ``torch.short``        16-bit integer (signed)
``torch.uint32`` [shell]_, [2]_           32-bit integer (unsigned)
``torch.int32`` or ``torch.int``          32-bit integer (signed)
``torch.uint64`` [shell]_, [2]_           64-bit integer (unsigned)
``torch.int64`` or ``torch.long``         64-bit integer (signed)
``torch.bool``                            Boolean
========================================= ===============================

.. [shell] A shell dtype is a specialized dtype with limited op and backend support.
   Specifically, ops that support tensor creation (``torch.empty``, ``torch.fill``, ``torch.zeros``)
   and operations which do not peek inside the data elements (``torch.cat``, ``torch.view``, ``torch.reshape``)
   are supported. Ops that do peek inside the data elements, such as casting,
   matrix multiplication, and nan/inf checks, are supported only on a
   case-by-case basis, depending on maturity, the presence of hardware
   accelerated kernels, and established use cases.

.. [1] The "fn", "fnu" and "fnuz" dtype suffixes mean:

   "f" - finite value encodings only, no infinity;
   "n" - nan value encodings differ from the IEEE spec;
   "uz" - "unsigned zero" only, i.e. no negative zero encoding

.. [2]
   Unsigned types aside from ``uint8`` are currently planned to only have
   limited support in eager mode (they primarily exist to assist usage with
   torch.compile); if you need eager support and the extra range is not needed,
   we recommend using their signed variants instead. See
   https://github.com/pytorch/pytorch/issues/58734 for more details.
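A small sketch of the recommendation above (creation of the wider unsigned dtypes does work in recent eager PyTorch, but many ops on them do not):

```python
import torch

# uint8 is fully supported in eager mode; wider unsigned types are limited.
u = torch.arange(4, dtype=torch.uint8)
# When the extra unsigned range is not needed, prefer the signed variant:
s = torch.arange(4, dtype=torch.int16)
print(u.dtype, s.dtype)
```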

**Note**: legacy constructors such as ``torch.*.FloatTensor``, ``torch.*.DoubleTensor``, ``torch.*.HalfTensor``,
``torch.*.BFloat16Tensor``, ``torch.*.ByteTensor``, ``torch.*.CharTensor``, ``torch.*.ShortTensor``, ``torch.*.IntTensor``,
``torch.*.LongTensor``, ``torch.*.BoolTensor`` only remain for backwards compatibility and should no longer be used.
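The modern replacement for each legacy constructor is a factory function with an explicit ``dtype``, for example:

```python
import torch

# Instead of the legacy torch.DoubleTensor(3) constructor:
t = torch.zeros(3, dtype=torch.float64)
print(t.dtype)  # torch.float64
```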

To find out if a :class:`torch.dtype` is a floating point data type, the property :attr:`is_floating_point`
can be used, which returns ``True`` if the data type is a floating point data type.
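For example (the property exists on dtypes; tensors expose the same check as a method):

```python
import torch

print(torch.float32.is_floating_point)    # True
print(torch.int64.is_floating_point)      # False
print(torch.ones(2).is_floating_point())  # True; the tensor method form
```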
@ -64,8 +99,8 @@ by finding the minimum dtype that satisfies the following rules:

A floating point scalar operand has dtype `torch.get_default_dtype()` and an integral
non-boolean scalar operand has dtype `torch.int64`. Unlike numpy, we do not inspect
values when determining the minimum `dtypes` of an operand. Quantized and complex types
are not yet supported.
values when determining the minimum `dtypes` of an operand. Complex types
are not yet supported. Promotion for shell dtypes is not defined.
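A short sketch of the scalar rules above (assuming the default dtype is left at its float32 default; scalar operands of the same kind do not widen a tensor operand):

```python
import torch

a = torch.ones(3, dtype=torch.int32)
# An int scalar acts as int64 (same kind as int32, so no widening):
print((a + 1).dtype)    # torch.int32
# A float scalar acts as torch.get_default_dtype():
print((a + 1.0).dtype)  # torch.float32
```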

Promotion Examples::
@ -6,84 +6,7 @@ torch.Tensor
===================================

A :class:`torch.Tensor` is a multi-dimensional matrix containing elements of
a single data type.


Data types
----------

Torch defines tensor types with the following data types:

======================================= ===========================================
Data type                               dtype
======================================= ===========================================
32-bit floating point                   ``torch.float32`` or ``torch.float``
64-bit floating point                   ``torch.float64`` or ``torch.double``
16-bit floating point [1]_              ``torch.float16`` or ``torch.half``
16-bit floating point [2]_              ``torch.bfloat16``
32-bit complex                          ``torch.complex32`` or ``torch.chalf``
64-bit complex                          ``torch.complex64`` or ``torch.cfloat``
128-bit complex                         ``torch.complex128`` or ``torch.cdouble``
8-bit integer (unsigned)                ``torch.uint8``
16-bit integer (unsigned)               ``torch.uint16`` (limited support) [4]_
32-bit integer (unsigned)               ``torch.uint32`` (limited support) [4]_
64-bit integer (unsigned)               ``torch.uint64`` (limited support) [4]_
8-bit integer (signed)                  ``torch.int8``
16-bit integer (signed)                 ``torch.int16`` or ``torch.short``
32-bit integer (signed)                 ``torch.int32`` or ``torch.int``
64-bit integer (signed)                 ``torch.int64`` or ``torch.long``
Boolean                                 ``torch.bool``
quantized 8-bit integer (unsigned)      ``torch.quint8``
quantized 8-bit integer (signed)        ``torch.qint8``
quantized 32-bit integer (signed)       ``torch.qint32``
quantized 4-bit integer (unsigned) [3]_ ``torch.quint4x2``
8-bit floating point, e4m3 [5]_         ``torch.float8_e4m3fn`` (limited support)
8-bit floating point, e5m2 [5]_         ``torch.float8_e5m2`` (limited support)
======================================= ===========================================

.. [1]
   Sometimes referred to as binary16: uses 1 sign, 5 exponent, and 10
   significand bits. Useful when precision is important at the expense of range.
.. [2]
   Sometimes referred to as Brain Floating Point: uses 1 sign, 8 exponent, and 7
   significand bits. Useful when range is important, since it has the same
   number of exponent bits as ``float32``.
.. [3]
   Quantized 4-bit integer is stored as an 8-bit signed integer. Currently it is only supported in the EmbeddingBag operator.
.. [4]
   Unsigned types aside from ``uint8`` are currently planned to only have
   limited support in eager mode (they primarily exist to assist usage with
   torch.compile); if you need eager support and the extra range is not needed,
   we recommend using their signed variants instead. See
   https://github.com/pytorch/pytorch/issues/58734 for more details.
.. [5]
   ``torch.float8_e4m3fn`` and ``torch.float8_e5m2`` implement the spec for 8-bit
   floating point types from https://arxiv.org/abs/2209.05433. The op support
   is very limited.

For backwards compatibility, we support the following alternate class names
for these data types:

======================================= ============================= ================================
Data type                               CPU tensor                    GPU tensor
======================================= ============================= ================================
32-bit floating point                   :class:`torch.FloatTensor`    :class:`torch.cuda.FloatTensor`
64-bit floating point                   :class:`torch.DoubleTensor`   :class:`torch.cuda.DoubleTensor`
16-bit floating point                   :class:`torch.HalfTensor`     :class:`torch.cuda.HalfTensor`
16-bit floating point                   :class:`torch.BFloat16Tensor` :class:`torch.cuda.BFloat16Tensor`
8-bit integer (unsigned)                :class:`torch.ByteTensor`     :class:`torch.cuda.ByteTensor`
8-bit integer (signed)                  :class:`torch.CharTensor`     :class:`torch.cuda.CharTensor`
16-bit integer (signed)                 :class:`torch.ShortTensor`    :class:`torch.cuda.ShortTensor`
32-bit integer (signed)                 :class:`torch.IntTensor`      :class:`torch.cuda.IntTensor`
64-bit integer (signed)                 :class:`torch.LongTensor`     :class:`torch.cuda.LongTensor`
Boolean                                 :class:`torch.BoolTensor`     :class:`torch.cuda.BoolTensor`
======================================= ============================= ================================

However, to construct tensors, we recommend using factory functions such as
:func:`torch.empty` with the ``dtype`` argument instead. The
:class:`torch.Tensor` constructor is an alias for the default tensor type
(:class:`torch.FloatTensor`).

a single data type. Please see :ref:`dtype-doc` for more details about dtype support.
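A minimal sketch of the recommended factory-function style (any dtype from the table above may be substituted):

```python
import torch

# Factory functions take dtype (and device) directly, replacing the legacy
# torch.FloatTensor / torch.cuda.FloatTensor class names.
a = torch.empty(2, 3, dtype=torch.float16)
b = torch.ones_like(a)  # the new tensor inherits a's dtype
print(a.dtype, b.dtype)
```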

Initializing and basic operations
---------------------------------