Update document for autocast on CPU (#135299)

Update the documentation for autocast on CPU to reflect the newly added ``float16`` support and the changes to the operator lists.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135299
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/svekars
CaoE
2024-09-13 09:11:47 +00:00
committed by PyTorch MergeBot
parent 31007cf200
commit 2f53d570fe
2 changed files with 30 additions and 11 deletions


@@ -95,6 +95,11 @@ updates the parameters, so the scale factor does not interfere with the learning
.. currentmodule:: torch.cuda.amp
.. autoclass:: GradScaler
:members:
.. currentmodule:: torch.cpu.amp
.. autoclass:: GradScaler
:members:
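
For illustration, a minimal sketch of CPU mixed-precision training with gradient scaling, assuming the device-generic ``torch.amp.GradScaler("cpu")`` constructor and a placeholder model, data, and hyperparameters::

    import torch

    model = torch.nn.Linear(8, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scaler = torch.amp.GradScaler("cpu")  # device-generic GradScaler, here for CPU

    for _ in range(3):
        inputs, targets = torch.randn(4, 8), torch.randn(4, 2)
        optimizer.zero_grad()
        # float16 on CPU benefits from loss scaling, as float16 does on CUDA
        with torch.autocast(device_type="cpu", dtype=torch.float16):
            loss = torch.nn.functional.mse_loss(model(inputs), targets)
        scaler.scale(loss).backward()   # backward on the scaled loss to avoid gradient underflow
        scaler.step(optimizer)          # unscales gradients and steps if no infs/NaNs are found
        scaler.update()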
@@ -365,7 +370,7 @@ in which unlisted ops run if they're downstream from autocasted ops.
If an op is unlisted, we assume it's numerically stable in ``bfloat16``.
If you believe an unlisted op is numerically unstable in ``bfloat16``,
please file an issue.
please file an issue. ``float16`` shares the op lists with ``bfloat16``.
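
As a hedged sketch of what sharing the lists means in practice (``bfloat16`` is the default lower-precision dtype for CPU autocast; the tensors below are placeholders)::

    import torch

    a, b = torch.randn(8, 8), torch.randn(8, 8)

    # bfloat16 is the default lower-precision dtype for CPU autocast
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        print(torch.mm(a, b).dtype)   # torch.bfloat16, since ``mm`` is on the lower-precision list

    # float16 reuses the same op lists
    with torch.autocast(device_type="cpu", dtype=torch.float16):
        print(torch.mm(a, b).dtype)   # torch.float16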
CPU Ops that can autocast to ``bfloat16``
"""""""""""""""""""""""""""""""""""""""""
@@ -375,19 +380,25 @@ CPU Ops that can autocast to ``bfloat16``
``conv3d``,
``bmm``,
``mm``,
``linalg_vecdot``,
``baddbmm``,
``addmm``,
``addbmm``,
``linear``,
``matmul``,
``_convolution``
``_convolution``,
``conv_tbc``,
``mkldnn_rnn_layer``,
``conv_transpose1d``,
``conv_transpose2d``,
``conv_transpose3d``,
``prelu``,
``scaled_dot_product_attention``,
``_native_multi_head_attention``
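
A minimal sketch checking that one of the newly listed ops runs in the lower-precision dtype under CPU autocast (the layer and shapes here are placeholders; ``ConvTranspose2d`` dispatches to ``conv_transpose2d``)::

    import torch

    deconv = torch.nn.ConvTranspose2d(3, 3, kernel_size=3)
    x = torch.randn(1, 3, 8, 8)

    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        y = deconv(x)
    print(y.dtype)  # torch.bfloat16, since ``conv_transpose2d`` is on the lower-precision list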
CPU Ops that can autocast to ``float32``
""""""""""""""""""""""""""""""""""""""""
``conv_transpose1d``,
``conv_transpose2d``,
``conv_transpose3d``,
``avg_pool3d``,
``binary_cross_entropy``,
``grid_sampler``,
@@ -421,9 +432,22 @@ CPU Ops that can autocast to ``float32``
``replication_pad2d``,
``replication_pad3d``,
``mse_loss``,
``cosine_embedding_loss``,
``nll_loss``,
``nll_loss2d``,
``hinge_embedding_loss``,
``poisson_nll_loss``,
``cross_entropy_loss``,
``l1_loss``,
``huber_loss``,
``margin_ranking_loss``,
``soft_margin_loss``,
``triplet_margin_loss``,
``multi_margin_loss``,
``ctc_loss``,
``kl_div``,
``multilabel_margin_loss``,
``binary_cross_entropy_with_logits``,
``fft_fft``,
``fft_ifft``,
``fft_fft2``,
@@ -438,7 +462,6 @@ CPU Ops that can autocast to ``float32``
``fft_irfftn``,
``fft_hfft``,
``fft_ihfft``,
``linalg_matrix_norm``,
``linalg_cond``,
``linalg_matrix_rank``,
``linalg_solve``,
@@ -451,14 +474,10 @@ CPU Ops that can autocast to ``float32``
``linalg_tensorinv``,
``linalg_tensorsolve``,
``fake_quantize_per_tensor_affine``,
``eig``,
``geqrf``,
``lstsq``,
``_lu_with_info``,
``qr``,
``solve``,
``svd``,
``symeig``,
``triangular_solve``,
``fractional_max_pool2d``,
``fractional_max_pool3d``,
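
A minimal sketch of how the ``float32`` list interacts with the lower-precision list inside one autocast region (the tensors are placeholders)::

    import torch

    pred, target = torch.randn(4, 4), torch.randn(4, 4)

    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        out = torch.mm(pred, pred)                         # lower-precision list -> bfloat16
        loss = torch.nn.functional.mse_loss(out, target)   # float32 list -> inputs cast up, float32 output
    print(out.dtype, loss.dtype)  # torch.bfloat16 torch.float32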


@@ -9,7 +9,7 @@ Ordinarily, "automatic mixed precision training" means training with
:class:`torch.autocast` and :class:`torch.amp.GradScaler` together.
Instances of :class:`torch.autocast` enable autocasting for chosen regions.
Autocasting automatically chooses the precision for GPU operations to improve performance
Autocasting automatically chooses the precision for operations to improve performance
while maintaining accuracy.
Instances of :class:`torch.amp.GradScaler` help perform the steps of