From 2f53d570febc6afe69473f0e4989ff59342edb57 Mon Sep 17 00:00:00 2001
From: CaoE
Date: Fri, 13 Sep 2024 09:11:47 +0000
Subject: [PATCH] Update document for autocast on CPU (#135299)

Update document for autocast on CPU due to the support of float16 and changes in the operator list.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135299
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/svekars
---
 docs/source/amp.rst                | 39 ++++++++++++++++++++++--------
 docs/source/notes/amp_examples.rst |  2 +-
 2 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/docs/source/amp.rst b/docs/source/amp.rst
index 7192a47fb24b..8698742a9367 100644
--- a/docs/source/amp.rst
+++ b/docs/source/amp.rst
@@ -95,6 +95,11 @@ updates the parameters, so the scale factor does not interfere with the learning
 
 .. currentmodule:: torch.cuda.amp
 
+.. autoclass:: GradScaler
+    :members:
+
+.. currentmodule:: torch.cpu.amp
+
 .. autoclass:: GradScaler
     :members:
 
@@ -365,7 +370,7 @@ in which unlisted ops run if they're downstream from autocasted ops.
 
 If an op is unlisted, we assume it's numerically stable in ``bfloat16``.
 If you believe an unlisted op is numerically unstable in ``bfloat16``,
-please file an issue.
+please file an issue. ``float16`` shares the lists of ``bfloat16``.
 
 CPU Ops that can autocast to ``bfloat16``
 """""""""""""""""""""""""""""""""""""""""
@@ -375,19 +380,25 @@ CPU Ops that can autocast to ``bfloat16``
 ``conv3d``,
 ``bmm``,
 ``mm``,
+``linalg_vecdot``,
 ``baddbmm``,
 ``addmm``,
 ``addbmm``,
 ``linear``,
 ``matmul``,
-``_convolution``
+``_convolution``,
+``conv_tbc``,
+``mkldnn_rnn_layer``,
+``conv_transpose1d``,
+``conv_transpose2d``,
+``conv_transpose3d``,
+``prelu``,
+``scaled_dot_product_attention``,
+``_native_multi_head_attention``
 
 CPU Ops that can autocast to ``float32``
 """"""""""""""""""""""""""""""""""""""""
 
-``conv_transpose1d``,
-``conv_transpose2d``,
-``conv_transpose3d``,
 ``avg_pool3d``,
 ``binary_cross_entropy``,
 ``grid_sampler``,
@@ -421,9 +432,22 @@ CPU Ops that can autocast to ``float32``
 ``replication_pad2d``,
 ``replication_pad3d``,
 ``mse_loss``,
+``cosine_embedding_loss``,
+``nll_loss``,
+``nll_loss2d``,
+``hinge_embedding_loss``,
+``poisson_nll_loss``,
+``cross_entropy_loss``,
+``l1_loss``,
+``huber_loss``,
+``margin_ranking_loss``,
+``soft_margin_loss``,
+``triplet_margin_loss``,
+``multi_margin_loss``,
 ``ctc_loss``,
 ``kl_div``,
 ``multilabel_margin_loss``,
+``binary_cross_entropy_with_logits``,
 ``fft_fft``,
 ``fft_ifft``,
 ``fft_fft2``,
@@ -438,7 +462,6 @@ CPU Ops that can autocast to ``float32``
 ``fft_irfftn``,
 ``fft_hfft``,
 ``fft_ihfft``,
-``linalg_matrix_norm``,
 ``linalg_cond``,
 ``linalg_matrix_rank``,
 ``linalg_solve``,
@@ -451,14 +474,10 @@ CPU Ops that can autocast to ``float32``
 ``linalg_tensorinv``,
 ``linalg_tensorsolve``,
 ``fake_quantize_per_tensor_affine``,
-``eig``,
 ``geqrf``,
-``lstsq``,
 ``_lu_with_info``,
 ``qr``,
-``solve``,
 ``svd``,
-``symeig``,
 ``triangular_solve``,
 ``fractional_max_pool2d``,
 ``fractional_max_pool3d``,
diff --git a/docs/source/notes/amp_examples.rst b/docs/source/notes/amp_examples.rst
index f95f99b7ac2f..1ae63c4396cc 100644
--- a/docs/source/notes/amp_examples.rst
+++ b/docs/source/notes/amp_examples.rst
@@ -9,7 +9,7 @@ Ordinarily, "automatic mixed precision training" means training with
 :class:`torch.autocast` and :class:`torch.amp.GradScaler` together.
 
 Instances of :class:`torch.autocast` enable autocasting for chosen regions.
-Autocasting automatically chooses the precision for GPU operations to improve performance
+Autocasting automatically chooses the precision for operations to improve performance
 while maintaining accuracy.
 
 Instances of :class:`torch.amp.GradScaler` help perform the steps of
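For reference, the CPU autocast behaviour the first file documents can be exercised with a short script: ops from the ``bfloat16`` list above (for example ``linear``/``matmul``) run in the lower precision inside the autocast region, ops from the ``float32`` list keep full precision, and ``float16`` uses the same lists. The toy module and tensor shapes below are illustrative assumptions, not part of the patch::

    import torch

    model = torch.nn.Linear(8, 8)   # hypothetical toy model
    x = torch.randn(4, 8)

    # Listed ops such as linear/matmul run in bfloat16 inside the region;
    # ops from the float32 list above stay in float32.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        y = model(x)

    # float16 shares the same lists: pass dtype=torch.float16 instead of
    # torch.bfloat16 to autocast the same region to half precision.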
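The amp_examples.rst hunk describes pairing :class:`torch.autocast` with :class:`torch.amp.GradScaler`. A minimal training-loop sketch, assuming a recent PyTorch where ``torch.amp.GradScaler`` accepts a device string and using a hypothetical toy model, optimizer, loss, and data (none of which come from the patch), could look like::

    import torch

    # Hypothetical toy setup; only the autocast/GradScaler pattern is from the docs.
    model = torch.nn.Linear(8, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.MSELoss()
    data = [(torch.randn(4, 8), torch.randn(4, 2)) for _ in range(3)]

    # GradScaler guards against float16 gradient underflow; with bfloat16 it is
    # usually unnecessary.
    scaler = torch.amp.GradScaler("cpu")

    for input, target in data:
        optimizer.zero_grad()
        # Forward pass under autocast; per-op precision follows the lists above.
        with torch.autocast(device_type="cpu", dtype=torch.float16):
            output = model(input)
            loss = loss_fn(output, target)
        scaler.scale(loss).backward()   # backward on the scaled loss
        scaler.step(optimizer)          # unscales gradients, then steps the optimizer
        scaler.update()                 # adjust the scale factor for the next step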