From 2f53d570febc6afe69473f0e4989ff59342edb57 Mon Sep 17 00:00:00 2001
From: CaoE
Date: Fri, 13 Sep 2024 09:11:47 +0000
Subject: [PATCH] Update document for autocast on CPU (#135299)

Update document for autocast on CPU due to the support of float16 and changes in the operator list.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135299
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/svekars
---
 docs/source/amp.rst                | 39 ++++++++++++++++++++++--------
 docs/source/notes/amp_examples.rst |  2 +-
 2 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/docs/source/amp.rst b/docs/source/amp.rst
index 7192a47fb24b..8698742a9367 100644
--- a/docs/source/amp.rst
+++ b/docs/source/amp.rst
@@ -95,6 +95,11 @@ updates the parameters, so the scale factor does not interfere with the learning
 
 .. currentmodule:: torch.cuda.amp
 
+.. autoclass:: GradScaler
+    :members:
+
+.. currentmodule:: torch.cpu.amp
+
 .. autoclass:: GradScaler
     :members:
 
@@ -365,7 +370,7 @@ in which unlisted ops run if they're downstream from autocasted ops.
 
 If an op is unlisted, we assume it's numerically stable in ``bfloat16``.
 If you believe an unlisted op is numerically unstable in ``bfloat16``,
-please file an issue.
+please file an issue. ``float16`` shares the lists of ``bfloat16``.
 
 CPU Ops that can autocast to ``bfloat16``
 """""""""""""""""""""""""""""""""""""""""
@@ -375,19 +380,25 @@ CPU Ops that can autocast to ``bfloat16``
 ``conv3d``,
 ``bmm``,
 ``mm``,
+``linalg_vecdot``,
 ``baddbmm``,
 ``addmm``,
 ``addbmm``,
 ``linear``,
 ``matmul``,
-``_convolution``
+``_convolution``,
+``conv_tbc``,
+``mkldnn_rnn_layer``,
+``conv_transpose1d``,
+``conv_transpose2d``,
+``conv_transpose3d``,
+``prelu``,
+``scaled_dot_product_attention``,
+``_native_multi_head_attention``
 
 CPU Ops that can autocast to ``float32``
 """"""""""""""""""""""""""""""""""""""""
 
-``conv_transpose1d``,
-``conv_transpose2d``,
-``conv_transpose3d``,
 ``avg_pool3d``,
 ``binary_cross_entropy``,
 ``grid_sampler``,
@@ -421,9 +432,22 @@ CPU Ops that can autocast to ``float32``
 ``replication_pad2d``,
 ``replication_pad3d``,
 ``mse_loss``,
+``cosine_embedding_loss``,
+``nll_loss``,
+``nll_loss2d``,
+``hinge_embedding_loss``,
+``poisson_nll_loss``,
+``cross_entropy_loss``,
+``l1_loss``,
+``huber_loss``,
+``margin_ranking_loss``,
+``soft_margin_loss``,
+``triplet_margin_loss``,
+``multi_margin_loss``,
 ``ctc_loss``,
 ``kl_div``,
 ``multilabel_margin_loss``,
+``binary_cross_entropy_with_logits``,
 ``fft_fft``,
 ``fft_ifft``,
 ``fft_fft2``,
@@ -438,7 +462,6 @@ CPU Ops that can autocast to ``float32``
 ``fft_irfftn``,
 ``fft_hfft``,
 ``fft_ihfft``,
-``linalg_matrix_norm``,
 ``linalg_cond``,
 ``linalg_matrix_rank``,
 ``linalg_solve``,
@@ -451,14 +474,10 @@ CPU Ops that can autocast to ``float32``
 ``linalg_tensorinv``,
 ``linalg_tensorsolve``,
 ``fake_quantize_per_tensor_affine``,
-``eig``,
 ``geqrf``,
-``lstsq``,
 ``_lu_with_info``,
 ``qr``,
-``solve``,
 ``svd``,
-``symeig``,
 ``triangular_solve``,
 ``fractional_max_pool2d``,
 ``fractional_max_pool3d``,
diff --git a/docs/source/notes/amp_examples.rst b/docs/source/notes/amp_examples.rst
index f95f99b7ac2f..1ae63c4396cc 100644
--- a/docs/source/notes/amp_examples.rst
+++ b/docs/source/notes/amp_examples.rst
@@ -9,7 +9,7 @@ Ordinarily, "automatic mixed precision training" means training with
 :class:`torch.autocast` and :class:`torch.amp.GradScaler` together.
 
 Instances of :class:`torch.autocast` enable autocasting for chosen regions.
-Autocasting automatically chooses the precision for GPU operations to improve performance
+Autocasting automatically chooses the precision for operations to improve performance
 while maintaining accuracy.
 
 Instances of :class:`torch.amp.GradScaler` help perform the steps of
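For reference, the CPU autocast behaviour the first file documents can be exercised with a short script: ops from the ``bfloat16`` list above (for example ``linear``/``matmul``) run in the lower precision inside the autocast region, ops from the ``float32`` list keep full precision, and ``float16`` uses the same lists. The toy module and tensor shapes below are illustrative assumptions, not part of the patch::

    import torch

    model = torch.nn.Linear(8, 8)   # hypothetical toy model
    x = torch.randn(4, 8)

    # Listed ops such as linear/matmul run in bfloat16 inside the region;
    # ops from the float32 list above stay in float32.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        y = model(x)

    # float16 shares the same lists: pass dtype=torch.float16 instead of
    # torch.bfloat16 to autocast the same region to half precision.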
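The amp_examples.rst hunk describes pairing :class:`torch.autocast` with :class:`torch.amp.GradScaler`. A minimal training-loop sketch, assuming a recent PyTorch where ``torch.amp.GradScaler`` accepts a device string and using a hypothetical toy model, optimizer, loss, and data (none of which come from the patch), could look like::

    import torch

    # Hypothetical toy setup; only the autocast/GradScaler pattern is from the docs.
    model = torch.nn.Linear(8, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.MSELoss()
    data = [(torch.randn(4, 8), torch.randn(4, 2)) for _ in range(3)]

    # GradScaler guards against float16 gradient underflow; with bfloat16 it is
    # usually unnecessary.
    scaler = torch.amp.GradScaler("cpu")

    for input, target in data:
        optimizer.zero_grad()
        # Forward pass under autocast; per-op precision follows the lists above.
        with torch.autocast(device_type="cpu", dtype=torch.float16):
            output = model(input)
            loss = loss_fn(output, target)
        scaler.scale(loss).backward()   # backward on the scaled loss
        scaler.step(optimizer)          # unscales gradients, then steps the optimizer
        scaler.update()                 # adjust the scale factor for the next step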