Rework torch.compile docs (#96706)

Chatted with @stas00 on Slack and here are some great improvements he suggested to the compile docs

- [x] Rename `dynamo` folder to `compile`
- [x] Link `compile` docstring on `torch.html` to main index page for compile
- [x] Create a new index page that describes why people should care
  - [x] easy perf, memory reduction, 1 line
  - [x] Short benchmark table
  - [x] How to guide
  - [x] TOC that links to the more technical pages folks have written, make the existing docs we have a Technical overview
- [x] Highlight the new APIs for `torch._inductor.list_options()` and `torch._inductor.list_mode_options()` - clarify these are inductor-specific and add more prose around which ones are most interesting

He also highlighted an interesting way to think about who is reading the docs we have

- [x] End users, who just want things to run fast
- [x] Library maintainers wrapping torch.compile, who care, for example, about when in their code they should compile a model and which backends are supported
- [x] Debuggers, whose needs are somewhat addressed by the troubleshooting guide and FAQ, but those could be dramatically reworked to say what we expect to break

And in a separate PR I'll work on the below with @SherlockNoMad
- [ ] Authors of new backends, who care about how to plug into the dynamo or inductor layer, so we need to explain some more internals, like
  - [ ] IR
  - [ ] Where to plug in: dynamo? inductor? triton?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96706
Approved by: https://github.com/svekars
Mark Saroufim
2023-03-15 04:41:09 +00:00
committed by PyTorch MergeBot
parent 2795233668
commit 6110effa86
11 changed files with 103 additions and 93 deletions

View File

@@ -1,7 +1,8 @@
Frequently Asked Questions
==========================
+**Author**: `Mark Saroufim <https://github.com/msaroufim>`_
-At a high level, the TorchDynamo stack consists of a graph capture from
+At a high level, the PyTorch 2.0 stack consists of a graph capture from
Python code using dynamo and a backend compiler. In this example the
backend compiler consists of backward graph tracing using AOTAutograd
and graph lowering using TorchInductor. There are of course many more
@@ -369,6 +370,9 @@ compilers can be different in subtle ways yet have dramatic impact on
your training stability. So the accuracy debugger is very useful for us
to detect bugs in our codegen or with a backend compiler.
+If you'd like to ensure that random number generation is the same across both torch
+and triton, then you can enable ``torch._inductor.config.fallback_random = True``.
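For example, a minimal sketch of setting this flag before compiling (the ``model`` here is an arbitrary placeholder):

.. code:: python

    import torch
    import torch._inductor.config

    # Make inductor fall back to torch's RNG so that eager and compiled
    # runs produce identical random numbers
    torch._inductor.config.fallback_random = True

    model = torch.nn.Linear(8, 8)
    compiled_model = torch.compile(model)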
Why am I getting OOMs?
----------------------

View File

@@ -4,6 +4,8 @@ Getting Started
Let's start with a simple example. Note that you are likely to see more
significant speedups the newer your GPU is.
+The below is a tutorial for inference; for a training-specific tutorial, make sure to check out the `example on training <https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html>`__
.. code:: python
import torch

View File

@@ -0,0 +1,80 @@
.. currentmodule:: torch
torch.compile
====================
:func:`~torch.compile` was introduced in `PyTorch 2.0 <https://pytorch.org/get-started/pytorch-2.0/>`__.
Our default and supported backend is `inductor`, with benchmarks `showing 30% to 2x speedups and 10% memory compression <https://github.com/pytorch/pytorch/issues/93794>`__
on real-world models, for both training and inference, with a single line of code.
.. note::

    The :func:`~torch.compile` API is experimental and subject to change.
The simplest interesting program is shown below. We go over it in much more detail in `getting started <https://pytorch.org/docs/master/compile/get-started.html>`__,
where we show how to use :func:`~torch.compile` to speed up inference on a variety of real-world models from both TIMM and HuggingFace, as
co-announced `here <https://pytorch.org/blog/Accelerating-Hugging-Face-and-TIMM-models/>`__.
.. code:: python

    import torch

    def fn(x):
        x = torch.cos(x).cuda()
        x = torch.sin(x).cuda()
        return x

    # Compile the function itself, then call the compiled version
    compiled_fn = torch.compile(fn)
    compiled_fn(torch.randn(10))
If you happen to be running your model on an Ampere GPU, it's crucial to enable tensor cores; we will warn you if you haven't set
``torch.set_float32_matmul_precision('high')``.
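That is, before compiling (assuming an Ampere or newer GPU):

.. code:: python

    import torch

    # Enable TensorFloat32 kernels for float32 matmuls; trades a little
    # precision for a substantial speedup on tensor cores
    torch.set_float32_matmul_precision('high')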
:func:`~torch.compile` works over :class:`~torch.nn.Module` as well as functions, so you can pass in your entire training loop.
The above example was for inference, but you can follow this tutorial for an `example on training <https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html>`__.
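For instance, a minimal sketch with a toy module:

.. code:: python

    import torch

    mod = torch.nn.Sequential(
        torch.nn.Linear(10, 100),
        torch.nn.ReLU(),
        torch.nn.Linear(100, 10),
    )
    compiled_mod = torch.compile(mod)
    out = compiled_mod(torch.randn(4, 10))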
Optimizations
-------------
Optimizations can be passed to :func:`~torch.compile` with either a backend ``mode`` parameter or as passes. To see the available options, run
``torch._inductor.list_options()`` and ``torch._inductor.list_mode_options()``.
The default backend is `inductor`, which will likely be the most reliable and performant option for most users and library maintainers;
other backends exist for power users who don't mind more experimental community support.
You can get the full list of community backends by running :func:`~torch._dynamo.list_backends`.
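For example, to see what's available (the exact output will vary by version):

.. code:: python

    import torch
    import torch._dynamo
    import torch._inductor

    # Inductor-specific configuration options
    print(torch._inductor.list_options())

    # What each ``mode`` preset (e.g. "max-autotune") actually sets
    print(torch._inductor.list_mode_options())

    # Community backends registered with dynamo
    print(torch._dynamo.list_backends())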
.. autosummary::
    :toctree: generated
    :nosignatures:

    compile
Troubleshooting and Gotchas
---------------------------
If you experience issues with models failing to compile, running out of memory, recompiling too often, or not giving accurate results,
odds are you will find the right tool to solve your problem in our guides.
.. WARNING::

    A few features are still very much in development and not likely to work for most users. Please do not use these features
    in production code, and if you're a library maintainer, please do not expose these options to your users:
    dynamic shapes (``dynamic=True``) and max autotune (``mode="max-autotune"``), which can be passed in to :func:`~torch.compile`
    (see the sketch after this warning). Distributed training has some quirks, which you can follow in the troubleshooting guide below. Model export is not ready yet.
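For reference only, this is how those experimental flags would be passed (``fn`` is an arbitrary placeholder; per the warning above, don't rely on these in production):

.. code:: python

    import torch

    def fn(x):
        return torch.sin(x) + torch.cos(x)

    # Experimental: try to avoid recompiling across varying input shapes
    compiled = torch.compile(fn, dynamic=True)

    # Experimental: spend longer compiling to search for faster kernels
    compiled = torch.compile(fn, mode="max-autotune")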
.. toctree::
    :maxdepth: 1

    troubleshooting
    faq
Learn more
----------
If you can't wait to get started and want to learn more about the internals of the PyTorch 2.0 stack then
please check out the references below.
.. toctree::
    :maxdepth: 1

    get-started
    technical-overview

View File

@@ -1,4 +1,4 @@
-TorchDynamo Overview
+Technical Overview
====================
**TorchDynamo** is a Python-level JIT compiler designed to make unmodified
@@ -34,12 +34,7 @@ dev-discuss <https://dev-discuss.pytorch.org/t/torchinductor-a-pytorch-native-co
.. toctree::
:maxdepth: 1
:hidden:
-installation
-get-started
guards-overview
custom-backends
deep-dive
-troubleshooting
-faq

View File

@@ -1,9 +1,9 @@
-TorchDynamo Troubleshooting
+PyTorch 2.0 Troubleshooting
===========================
**Author**: `Michael Lazos <https://github.com/mlazos>`_
-TorchDynamo is still in active development, and many of the reasons for
+`torch.compile` is still in active development, and many of the reasons for
graph breaks and excessive recompilation will be fixed with upcoming
support for `tracing dynamic tensor
shapes <https://docs.google.com/document/d/1QJB-GOnbv-9PygGlOMXwiO9K6vVNm8sNg_olixJ9koc/edit?usp=sharing>`__,
@@ -707,6 +707,9 @@ compilers can be different in subtle ways yet have dramatic impact on
your training stability. So the accuracy debugger is very useful for us
to detect bugs in our codegen or with a backend compiler.
+If you'd like to ensure that random number generation is the same across both torch
+and triton, then you can enable ``torch._inductor.config.fallback_random = True``.
File an Issue
~~~~~~~~~~~~~

View File

@@ -1,75 +0,0 @@
Installing TorchDynamo
======================
This section describes how to install TorchDynamo.
TorchDynamo is included in the nightly binaries of PyTorch. For
more information, see `Getting Started <https://pytorch.org/get-started/locally/>`__.
Requirements
------------
You must have the following prerequisites to use TorchDynamo:
* A Linux or macOS environment
* Python 3.8 (recommended). Python 3.7 through 3.10 are supported and
tested. Make sure to have a development version of Python installed
locally as well.
GPU/CUDA Requirements
~~~~~~~~~~~~~~~~~~~~~
To use GPU back ends, and in particular Triton, make sure that
the CUDA that you have installed locally matches the PyTorch version you
are running.
The following command installs GPU PyTorch + TorchDynamo along with GPU
TorchDynamo dependencies (for CUDA 11.7):
.. code-block:: shell
pip3 install numpy --pre torch --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu117
CPU requirements
~~~~~~~~~~~~~~~~
There are no additional requirements for CPU TorchDynamo. CPU
TorchDynamo is included in the nightly versions of PyTorch.
To install, run the following command:
.. code-block:: shell
pip3 install --pre torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu
Verify Installation
~~~~~~~~~~~~~~~~~~~
If you built PyTorch from source, then you can run the following
commands (from the PyTorch repo root directory)
to check that TorchDynamo is installed correctly:
.. code-block:: shell
cd tools/dynamo
python verify_dynamo.py
If you do not have the PyTorch source locally, you can alternatively
copy the script (``tools/dynamo/verify_dynamo.py``) from the PyTorch
repository and run it locally.
Docker Installation
-------------------
We also provide all the required dependencies in the PyTorch nightly
binaries which you can download with the following command:
.. code-block::
docker pull ghcr.io/pytorch/pytorch-nightly
And for ad hoc experiments just make sure that your container has access
to all your GPUs:
.. code-block:: bash
docker run --gpus all -it ghcr.io/pytorch/pytorch-nightly:latest /bin/bash

View File

@@ -46,16 +46,15 @@ Features described in this documentation are classified by release status:
:glob:
:maxdepth: 1
:caption: torch.compile
:hidden:
-dynamo/index
-dynamo/installation
-dynamo/get-started
-dynamo/guards-overview
-dynamo/custom-backends
-dynamo/deep-dive
-dynamo/troubleshooting
-dynamo/faq
+compile/index
+compile/get-started
+compile/troubleshooting
+compile/faq
+compile/technical-overview
+compile/guards-overview
+compile/custom-backends
+compile/deep-dive
ir
.. toctree::

View File

@@ -714,6 +714,8 @@ Optimizations
compile
+`torch.compile documentation <https://pytorch.org/docs/master/compile/index.html>`__
Operator Tags
------------------------------------
.. autoclass:: Tag