Rework torch.compile docs (#96706)

Chatted with @stas00 on Slack and here are some great improvements he suggested to the compile docs

- [x] Rename `dynamo` folder to `compile`
- [x] Link `compile` docstring on `torch.html` to main index page for compile
- [x] Create a new index page that describes why people should care
  - [x] easy perf, memory reduction, 1 line
  - [x] Short benchmark table
  - [x] How to guide
  - [x] TOC that links to the more technical pages folks have written, make the existing docs we have a Technical overview
- [x] Highlight the new APIs for `torch._inductor.list_options()` and `torch._inductor.list_mode_options()` - clarify these are inductor-specific and add more prose around which ones are most interesting

He also highlighted an interesting way to think about who is reading the docs we have

- [x] End users, who just want things to run fast
- [x] Library maintainers wrapping torch.compile, who care, for example, about when in their code they should compile a model and which backends are supported
- [x] Debuggers, whose needs are somewhat addressed by the troubleshooting guide and FAQ, but those could be dramatically reworked to say what we expect to break

And in a separate PR I'll work on the below with @SherlockNoMad
- [ ] Authors of new backends, who care about how to plug into the dynamo or inductor layer, so we need to explain some more internals, like
  - [ ] IR
  - [ ] Where to plug in: dynamo? inductor? triton?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96706
Approved by: https://github.com/svekars
Mark Saroufim
2023-03-15 04:41:09 +00:00
committed by PyTorch MergeBot
parent 2795233668
commit 6110effa86
11 changed files with 103 additions and 93 deletions

View File

@@ -1,7 +1,8 @@
Frequently Asked Questions
==========================
+**Author**: `Mark Saroufim <https://github.com/msaroufim>`_
-At a high level, the TorchDynamo stack consists of a graph capture from
+At a high level, the PyTorch 2.0 stack consists of a graph capture from
Python code using dynamo and a backend compiler. In this example the
backend compiler consists of backward graph tracing using AOTAutograd
and graph lowering using TorchInductor. There are of course many more
@@ -369,6 +370,9 @@ compilers can be different in subtle ways yet have dramatic impact on
your training stability. So the accuracy debugger is very useful for us
to detect bugs in our codegen or with a backend compiler.
+If you'd like to ensure that random number generation is the same across both torch
+and triton, then you can enable ``torch._inductor.config.fallback_random = True``.
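For example, a minimal sketch of setting this flag before compiling (the ``model`` here is an arbitrary placeholder):

.. code:: python

    import torch
    import torch._inductor.config

    # Make inductor fall back to torch's RNG so that eager and compiled
    # runs produce identical random numbers
    torch._inductor.config.fallback_random = True

    model = torch.nn.Linear(8, 8)
    compiled_model = torch.compile(model)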
Why am I getting OOMs?
----------------------

View File

@@ -4,6 +4,8 @@ Getting Started
Let's start with a simple example. Note that you are likely to see more
significant speedups the newer your GPU is.
+The below is a tutorial for inference; for a training-specific tutorial, make sure to check out the `example on training <https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html>`__
.. code:: python
import torch

View File

@@ -0,0 +1,80 @@
.. currentmodule:: torch
torch.compile
====================
:func:`~torch.compile` was introduced in `PyTorch 2.0 <https://pytorch.org/get-started/pytorch-2.0/>`__.
Our default and supported backend is `inductor`, with benchmarks `showing 30% to 2x speedups and 10% memory compression <https://github.com/pytorch/pytorch/issues/93794>`__
on real-world models, for both training and inference, with a single line of code.
.. note::

    The :func:`~torch.compile` API is experimental and subject to change.
The simplest interesting program is shown below. We go over it in much more detail in `getting started <https://pytorch.org/docs/master/compile/get-started.html>`__,
where we show how to use :func:`~torch.compile` to speed up inference on a variety of real-world models from both TIMM and HuggingFace, as
co-announced `here <https://pytorch.org/blog/Accelerating-Hugging-Face-and-TIMM-models/>`__.
.. code:: python

    import torch

    def fn(x):
        x = torch.cos(x).cuda()
        x = torch.sin(x).cuda()
        return x

    # Compile the function itself, then call the compiled version
    compiled_fn = torch.compile(fn)
    compiled_fn(torch.randn(10))
If you happen to be running your model on an Ampere GPU, it's crucial to enable tensor cores; we will warn you if you haven't set
``torch.set_float32_matmul_precision('high')``.
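That is, before compiling (assuming an Ampere or newer GPU):

.. code:: python

    import torch

    # Enable TensorFloat32 kernels for float32 matmuls; trades a little
    # precision for a substantial speedup on tensor cores
    torch.set_float32_matmul_precision('high')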
:func:`~torch.compile` works over :class:`~torch.nn.Module` as well as functions, so you can pass in your entire training loop.
The above example was for inference, but you can follow this tutorial for an `example on training <https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html>`__.
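For instance, a minimal sketch with a toy module:

.. code:: python

    import torch

    mod = torch.nn.Sequential(
        torch.nn.Linear(10, 100),
        torch.nn.ReLU(),
        torch.nn.Linear(100, 10),
    )
    compiled_mod = torch.compile(mod)
    out = compiled_mod(torch.randn(4, 10))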
Optimizations
-------------
Optimizations can be passed to :func:`~torch.compile` with either a backend ``mode`` parameter or as passes. To see the available options, run
``torch._inductor.list_options()`` and ``torch._inductor.list_mode_options()``.
The default backend is `inductor`, which will likely be the most reliable and performant option for most users and library maintainers;
other backends exist for power users who don't mind more experimental community support.
You can get the full list of community backends by running :func:`~torch._dynamo.list_backends`.
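For example, to see what's available (the exact output will vary by version):

.. code:: python

    import torch
    import torch._dynamo
    import torch._inductor

    # Inductor-specific configuration options
    print(torch._inductor.list_options())

    # What each ``mode`` preset (e.g. "max-autotune") actually sets
    print(torch._inductor.list_mode_options())

    # Community backends registered with dynamo
    print(torch._dynamo.list_backends())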
.. autosummary::
    :toctree: generated
    :nosignatures:

    compile
Troubleshooting and Gotchas
---------------------------
If you experience issues with models failing to compile, running out of memory, recompiling too often, or not giving accurate results,
odds are you will find the right tool to solve your problem in our guides.
.. WARNING::

    A few features are still very much in development and not likely to work for most users. Please do not use these features
    in production code, and if you're a library maintainer, please do not expose these options to your users:
    dynamic shapes (``dynamic=True``) and max autotune (``mode="max-autotune"``), which can be passed in to :func:`~torch.compile`
    (see the sketch after this warning). Distributed training has some quirks, which you can follow in the troubleshooting guide below. Model export is not ready yet.
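For reference only, this is how those experimental flags would be passed (``fn`` is an arbitrary placeholder; per the warning above, don't rely on these in production):

.. code:: python

    import torch

    def fn(x):
        return torch.sin(x) + torch.cos(x)

    # Experimental: try to avoid recompiling across varying input shapes
    compiled = torch.compile(fn, dynamic=True)

    # Experimental: spend longer compiling to search for faster kernels
    compiled = torch.compile(fn, mode="max-autotune")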
.. toctree::
    :maxdepth: 1

    troubleshooting
    faq
Learn more
----------
If you can't wait to get started and want to learn more about the internals of the PyTorch 2.0 stack then
please check out the references below.
.. toctree::
    :maxdepth: 1

    get-started
    technical-overview

View File

@@ -1,4 +1,4 @@
-TorchDynamo Overview
+Technical Overview
====================
**TorchDynamo** is a Python-level JIT compiler designed to make unmodified
@@ -34,12 +34,7 @@ dev-discuss <https://dev-discuss.pytorch.org/t/torchinductor-a-pytorch-native-co
.. toctree::
:maxdepth: 1
:hidden:
-installation
-get-started
guards-overview
custom-backends
deep-dive
-troubleshooting
-faq

View File

@@ -1,9 +1,9 @@
-TorchDynamo Troubleshooting
+PyTorch 2.0 Troubleshooting
===========================
**Author**: `Michael Lazos <https://github.com/mlazos>`_
-TorchDynamo is still in active development, and many of the reasons for
+`torch.compile` is still in active development, and many of the reasons for
graph breaks and excessive recompilation will be fixed with upcoming
support for `tracing dynamic tensor
shapes <https://docs.google.com/document/d/1QJB-GOnbv-9PygGlOMXwiO9K6vVNm8sNg_olixJ9koc/edit?usp=sharing>`__,
@@ -707,6 +707,9 @@ compilers can be different in subtle ways yet have dramatic impact on
your training stability. So the accuracy debugger is very useful for us
to detect bugs in our codegen or with a backend compiler.
+If you'd like to ensure that random number generation is the same across both torch
+and triton, then you can enable ``torch._inductor.config.fallback_random = True``.
File an Issue
~~~~~~~~~~~~~

View File

@@ -1,75 +0,0 @@
Installing TorchDynamo
======================
This section describes how to install TorchDynamo.
TorchDynamo is included in the nightly binaries of PyTorch. For
more information, see `Getting Started <https://pytorch.org/get-started/locally/>`__.
Requirements
------------
You must have the following prerequisites to use TorchDynamo:
* A Linux or macOS environment
* Python 3.8 (recommended). Python 3.7 through 3.10 are supported and
tested. Make sure to have a development version of Python installed
locally as well.
GPU/CUDA Requirements
~~~~~~~~~~~~~~~~~~~~~
To use GPU back ends, and in particular Triton, make sure that
the CUDA that you have installed locally matches the PyTorch version you
are running.
The following command installs GPU PyTorch + TorchDynamo along with GPU
TorchDynamo dependencies (for CUDA 11.7):
.. code-block:: shell
pip3 install numpy --pre torch --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu117
CPU requirements
~~~~~~~~~~~~~~~~
There are no additional requirements for CPU TorchDynamo. CPU
TorchDynamo is included in the nightly versions of PyTorch.
To install, run the following command:
.. code-block:: shell
pip3 install --pre torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu
Verify Installation
~~~~~~~~~~~~~~~~~~~
If you built PyTorch from source, then you can run the following
commands (from the PyTorch repo root directory)
to check that TorchDynamo is installed correctly:
.. code-block:: shell
cd tools/dynamo
python verify_dynamo.py
If you do not have the PyTorch source locally, you can alternatively
copy the script (``tools/dynamo/verify_dynamo.py``) from the PyTorch
repository and run it locally.
Docker Installation
-------------------
We also provide all the required dependencies in the PyTorch nightly
binaries which you can download with the following command:
.. code-block::
docker pull ghcr.io/pytorch/pytorch-nightly
And for ad hoc experiments just make sure that your container has access
to all your GPUs:
.. code-block:: bash
docker run --gpus all -it ghcr.io/pytorch/pytorch-nightly:latest /bin/bash

View File

@@ -46,16 +46,15 @@ Features described in this documentation are classified by release status:
:glob:
:maxdepth: 1
:caption: torch.compile
:hidden:
-dynamo/index
-dynamo/installation
-dynamo/get-started
-dynamo/guards-overview
-dynamo/custom-backends
-dynamo/deep-dive
-dynamo/troubleshooting
-dynamo/faq
+compile/index
+compile/get-started
+compile/troubleshooting
+compile/faq
+compile/technical-overview
+compile/guards-overview
+compile/custom-backends
+compile/deep-dive
ir
.. toctree::

View File

@@ -714,6 +714,8 @@ Optimizations
compile
+`torch.compile documentation <https://pytorch.org/docs/master/compile/index.html>`__
Operator Tags
------------------------------------
.. autoclass:: Tag