PyTorch 2.0 Performance Dashboard
=================================

**Author:** `Bin Bao <https://github.com/desertfire>`__ and `Huy Do <https://github.com/huydhn>`__

PyTorch 2.0's performance is tracked nightly on this `dashboard <https://hud.pytorch.org/benchmark/compilers>`__.
The performance collection runs on 12 GCP A100 nodes every night. Each node contains a 40GB A100 Nvidia GPU and
a 6-core 2.2GHz Intel Xeon CPU. The corresponding CI workflow file can be found
`here <https://github.com/pytorch/pytorch/blob/main/.github/workflows/inductor-perf-test-nightly.yml>`__.

How to read the dashboard?
---------------------------

The landing page shows tables for all three benchmark suites we measure, ``TorchBench``, ``Huggingface``, and ``TIMM``,
and graphs for one benchmark suite with the default setting. For example, the default graphs currently show the AMP
training performance trend over the past 7 days for ``TorchBench``. The drop-down lists at the top of that page let you
view tables and graphs for different options. In addition to the pass rate, three key performance metrics are reported
there: ``Geometric mean speedup``, ``Mean compilation time``, and ``Peak memory footprint compression ratio``.
Both ``Geometric mean speedup`` and ``Peak memory footprint compression ratio`` are measured against PyTorch eager
performance, and larger is better. Clicking any individual number in those tables brings you to a view with detailed
results for all the tests in that specific benchmark suite.

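As a rough guide to how these aggregates behave (the authoritative bookkeeping lives in the benchmark scripts,
so treat the formulas below as an assumption rather than the exact dashboard definition), the per-model speedup
is eager latency divided by compiled latency, and the headline number is its geometric mean over the ``N``
models in a suite:

.. math::

   \text{speedup}_i = \frac{t^{\text{eager}}_i}{t^{\text{compiled}}_i}, \qquad
   \text{geomean speedup} = \left(\prod_{i=1}^{N} \text{speedup}_i\right)^{1/N}

The memory metric is computed analogously from peak memory, with the eager peak in the numerator, so a
compression ratio above 1 means the compiled run used less memory than eager.
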
What is measured on the dashboard?
-----------------------------------

All the dashboard tests are defined in this
`function <https://github.com/pytorch/pytorch/blob/3e18d3958be3dfcc36d3ef3c481f064f98ebeaf6/.ci/pytorch/test.sh#L305>`__.
The exact test configurations are subject to change, but at the moment, we measure both inference and training
performance with AMP precision on the three benchmark suites. We also measure different settings of TorchInductor,
including ``default``, ``with_cudagraphs (default + cudagraphs)``, and ``dynamic (default + dynamic_shapes)``.

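To make those settings concrete, the invocations behind them look roughly like the sketch below. The flags are
modeled on the example command shown in the local-run section further down plus the ``--training`` and
``--dynamic-shapes`` options of the dynamo benchmark scripts; the exact flag combinations used in CI may differ,
so treat this as illustrative and check the CI logs for the authoritative command lines.

.. code-block:: bash

    # Illustrative only -- the authoritative commands are in the nightly CI logs.

    # "default": inductor backend with CUDA graphs disabled
    python benchmarks/dynamo/torchbench.py --performance --training --amp \
        --backend inductor --disable-cudagraphs --device cuda

    # "with_cudagraphs": same as default, but with CUDA graphs enabled
    python benchmarks/dynamo/torchbench.py --performance --training --amp \
        --backend inductor --device cuda

    # "dynamic": default plus dynamic shapes
    python benchmarks/dynamo/torchbench.py --performance --training --amp \
        --backend inductor --disable-cudagraphs --dynamic-shapes --device cuda
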
Can I check if my PR affects TorchInductor's performance on the dashboard before merging?
------------------------------------------------------------------------------------------

Individual dashboard runs can be triggered manually by clicking the ``Run workflow`` button
`here <https://github.com/pytorch/pytorch/actions/workflows/inductor-perf-test-nightly.yml>`__
and submitting with your PR's branch selected. This will kick off a full dashboard run with your PR's changes.
Once it is done, you can check the results by selecting the corresponding branch name and commit ID
on the performance dashboard UI. Be aware that this is an expensive CI run; since resources are limited,
please use this functionality sparingly.

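If you prefer the command line over the web UI, the same manual dispatch can be done with the GitHub CLI. This
sketch assumes your PR branch has been pushed to ``pytorch/pytorch`` (branch selection in the ``Run workflow``
dialog has the same requirement) and that the workflow's default inputs are acceptable:

.. code-block:: bash

    # Kick off a manual dashboard run for your branch (uses the workflow's default inputs).
    gh workflow run inductor-perf-test-nightly.yml --repo pytorch/pytorch --ref <your-pr-branch>

    # Follow the status of the dispatched run.
    gh run list --repo pytorch/pytorch --workflow inductor-perf-test-nightly.yml --branch <your-pr-branch>
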
How can I run any performance test locally?
--------------------------------------------

The exact command lines used during a complete dashboard run can be found in the logs of any recent CI run.
The `workflow page <https://github.com/pytorch/pytorch/actions/workflows/inductor-perf-test-nightly.yml>`__
is a good place to look for logs from recent runs.
In those logs, you can search for lines like
``python benchmarks/dynamo/huggingface.py --performance --cold-start-latency --inference --amp --backend inductor --disable-cudagraphs --device cuda``
and run them locally if you have a GPU that works with PyTorch 2.0.
``python benchmarks/dynamo/huggingface.py -h`` gives a detailed explanation of the benchmarking script's options.

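For example, starting from the root of a PyTorch checkout (an assumption for the paths below), a local run of the
Hugging Face suite in the "default" inference configuration looks like the sketch that follows. ``--only`` and
``--output`` are options I believe the benchmark scripts support for restricting the run to one model and writing
a CSV, and ``BertForMaskedLM`` is only an example model name; confirm all of this with ``-h``.

.. code-block:: bash

    cd /path/to/pytorch  # repo root, so the benchmarks/dynamo/ paths resolve

    # Full Hugging Face suite, inference, AMP, inductor backend without CUDA graphs
    python benchmarks/dynamo/huggingface.py --performance --cold-start-latency --inference --amp \
        --backend inductor --disable-cudagraphs --device cuda

    # Restrict to a single model and save the results to a CSV file
    python benchmarks/dynamo/huggingface.py --performance --inference --amp --backend inductor \
        --disable-cudagraphs --device cuda --only BertForMaskedLM --output /tmp/hf_inference.csv

    # List all available options of the benchmarking script
    python benchmarks/dynamo/huggingface.py -h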