[Doc] update max-autotune for CPU (#134986)

The current doc for `max-autotune` covers only GPU. This PR adds the corresponding content for CPU.
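
As a rough illustration of how the documented modes might be selected per device, here is a minimal sketch. The helper names `pick_autotune_mode` and `compile_linear` are hypothetical and are not part of PyTorch or this PR. Only `torch.compile(..., mode=...)` and the two mode strings come from the docstring being edited.

```python
def pick_autotune_mode(device_type: str, want_cuda_graphs: bool = True) -> str:
    """Hypothetical helper: choose a torch.compile mode string for a device.

    Per the updated docstring, CUDA graphs are enabled by default only on GPU,
    so the "no-cudagraphs" variant is only worth selecting for CUDA devices.
    """
    if device_type == "cuda" and not want_cuda_graphs:
        return "max-autotune-no-cudagraphs"
    return "max-autotune"


def compile_linear(device_type: str = "cpu"):
    """Hypothetical usage: compile a small module with the chosen mode."""
    import torch  # imported lazily so the helper above works without PyTorch

    model = torch.nn.Linear(64, 32).to(device_type)
    mode = pick_autotune_mode(device_type)
    # torch.compile accepts the mode string via the `mode` keyword.
    return torch.compile(model, mode=mode)
```

Calling `compile_linear("cpu")` would request `max-autotune` on CPU; whether autotuned templates are actually used depends on the installed PyTorch build and hardware support.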

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134986
Approved by: https://github.com/jgong5, https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Wu, Chunyuan
2024-09-07 03:56:27 -07:00
committed by PyTorch MergeBot
parent f7c0c06692
commit 18479c5f70

@@ -2378,8 +2378,9 @@ def compile(
   There are other circumstances where CUDA graphs are not applicable; use TORCH_LOG=perf_hints
   to debug.
- - "max-autotune" is a mode that leverages Triton based matrix multiplications and convolutions
-   It enables CUDA graphs by default.
+ - "max-autotune" is a mode that leverages Triton or template based matrix multiplications
+   on supported devices and Triton based convolutions on GPU.
+   It enables CUDA graphs by default on GPU.
  - "max-autotune-no-cudagraphs" is a mode similar to "max-autotune" but without CUDA graphs