pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Files

Xinya Zhang 98efc9e93d [ROCm] Bump AOTriton to 0.11b (#161754 )

Notable new features/optimizations for SDPA operators on AMD systems from AOTriton 0.11b:

* Invoke AITER Assembly kernels on gfx942/gfx950 when inputs meet requirements
- AITER ASM kernels deliver over 500TFLOPS training performance. See
[AOTriton 0.11b Release Page](https://github.com/ROCm/aotriton/releases/tag/0.11b) for more
details.
* Now returns natural based `logsumexp` tensor, matching CUDA's behavior
- PR #156903 is reverted in this PR as well since it is not needed anymore.
* Enables `CausalVariant.LOWER_RIGHT`

The build system changes drastically along with new packaging scheme of
AOTriton 0.11

* AOTriton 0.11 packs GPU images separately from AOTriton runtime
* `aotriton.cmake` now selectively downloads image packs according to
`PYTORCH_ROCM_ARCH`
* `aotriton.cmake` now only use pre-compiled runtime library that exactly
matches the ROCM in the build environment. For PyTorch builds with ROCm
versions not listed in the file, the build process will build AOTriton
runtime without GPU images from source
- This avoids any further ABI breaks like ROCM 6.4 -> 7.0
- recursive git clone is disabled since building AOTriton runtime does not
require submodules.

Bug fixes:

* Fix a kernel bug introduced when implementing SWA

Known Problems:

* gfx1100 target (Radeon RX 7000 Series) is moved back to experimental status
due to accuracy issues. Triton compiler fixes are needed to restore the
support status.
* Enabling TF32 tests affects accuracy for later non-TF32 tests on ROCM 7.0.
This issue is under investigation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161754
Approved by: https://github.com/jithunnair-amd, https://github.com/jeffdaily

2025-09-03 20:45:44 +00:00

aotriton.cmake

[ROCm] Bump AOTriton to 0.11b (#161754 )

2025-09-03 20:45:44 +00:00

EigenBLAS.cmake

Build EigenBlas as static library (#44747 )

2020-09-16 10:25:26 -07:00

nccl.cmake

Nccl update to 2.25.1 for cuda 12.4-12.8 (#146073 )

2025-02-19 03:52:26 +00:00

nnpack.cmake