Commit Graph

719 Commits

Author SHA1 Message Date
202f83dc4e [ROCm][layer_norm] Use __builtin_amdgcn_rcpf(x) instead of 1.f/x (#165589)
Replace (more) exact calculation with hardware approximation.

Benefits:
Reduced code size.
Improved performance for certain scenarios.

Experiments show only a small reduction in precision.
Experiments show no significant performance regressions; bfloat16 and float16 calculations may benefit substantially from this change.
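
For illustration, a minimal sketch of the substitution (the guard uses standard HIP/Clang device macros; the helper name is hypothetical, not from the PR):

```cpp
__device__ inline float fast_rcp(float x) {
#if defined(__HIP_DEVICE_COMPILE__)
  // Hardware approximate reciprocal: smaller code and faster in some
  // scenarios, at the cost of a small precision loss versus exact division.
  return __builtin_amdgcn_rcpf(x);
#else
  return 1.f / x;  // exact fallback for non-AMDGCN targets
#endif
}
```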

Co-authors: @mhalk, @amd-hhashemi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165589
Approved by: https://github.com/jeffdaily
2025-10-17 09:12:30 +00:00
2d50678dcc Fix -Wno-duplicate-decl-specifier is valid for C/ObjC but not for C++ (#164552)
Fixes #99715
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164552
Approved by: https://github.com/Skylion007
2025-10-03 20:12:49 +00:00
22b1710252 Use posix_fallocate() to reserve disk space for shared memory (#161910)
Shared memory is allocated by creating a file in /dev/shm (by default), which can run out of space. PyTorch reserves the file size by calling ftruncate(), which creates a sparse file, so the call succeeds even if sufficient disk space is not available.

This could lead to a situation when a shared memory region is successfully created but a subsequent access to a shared memory page results in SIGBUS due to the disk being full.

Using posix_fallocate() instead of ftruncate() eliminates this problem because the former syscall actually allocates the space and returns an error if the disk is full.
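
A minimal sketch of the contrast (plain POSIX calls; `fd` is assumed to refer to a file under /dev/shm):

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <cstdio>

// Reserve `size` bytes of backing storage for a shared-memory file.
bool reserve_shm(int fd, off_t size) {
  // ftruncate(fd, size) would succeed even on a full disk, deferring the
  // failure to a SIGBUS on the first page access. posix_fallocate()
  // allocates the blocks up front and reports ENOSPC immediately instead.
  if (int err = posix_fallocate(fd, 0, size); err != 0) {
    std::fprintf(stderr, "shared memory allocation failed: %d\n", err);
    return false;
  }
  return true;
}
```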

Related to https://github.com/pytorch/pytorch/issues/5040
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161910
Approved by: https://github.com/mikaylagawarecki
2025-10-02 19:12:57 +00:00
089f9130ed Install fmtlib headers. (#164139)
`fmtlib` version was updated to 12.0.0 in #163441.

In this new version, due to https://github.com/fmtlib/fmt/pull/4536, PyTorch stopped installing the `fmtlib` headers. Because of that, the PyTorch/XLA build CI started to fail (https://github.com/pytorch/xla/issues/9653). While we did fix it internally (https://github.com/pytorch/xla/pull/9650), I believe PyTorch should continue installing the `fmtlib` headers, since they are a dependency of its C API [`python_arg_parser.h`][1].

PyTorch/XLA CI was moved to `unstable.yml` in #159272, and later removed in #163564. This PyTorch/XLA build failure went under the radar, since the `fmtlib` update only landed on September 22.

[1]: 84d673ef57/torch/csrc/utils/python_arg_parser.h (L42)
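
Since `python_arg_parser.h` pulls in fmt, a downstream build that consumes the installed headers needs them present to compile even trivial fmt usage (a minimal sketch using standard fmt API, not code from the PR):

```cpp
#include <fmt/format.h>
#include <string>

int main() {
  // Resolves only if the fmt headers were installed alongside PyTorch.
  std::string msg = fmt::format("arg parsing failed after {} overloads", 3);
  return msg.empty() ? 1 : 0;
}
```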
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164139
Approved by: https://github.com/Skylion007, https://github.com/malfet
2025-09-30 01:10:13 +00:00
00059db034 Revert "[RELAND] Always build USE_DISTRIBUTED (#160449) and Make distributed modules importable even when backend not built (#159889) (#162594)"
This reverts commit 09cb34c1dce8fe1b880bbf3115d8ddad3401d871.

Reverted https://github.com/pytorch/pytorch/pull/162594 on behalf of https://github.com/malfet due to reverted internally and now can be safely reverted in OSS ([comment](https://github.com/pytorch/pytorch/pull/162594#issuecomment-3334176367))
2025-09-25 13:47:46 +00:00
0bca77951d [Code Clean] Remove deadcodes about Python3.9 [2/N] (#163627)
As the title states.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163627
Approved by: https://github.com/jansel
ghstack dependencies: #163626
2025-09-24 07:30:50 +00:00
09cb34c1dc [RELAND] Always build USE_DISTRIBUTED (#160449) and Make distributed modules importable even when backend not built (#159889) (#162594)
Summary:
Original: D81957844 and D81957923

Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well

#buildall

Test Plan:
sandcastle and oss ci

Rollback Plan:

Reviewed By: H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594
Approved by: https://github.com/H-Huang, https://github.com/dcci
2025-09-22 21:12:18 +00:00
f0078941cf Revert "[RELAND] Always build USE_DISTRIBUTED (#160449) and Make distributed modules importable even when backend not built (#159889) (#162594)"
This reverts commit 6c334885d48725197b5d35e2c1543efc0f4198d0.

Reverted https://github.com/pytorch/pytorch/pull/162594 on behalf of https://github.com/wdvr due to reverted internally - @ezyang see D82281294 ([comment](https://github.com/pytorch/pytorch/pull/162594#issuecomment-3317017530))
2025-09-22 05:39:07 +00:00
6c334885d4 [RELAND] Always build USE_DISTRIBUTED (#160449) and Make distributed modules importable even when backend not built (#159889) (#162594)
Summary:
Original: D81957844 and D81957923

Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well

#buildall

Test Plan:
sandcastle and oss ci

Rollback Plan:

Reviewed By: H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594
Approved by: https://github.com/H-Huang, https://github.com/dcci
2025-09-12 10:54:42 +00:00
6b59a19242 Revert "[RELAND] Always build USE_DISTRIBUTED (#160449) and Make distributed modules importable even when backend not built (#159889) (#162594)"
This reverts commit 6e8f17c58029e5fa6bc222b2445ebbc0cbdc17c7.

Reverted https://github.com/pytorch/pytorch/pull/162594 on behalf of https://github.com/huydhn due to Reverted internally ([comment](https://github.com/pytorch/pytorch/pull/162594#issuecomment-3283985880))
2025-09-12 06:52:03 +00:00
6e8f17c580 [RELAND] Always build USE_DISTRIBUTED (#160449) and Make distributed modules importable even when backend not built (#159889) (#162594)
Summary:
Original: D81957844 and D81957923

Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well

#buildall

Test Plan:
sandcastle and oss ci

Rollback Plan:

Reviewed By: H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594
Approved by: https://github.com/H-Huang, https://github.com/dcci
2025-09-12 03:56:18 +00:00
dda071587f Revert "Make distributed modules importable even when backend not built (#159889)" (#162568)
This reverts commit a0d026688cd69583d5a4e0c6f3e5fda141a7f4a9.

Revert "Always build USE_DISTRIBUTED. (#160449)"

This reverts commit d80297a6846f1f2c36fd4f19e22919f2abe8fcea.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162568
Approved by: https://github.com/huydhn
2025-09-10 04:29:42 +00:00
bdbe931d58 [build] Add LeakSanitizer option to CMake (#158686)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158686
Approved by: https://github.com/eellison
2025-09-09 18:41:20 +00:00
d80297a684 Always build USE_DISTRIBUTED. (#160449)
Signed-off-by: Edward Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci
2025-09-08 19:10:36 +00:00
1e0656f063 Revert "Always build USE_DISTRIBUTED. (#160449)"
This reverts commit de893e96c775023aa3be895060848fac3296772c.

Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to internal changes breaks import checks, see [D81845053](https://www.internalfb.com/diff/D81845053) ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3264887002))
2025-09-08 07:04:36 +00:00
de893e96c7 Always build USE_DISTRIBUTED. (#160449)
Signed-off-by: Edward Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci
2025-09-05 20:15:11 +00:00
adae7f66aa Revert "Always build USE_DISTRIBUTED. (#160449)"
This reverts commit c37103234afc832dcad307e9016230810957c9d5.

Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to Breaking internal build rules, see D81756619 ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3259430011))
2025-09-05 18:58:47 +00:00
c37103234a Always build USE_DISTRIBUTED. (#160449)
Signed-off-by: Edward Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci
2025-09-04 19:43:17 +00:00
b7dad7dd49 Revert "Always build USE_DISTRIBUTED. (#160449)"
This reverts commit 90b08643c3a6eb1f3265b7d1388bd76660759f46.

Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to Already discussed with @ezyang about the internal quirks and errors ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3254219358))
2025-09-04 15:25:07 +00:00
9c957723a0 Replace setup.py develop with pip install -e (#156710)
#156027 already replaced most uses of `python setup.py develop`. This PR only covers a few more occurrences.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156710
Approved by: https://github.com/atalman
2025-09-04 11:07:44 +00:00
90b08643c3 Always build USE_DISTRIBUTED. (#160449)
Signed-off-by: Edward Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci
2025-09-03 07:33:55 +00:00
4e42aa8ffc Revert "Always build USE_DISTRIBUTED. (#160449)"
This reverts commit b7034e9c924412bfbe8ee25a22d7e95239b5ca65.

Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, can't be landed with forward fix due to internal tooling problems ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3246689684))
2025-09-02 20:28:42 +00:00
b7034e9c92 Always build USE_DISTRIBUTED. (#160449)
Signed-off-by: Edward Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci
2025-09-01 23:00:21 +00:00
3e5b021f21 [ATen][CPU][Sparse] Use Third-Party Eigen for sparse add and addmm (#155357)
This pull request adds the following ops for sparse matrices using Eigen library:
```python
    add(a_csr, b_csr)
    add(a_csc, b_csc)

    addmm(c_csr, a_csr, b_csr)
    addmm(c_csr, a_csr, b_csc)
    addmm(c_csr, a_csc, b_csc)
    addmm(c_csr, a_csc, b_csr)

    addmm(c_csc, a_csr, b_csr)
    addmm(c_csc, a_csr, b_csc)
    addmm(c_csc, a_csc, b_csc)
    addmm(c_csc, a_csc, b_csr)
```

Currently, the operations for sparse matrices on CPU are available only through MKL. Since MKL does not exist on `aarch64`, these ops are unavailable on any machine with an ARM-based CPU, including Apple Silicon, AWS Graviton, and NVIDIA Grace. This PR addresses the issue by using Eigen as a backend for the above ops.

This is a refactored version of my previous PR #101814. The main difference from the old one is that this version does not enable Eigen by default.
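
Under the hood these ops map naturally onto Eigen's sparse API; a minimal standalone sketch of the kind of computation the backend performs (plain Eigen, not the ATen bindings):

```cpp
#include <Eigen/Sparse>
#include <iostream>

int main() {
  // RowMajor gives a CSR-like layout; ColMajor (the default) is CSC-like.
  using SpMat = Eigen::SparseMatrix<float, Eigen::RowMajor>;
  SpMat a(3, 3), b(3, 3);
  a.insert(0, 0) = 1.f;
  a.insert(1, 2) = 2.f;
  b.insert(0, 0) = 3.f;
  b.insert(2, 1) = 4.f;

  SpMat sum = a + b;              // sparse add, as in add(a_csr, b_csr)
  SpMat prod = (a * b).pruned();  // sparse matmul feeding an addmm-style update
  std::cout << sum.nonZeros() << " " << prod.nonZeros() << "\n";
  return 0;
}
```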
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155357
Approved by: https://github.com/pearu, https://github.com/eqy

Co-authored-by: Eli Uriegas <eliuriegas@meta.com>
2025-08-23 19:03:55 +00:00
fc0683b1e7 Revert "[ATen][CPU][Sparse] Use Third-Party Eigen for sparse add and addmm (#155357)"
This reverts commit ce048de608180fa88335e5821070472539968b54.

Reverted https://github.com/pytorch/pytorch/pull/155357 on behalf of https://github.com/seemethere due to This is causing buck builds to fail since we didn't add the definition of AT_USE_EIGEN_SPARSE in the buckbuild.bzl file, will follow-up and re-land this. ([comment](https://github.com/pytorch/pytorch/pull/155357#issuecomment-3212270510))
2025-08-21 22:38:40 +00:00
ce048de608 [ATen][CPU][Sparse] Use Third-Party Eigen for sparse add and addmm (#155357)
This pull request adds the following ops for sparse matrices using Eigen library:
```python
    add(a_csr, b_csr)
    add(a_csc, b_csc)

    addmm(c_csr, a_csr, b_csr)
    addmm(c_csr, a_csr, b_csc)
    addmm(c_csr, a_csc, b_csc)
    addmm(c_csr, a_csc, b_csr)

    addmm(c_csc, a_csr, b_csr)
    addmm(c_csc, a_csr, b_csc)
    addmm(c_csc, a_csc, b_csc)
    addmm(c_csc, a_csc, b_csr)
```

Currently, the operations for sparse matrices on CPU are available only through MKL. Since MKL does not exist on `aarch64`, these ops are unavailable on any machine with an ARM-based CPU, including Apple Silicon, AWS Graviton, and NVIDIA Grace. This PR addresses the issue by using Eigen as a backend for the above ops.

This is a refactored version of my previous PR #101814. The main difference from the old one is that this version does not enable Eigen by default.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155357
Approved by: https://github.com/pearu, https://github.com/eqy
2025-08-20 15:44:54 +00:00
cyy c184cb3852 [submodule] Bump fbgemm to latest (#158210)
Merge the recent commits of FBGEMM and remove unnecessary CMake code.
Specifically, we
1. enable `fbgemm_autovec` since the target is now correctly handled.
2. remove option `USE_FAKELOWP` which is not used.
3. remove `CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS` check.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158210
Approved by: https://github.com/q10
2025-08-11 13:48:02 +00:00
5f5f508aa8 [ROCm] Ck backend UX refactor (#152951)
Refactors how CK GEMMs and SDPA are enabled and disabled.

- Adds USE_ROCM_CK_GEMM compile flag for enabling CK gemms.
- USE_ROCM_CK_GEMM is set to True by default on Linux
- Updates USE_CK_FLASH_ATTENTION to USE_ROCM_CK_SDPA.
- USE_ROCM_CK_SDPA is set to False by default
- (USE_CK_FLASH_ATTENTION still works for now, but will be deprecated in a future release)
- Prevents these CK libraries from being used unless PyTorch has been built specifically with the functionality AND is running on a system architecture that supports it.
- The getters for these library backends also perform validity checking in case the user changed the backend via an environment variable; if the request is invalid (i.e., one of the conditions above is not met), the backend is reset to the current non-CK default (see the sketch below).
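
A sketch of the guard described in the last bullet (names are illustrative, not PyTorch's actual API):

```cpp
enum class BlasBackend { Default, Ck };

// Hypothetical validation: honor a CK request only when the build has CK
// compiled in and the current architecture supports it; otherwise fall
// back to the non-CK default.
BlasBackend validated_backend(BlasBackend requested, bool built_with_ck,
                              bool arch_supports_ck) {
  if (requested == BlasBackend::Ck && !(built_with_ck && arch_supports_ck)) {
    return BlasBackend::Default;
  }
  return requested;
}
```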

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152951
Approved by: https://github.com/eqy, https://github.com/jeffdaily, https://github.com/m-gallus

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: Jithun Nair <jithun.nair@amd.com>
Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
2025-08-08 18:40:17 +00:00
93da9952a7 gloo: fix building system gloo with CUDA/HIP (#146637)
Fix incorrect linking of Gloo's libraries when building with system Gloo. Previously, either Gloo's native library or Gloo's CUDA library was linked. However, Gloo has changed such that all users of Gloo must link the native library and can optionally link the CUDA or HIP library for Gloo + CUDA/HIP support.
This had been updated when building/linking with vendored Gloo, but not when using system Gloo.

Fixes: #146239

Reported-by: Adam J Stewart <ajstewart426@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146637
Approved by: https://github.com/malfet
2025-08-06 22:56:31 +00:00
e9d27aa8fd [CUDA 13] CMake/Dependencies: no need to call find_package(CUB) (#159854)
The CUB library is part of CCCL in CUDA Toolkit 13. If CUDA is found, CUB is found as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159854
Approved by: https://github.com/eqy
2025-08-06 06:03:58 +00:00
9b953bb3fb [BE] Update TensorPipe pin (#159834)
No functional changes, just:
- Update C++ standard to C++17
- Update `cmake` min version to 3.18
- Update `libuv` dependency to 1.51 (to move its cmake min version to 3.10)
- Replace boost optional implementation with `std::optional` wrapper
- Make it compilable with gcc-14.x and later by including `cstddef` in a few headers
- Avoid using deprecated enums for macOS builds

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159834
Approved by: https://github.com/Skylion007
2025-08-05 20:45:09 +00:00
85d931f29e Use uppercase OR when checking for system XNNPACK (#159527)
This PR fixes `cmake/Dependencies.cmake` to work when compiling with `USE_SYSTEM_XNNPACK=ON` by changing a lowercase `or` to an uppercase `OR`.

---

For a personal project, I was building pytorch with a customized build of XNNPACK. When trying to do so I encountered the following error:

```
CMake Error at cmake/Dependencies.cmake:566 (if):
  if given arguments:

    "NOT" "XNNPACK_LIBRARY" "or" "NOT" "microkernels-prod_LIBRARY"

  Unknown arguments specified
Call Stack (most recent call first):
  CMakeLists.txt:868 (include)
```

Upon making the change in this PR (changing `or` to `OR`), the process continued as expected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159527
Approved by: https://github.com/janeyx99
2025-08-05 02:10:53 +00:00
e136a9175b [BE] Fix dev warning in Dependencies.cmake (#159702)
Namely
```
CMake Warning (dev) in cmake/Dependencies.cmake:
  A logical block opening on the line

    /Users/nshulga/git/pytorch/pytorch/cmake/Dependencies.cmake:261 (if)

  closes on the line

    /Users/nshulga/git/pytorch/pytorch/cmake/Dependencies.cmake:263 (endif)

  with mis-matching arguments.
```

Introduced by https://github.com/pytorch/pytorch/pull/143846

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159702
Approved by: https://github.com/cyyever, https://github.com/Skylion007
2025-08-03 18:45:07 +00:00
8f0998aafe Check F2C BLAS for OpenBLAS and other vendors (#143846)
This issue came from https://github.com/conda-forge/pytorch-cpu-feedstock/issues/180. MKL follows the F2C convention for returning single-precision floats as doubles and uses the G77 convention for returning complex-valued scalars; OpenBLAS does the opposite. There is a check for this already, but it runs only when the Generic BLAS vendor code path is used, so this PR moves that code to `Dependencies.cmake` to make it work when the BLAS vendor is OpenBLAS or another vendor.
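
Concretely, the two conventions disagree on the return type the caller must assume; a sketch with the well-known `sdot` prototype (not code from the PR):

```cpp
extern "C" {
// F2C convention (e.g. MKL): Fortran REAL results come back as C doubles.
double sdot_(const int* n, const float* x, const int* incx,
             const float* y, const int* incy);
// Non-F2C convention (e.g. OpenBLAS) returns a plain float instead:
//   float sdot_(const int* n, const float* x, const int* incx,
//               const float* y, const int* incy);
// Calling through the wrong prototype silently reads garbage, which is
// why the build must detect the convention up front.
}
```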

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143846
Approved by: https://github.com/rgommers, https://github.com/atalman
2025-07-01 05:56:24 +00:00
19f851ce10 Revert "Simplify nvtx3 CMake handling, always use nvtx3 (#153784)"
This reverts commit 099d0d6121125062ebc05771c8330cb7cd8d053a.

Reverted https://github.com/pytorch/pytorch/pull/153784 on behalf of https://github.com/Camyll due to breaking internal tests and cuda 12.4 builds still used in CI ([comment](https://github.com/pytorch/pytorch/pull/153784#issuecomment-3001702310))
2025-06-24 20:02:07 +00:00
b2d473c8f8 [ROCm][Windows] Fix rocsolver undefined symbol error (#156591)
Fix undefined symbol error while using `rocsolver_ssyevd_strided_batched` call in `aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLib.cpp`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156591
Approved by: https://github.com/jeffdaily
2025-06-24 03:28:45 +00:00
b1d62febd0 Revert "Use official CUDAToolkit module in CMake (#154595)"
This reverts commit 08dae945ae380d80efbaf140a95abfc5d96e5100.

Reverted https://github.com/pytorch/pytorch/pull/154595 on behalf of https://github.com/malfet due to It breaks on some local setup with no clear diagnostic, but looks like it fails to find cuFile ([comment](https://github.com/pytorch/pytorch/pull/154595#issuecomment-2997959344))
2025-06-23 21:15:31 +00:00
cyy 099d0d6121 Simplify nvtx3 CMake handling, always use nvtx3 (#153784)
Fall back to third-party NVTX3 if system NVTX3 doesn't exist. We also reuse the `CUDA::nvtx3` target for better interoperability.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153784
Approved by: https://github.com/ezyang
2025-06-23 06:12:46 +00:00
cyy 08dae945ae Use official CUDAToolkit module in CMake (#154595)
Use CUDA language in CMake and remove forked FindCUDAToolkit.cmake.
Some CUDA targets are also renamed with `torch::` prefix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154595
Approved by: https://github.com/albanD
2025-06-22 05:44:29 +00:00
ee56e9f8a8 [BE] Make Eigen an optional dependency (#155955)
Its version is controlled by `eigen_pin.txt`, but it will be installed only if no BLAS provider could be found.
Why this is good for CI: we don't really ever build with Eigen, and GitLab can be down when GitHub is up, which has caused spurious CI failures in the past.

Remove eigen submodule and replace it with eigen_pin.txt

Fixes https://github.com/pytorch/pytorch/issues/108773
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155955
Approved by: https://github.com/atalman
2025-06-21 03:02:02 +00:00
208ec60e72 Revert "[BE] Make Eigen an optional dependency (#155955)"
This reverts commit 1b50c12584909bda00009f4f0fd0d38ec792d019.

Reverted https://github.com/pytorch/pytorch/pull/155955 on behalf of https://github.com/atalman due to need to revert eigen test ([comment](https://github.com/pytorch/pytorch/pull/155955#issuecomment-2992512124))
2025-06-20 18:43:52 +00:00
1b50c12584 [BE] Make Eigen an optional dependency (#155955)
Its version is controlled by `eigen_pin.txt`, but it will be installed only if no BLAS provider could be found.
Why this is good for CI: we don't really ever build with Eigen, and GitLab can be down when GitHub is up, which has caused spurious CI failures in the past.

Remove eigen submodule and replace it with eigen_pin.txt

Fixes https://github.com/pytorch/pytorch/issues/108773
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155955
Approved by: https://github.com/atalman
ghstack dependencies: #155947, #155954
2025-06-20 17:21:27 +00:00
30d3cf62fb support CUBLASLT_MATMUL_MATRIX_SCALE_OUTER_VEC_32F (#154680)
Requires CUDA >= 12.9 and sm_90.

hipBLASLt has a similar enum, but it is not available until ROCm 7.0; support the new enum early using a CMake test.
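
The probe presumably boils down to compiling a snippet along these lines (a sketch of a typical `check_cxx_source_compiles` body, not the exact test from the PR):

```cpp
#include <cublasLt.h>

int main() {
  // Compiles only when the toolkit headers define the new enum value
  // (CUDA >= 12.9); otherwise the feature is disabled at configure time.
  auto scale = CUBLASLT_MATMUL_MATRIX_SCALE_OUTER_VEC_32F;
  (void)scale;
  return 0;
}
```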
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154680
Approved by: https://github.com/malfet, https://github.com/atalman
2025-06-18 18:39:01 +00:00
ccea6ddac3 [BE] fix typos in cmake/ (#156079)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156079
Approved by: https://github.com/Skylion007
2025-06-17 19:25:43 +00:00
1cce73b5f4 [build] Change --cmake{,-only} arguments to envvars to support modern Python build frontend (#156045)
See also:

- #156029
- #156027

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156045
Approved by: https://github.com/ezyang
ghstack dependencies: #156040, #156041
2025-06-17 11:40:24 +00:00
10cd1de518 [ROCm] Make optional features in LoadHIP better conditioned. (#155305)
* The `rocm-core` CMake package only started appearing in ROCm 6.4, so the version probing is reworked to work when it is not present. This also collapses the unneeded operating-system conditioning in favor of feature probing.
* Make `hipsparselt` optional: it only started appearing in ROCm 6.4 and it is not in all recent distribution channels yet.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155305
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-06-07 02:20:55 +00:00
43390d8b13 ROCm Sparsity through HipSparseLT (#150578)
TLDR:

- This pull request introduces support for hipSPARSELt in ROCm; the current use case is semi-structured sparsity.
- Require **ROCm 6.4** && **gfx942/gfx950**.
- The average performance uplift (compared to dense operations) is ~20% on ROCm 6.4, with further gains expected along the way.

### Dense vs. Sparse Performance Comparison

#### **NT (Row-major)**
**Average Uplift**: `1.20`

| M     | N      | K      | hipsparselt-bench (us) | hipblaslt-bench get all (us) | Uplift |
|-------|--------|--------|-------------------------|-------------------------------|--------|
| 14336 | 8      | 4096   | 20.05                   | 25.3                          | 1.26   |
| 4096  | 8      | 14336  | 21.07                   | 25.28                         | 1.20   |
| 3072  | 3072   | 10240  | 299.05                  | 351.82                        | 1.18   |
| 3072  | 1536   | 768    | 18.56                   | 20.05                         | 1.08   |
| 3072  | 17664  | 768    | 163.13                  | 173.91                        | 1.07   |
| 3072  | 196608 | 768    | 1717.30                 | 1949.63                       | 1.14   |
| 3072  | 24576  | 768    | 206.84                  | 242.98                        | 1.17   |
| 3072  | 6144   | 768    | 53.90                   | 56.88                         | 1.06   |
| 3072  | 98304  | 768    | 833.77                  | 962.28                        | 1.15   |
| 768   | 1536   | 768    | 8.53                    | 19.65                         | 2.30   |
| 768   | 17664  | 768    | 46.02                   | 46.84                         | 1.02   |
| 768   | 196608 | 768    | 463.15                  | 540.46                        | 1.17   |
| 768   | 24576  | 768    | 54.32                   | 59.55                         | 1.10   |
| 768   | 6144   | 768    | 19.47                   | 20.15                         | 1.03   |
| 768   | 98304  | 768    | 231.88                  | 258.73                        | 1.12   |

---

#### **NN (Row-major)**
**Average Uplift**: `1.13`

| M   | N      | K     | hipsparselt-bench (us) | hipblaslt-bench get all (us) | Uplift |
|-----|--------|-------|-------------------------|-------------------------------|--------|
| 768 | 1536   | 3072  | 27.50                   | 28.78                         | 1.05   |
| 768 | 17664  | 3072  | 125.06                  | 158.94                        | 1.27   |
| 768 | 196608 | 3072  | 1568.38                 | 1767.12                       | 1.13   |
| 768 | 24576  | 3072  | 171.05                  | 203.49                        | 1.19   |
| 768 | 6144   | 3072  | 58.72                   | 60.39                         | 1.03   |
| 768 | 98304  | 3072  | 787.15                  | 887.60                        | 1.13   |

-------------------------

This pull request introduces support for hipSPARSELt in ROCm, alongside various updates and improvements to the codebase and test suite. The changes primarily involve adding configuration flags, updating conditional checks, and ensuring compatibility with hipSPARSELt.

### ROCm and hipSPARSELt Support:

* `BUILD.bazel`: Added the `@AT_HIPSPARSELT_ENABLED@` substitution to enable hipSPARSELt support.
* `aten/CMakeLists.txt`: Introduced a conditional flag to enable hipSPARSELt support based on the ROCm version.
* `aten/src/ATen/CMakeLists.txt`: Added the `AT_HIPSPARSELT_ENABLED` configuration.
* `aten/src/ATen/cuda/CUDAConfig.h.in`: Defined the `AT_HIPSPARSELT_ENABLED` macro.
* `caffe2/CMakeLists.txt`, `cmake/Dependencies.cmake`, `cmake/public/LoadHIP.cmake`: Included hipSPARSELt in the ROCm dependencies.

### Codebase Updates:

* `aten/src/ATen/native/sparse/cuda/cuSPARSELtOps.cpp`: Added hipSPARSELt support checks and initialization functions, and updated various methods to conditionally handle hipSPARSELt.

### Test Suite Updates:

* `test/test_sparse_semi_structured.py`: Added checks for hipSPARSELt availability and updated test conditions to skip tests not supported on ROCm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150578
Approved by: https://github.com/jeffdaily
2025-05-31 02:03:40 +00:00
66f53889d5 [nativert] port semaphore to c10 util (#153504)
Summary:
nativert RFC: https://github.com/zhxchen17/rfcs/blob/master/RFC-0043-torch-native-runtime.md

To land the runtime in PyTorch core, we will gradually land logical parts of the code, tracked in the GitHub issue, and get each piece properly reviewed.

This diff adds a simple semaphore interface to c10 until C++20, which brings `std::counting_semaphore`.

Gonna need an OSS build export to take a look at this...
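
For context, a minimal sketch of the pre-C++20 pattern such an interface typically wraps (mutex plus condition variable; an assumption, not the actual c10 code):

```cpp
#include <condition_variable>
#include <mutex>

class Semaphore {
 public:
  explicit Semaphore(int count = 0) : count_(count) {}

  void release(int n = 1) {
    std::lock_guard<std::mutex> lock(mutex_);
    count_ += n;
    for (int i = 0; i < n; ++i) cv_.notify_one();
  }

  void acquire() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [&] { return count_ > 0; });
    --count_;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  int count_;
};
```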

Test Plan: CI

Differential Revision: D73882656

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153504
Approved by: https://github.com/zhxchen17
2025-05-28 19:17:30 +00:00
0e5f2339d0 [ROCm][Windows] Run hipcc with compatibility flags. (#153986)
See also https://github.com/ROCm/TheRock/issues/590. Including the `-Wno-ignored-attributes` flag here avoids roughly 700MB of warning spam in the compile logs, and `-fms-extensions` seems beneficial to include: https://clang.llvm.org/docs/MSVCCompatibility.html.

Co-authored-by: Aaryaman Vasishta <jem456.vasishta@gmail.com>
Co-authored-by: Scott Todd <scott.todd0@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153986
Approved by: https://github.com/Skylion007, https://github.com/jeffdaily

2025-05-21 20:26:52 +00:00
daa68e7a93 Update USE_XCCL option if USE_XPU is OFF (#153936)
# Motivation
Disable `USE_XCCL` when `USE_XPU` is turned `OFF` to ensure configuration consistency. This is required because XCCL depends on XPU functionality.
In particular, ensure that `USE_XCCL` is correctly set to `OFF` when [caffe2_update_option(USE_XPU OFF)](1075bb37d3/cmake/Dependencies.cmake (L97)) is invoked.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153936
Approved by: https://github.com/Skylion007
2025-05-21 01:32:41 +00:00