1713 Commits

Author SHA1 Message Date
50d418f69f Replace setup.py bdist_wheel with python -m build --wheel (#156712)
Previously we already replaced most use of `python setup.py develop/install`.

This PR also replaces the use of `setup.py bdist_wheel` with the modern `python -m build --wheel` alternative.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156712
Approved by: https://github.com/atalman
ghstack dependencies: #156711
2025-09-29 21:51:32 +00:00
fa54b08cd5 Replace setup.py install with pip install (#156711)
#156027 already replaced most use of `python setup.py install`.
This PR only adds a few more occurrences and adds `--no-build-isolation` in a few places.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156711
Approved by: https://github.com/atalman
2025-09-29 15:15:10 +00:00
54b38f3b46 Add operator benchmarking run to CI nightly (#162530)
This PR introduces a new "operator microbenchmark" CI workflow and GitHub Actions for operator microbenchmarks, updating test scripts and job matrices to support new parameters, and broadening the operator benchmark tests to include more data types, larger shapes, and gradient tests. The benchmark configurations now focus more on different cuda hardware and multiple dtypes (bf16, fp16, fp32), for both compile and eager mode.

**Benchmark Configuration and Coverage:**

* Expanded operator benchmark configurations in `addmm_test.py`, `bmm_test.py`, `matmul_test.py`, and `mm_test.py` to benchmark multiple dtypes on CUDA devices, in eager and compile mode, for forward and backward run. The configs with tag "long" for the above mentioned files are being run in CI.
* The CI benchmarking is running on various hardwares: H100, A100.
* The CI job also uploads the microbenchmarking outputs to a [HUD](https://hud.pytorch.org/benchmark/llms?repoName=pytorch%2Fpytorch&benchmarkName=PyTorch+operator+microbenchmark) dashboard.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162530
Approved by: https://github.com/huydhn

Co-authored-by: Huy Do <huydhn@gmail.com>
2025-09-29 00:46:38 +00:00
5504a06e01 [BE]: Update NCCL to 2.28.3 (#162351)
@eqy New NCCL has some a bunch of bugfixes for features including reducing the number SMs needed by NVLINK collectives as well as some very useful new APIs for SymmetricMemory.  Also allows FP8 support for non-reductive operations on pre-sm90 devices.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162351
Approved by: https://github.com/ezyang, https://github.com/malfet, https://github.com/atalman
2025-09-28 01:38:59 +00:00
960290d629 [Docs] Add standard-imghdr for PyTorch Doc (#163944)
As the title stated.

Python [Pep-0594](https://peps.python.org/pep-0594) have removed imghdr from python standard libaries, the older version of sphinx don`t add it as installation dependencies, so we need to add it to requirement as an temporary dependencies.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163944
Approved by: https://github.com/albanD, https://github.com/svekars
2025-09-27 08:14:51 +00:00
96182faf96 [CI][Distributed][CUDA][Symm-Mem] Enable B200 Symm Mem Test (#162988)
Inspired by https://github.com/pytorch/pytorch/pull/162981 and motivated by https://github.com/pytorch/pytorch/pull/159323 taking a total of 20 hours to finish (and unlikely to make it in short time due to https://github.com/pytorch/pytorch/issues/162178 )

Creating this subtest to get *something distributed* on B200.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162988
Approved by: https://github.com/malfet
2025-09-27 05:12:05 +00:00
f9095fb285 [Windows] Update libuv version from 1.39 to 1.51 (#160318)
Fixes: [#148315](https://github.com/pytorch/pytorch/issues/148315)

The PR updates `libuv` version as `conda-forge` channel doesn't contain `libuv=1.39` for Windows.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160318
Approved by: https://github.com/iremyux, https://github.com/malfet
2025-09-26 23:29:21 +00:00
f1260c9b9a [ROCm][CI/CD] upgrade nightly wheels to ROCm 7.0 (#163937)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163937
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-09-26 21:42:09 +00:00
b776e0c71e [ROCm][CI/CD] create ROCm 7.0 magma tarball (#163883)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163883
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-09-26 00:51:17 +00:00
b61bdc7cc4 Fix cpp build (#162774)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162774
Approved by: https://github.com/malfet, https://github.com/atalman
2025-09-25 18:21:45 +00:00
6539537a59 [ROCm][CD] create ROCm 7.0 images for binary builds (#163860)
Adds gfx950.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163860
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-09-25 17:26:40 +00:00
00059db034 Revert "[RELAND] Always build USE_DISTRIBUTED (#160449) and Make distributed modules importable even when backend not built (#159889) (#162594)"
This reverts commit 09cb34c1dce8fe1b880bbf3115d8ddad3401d871.

Reverted https://github.com/pytorch/pytorch/pull/162594 on behalf of https://github.com/malfet due to reverted internally and now can be safely reverted in OSS ([comment](https://github.com/pytorch/pytorch/pull/162594#issuecomment-3334176367))
2025-09-25 13:47:46 +00:00
3b73841f43 update test_quantization tests to run weekly (#163077)
Fixes #162854

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163077
Approved by: https://github.com/huydhn
2025-09-24 11:31:11 +00:00
b66aa1ade1 [ARM] Add test_memory_profiler to aarch64 tests (#145260)
TestMemoryProfilerE2E.test_memory_timeline is failing on AArch64, this fixes it and enables it in the opt-in list of tests for AArch64.

Fixes #142371

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145260
Approved by: https://github.com/fadara01, https://github.com/sraikund16
2025-09-24 09:29:13 +00:00
bf0747c6c6 [Code Clean] Remove deadcodes about Python3.9 [1/N] (#163626)
As the title stated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163626
Approved by: https://github.com/Skylion007, https://github.com/albanD
2025-09-24 07:30:50 +00:00
ca35dc2fdd [EZ] Fix UP041 violations (#163648)
I.e. use `TimeoutError` instead of `socket.timeout`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163648
Approved by: https://github.com/cyyever, https://github.com/Skylion007
2025-09-23 17:58:18 +00:00
5f0c7cb4aa Add B200 smoke test (#159494)
Okay running test_max_autotune locally on B200is horrible read, for now to get something landed I am focusing on test_matmul_cuda.py and test_fp8

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159494
Approved by: https://github.com/nWEIdia, https://github.com/huydhn
ghstack dependencies: #163460, #163537, #163552
2025-09-23 15:45:05 +00:00
68e75be86a Update pytorch_sphinx_theme2 to latest hash (#163269)
The updated theme:
- Fixes articleBody in the json+ld that caused previous Google Search issues
- Other minor fixes
- 404.html fixes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163269
Approved by: https://github.com/albanD
2025-09-22 23:20:23 +00:00
e558f7a222 [vllm hash update] update the pinned vllm hash (#163463)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned vllm hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163463
Approved by: https://github.com/pytorchbot

Co-authored-by: Huy Do <huydhn@gmail.com>
2025-09-22 21:24:56 +00:00
09cb34c1dc [RELAND] Always build USE_DISTRIBUTED (#160449) and Make distributed modules importable even when backend not built (#159889) (#162594)
Summary:
Original: D81957844 and D81957923

Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well

#buildall

Test Plan:
sandcastle and oss ci

Rollback Plan:

Reviewed By: H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594
Approved by: https://github.com/H-Huang, https://github.com/dcci
2025-09-22 21:12:18 +00:00
d0086708dd [triton] update 3.5 pin to bbb06c0334a6772b92d24bde54956e675c8c6604 (#163382)
Includes:
* https://github.com/triton-lang/triton/pull/8211 to work around a PTXAS bug that was causing 03-matrix-multiplication tutorial matmuls to underperform due to excessive WGMMA waits
* https://github.com/triton-lang/triton/pull/8157 to fix a convert_layout bug

Verified that this passes Triton CI in https://github.com/pytorch/pytorch/pull/159158 and improves gemm perf (see https://github.com/pytorch/pytorch/issues/159704)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163382
Approved by: https://github.com/Camyll, https://github.com/atalman
2025-09-22 20:20:59 +00:00
5e7be98800 [BE] Update Python min version to 3.10 (#162310)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162310
Approved by: https://github.com/atalman, https://github.com/Skylion007, https://github.com/ZainRizvi
2025-09-22 17:04:21 +00:00
10adeb9044 Revert "[BE] Update Python min version to 3.10 (#162310)"
This reverts commit 9f5a644f0768258bc81f8b38492754d297399f74.

Reverted https://github.com/pytorch/pytorch/pull/162310 on behalf of https://github.com/malfet due to Broke lint, but to the best of my knowledge it's no longer possible to run lint for all files on PRs ([comment](https://github.com/pytorch/pytorch/pull/162310#issuecomment-3319289031))
2025-09-22 14:13:59 +00:00
9f5a644f07 [BE] Update Python min version to 3.10 (#162310)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162310
Approved by: https://github.com/atalman, https://github.com/Skylion007, https://github.com/ZainRizvi
2025-09-22 13:37:02 +00:00
f0078941cf Revert "[RELAND] Always build USE_DISTRIBUTED (#160449) and Make distributed modules importable even when backend not built (#159889) (#162594)"
This reverts commit 6c334885d48725197b5d35e2c1543efc0f4198d0.

Reverted https://github.com/pytorch/pytorch/pull/162594 on behalf of https://github.com/wdvr due to reverted internally - @ezyang see D82281294 ([comment](https://github.com/pytorch/pytorch/pull/162594#issuecomment-3317017530))
2025-09-22 05:39:07 +00:00
a31acf32bd Clean up obsoleted vLLM tests (#163383)
They have been removed in https://github.com/vllm-project/vllm/pull/25117 and https://github.com/vllm-project/vllm/pull/22772, thus failing in trunk at the moment after the latest pin commit update

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163383
Approved by: https://github.com/wdvr, https://github.com/seemethere, https://github.com/malfet
2025-09-20 02:48:36 +00:00
0098e5636d [CI] Move Windows build/tests to Python-3.10 (#162862)
What supposed to be a very simple change end up being quite involved, as current Windows CI framework is quite inflexible, i.e. it takes a lots of argument, but later on ignores them, namely:
 - `PYTHON_VERSION` used to be a no-op that is simply ignored by the scripts
 - With this change, `setup-win` action will create an environment called `py_tmp` with specific python version + intel-openmp (that is hard runtime requirement, but for some reason not packaged into the wheel nor marked as such)
 - Copied test type dependencies from be01a40157/aws/ami/windows/scripts/Installers/Install-Pip-Dependencies.ps1 (L16) into `win-test.sh`, but made some adjustments to be compatible with 3.10 runtime (scipy version update) and just make rerun-tests compatible with the rest of the deps

I think in the long run, one needs to update 4432e2cacd/aws/ami/windows/scripts/Installers/Install-Miniconda3.ps1 that currently pins Miniconda python to 3.9, but also figure out how CI can still create a new environment without having to download all the dependencies all the time
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162862
Approved by: https://github.com/wdvr, https://github.com/huydhn
ghstack dependencies: #163339, #163341
2025-09-19 22:51:38 +00:00
a273475b01 [BE] Introduce CONDA_ROOT_DIR (#163341)
Which equal to `%CONDA_PARENT_DIR%/Miniconda3`, and replace this pattern with `%CONDA_ROOT_DIR%` throughout the codebase
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163341
Approved by: https://github.com/clee2000
ghstack dependencies: #163339
2025-09-19 19:45:32 +00:00
33e6c5a93d [Dependabot] Update(deps): Bump transformers from 4.54.0 to 4.56.0 in /.ci/docker/ci_commit_pins (#162063)
* [Dependabot] Update(deps): Bump transformers

Bumps [transformers](https://github.com/huggingface/transformers) from 4.54.0 to 4.56.0.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.54.0...v4.56.0)

---
updated-dependencies:
- dependency-name: transformers
  dependency-version: 4.56.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Refresh results

Signed-off-by: Huy Do <huydhn@gmail.com>

* Another round of updates

Signed-off-by: Huy Do <huydhn@gmail.com>

* Another round of update

Signed-off-by: Huy Do <huydhn@gmail.com>

* Hopefully the last round of update

Signed-off-by: Huy Do <huydhn@gmail.com>

* Plz

Signed-off-by: Huy Do <huydhn@gmail.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
2025-09-19 02:50:36 -07:00
17081209e5 Revert "[CI] Move Windows build/tests to Python-3.10 (#162862)"
This reverts commit 2dcd153342d27b0981ff79eb2ccb8d8962e79c48.

Reverted https://github.com/pytorch/pytorch/pull/162862 on behalf of https://github.com/malfet due to Breaks some windows tests ([comment](https://github.com/pytorch/pytorch/pull/162862#issuecomment-3310606135))
2025-09-19 05:16:49 +00:00
2dcd153342 [CI] Move Windows build/tests to Python-3.10 (#162862)
What supposed to be a very simple change end up being quite involved, as current Windows CI framework is quite inflexible, i.e. it takes a lots of argument, but later on ignores them, namely:
 - `PYTHON_VERSION` used to be a no-op that is simply ignored by the scripts
 - With this change, `setup-win` action will create an environment called `py_tmp` with specific python version + intel-openmp (that is hard runtime requirement, but for some reason not packaged into the wheel nor marked as such)
 - Introduced `CONDA_ROOT_DIR` env variable in `activate_miniconda3.bat` to avoid `%CONDA_PARENT_DIR%\Miniconda3` invocations throughout the codebase
 - Copied test type dependencies from be01a40157/aws/ami/windows/scripts/Installers/Install-Pip-Dependencies.ps1 (L16) into `win-test.sh`, but made some adjustments to be compatible with 3.10 runtime (scipy version update) and just make rerun-tests compatible with the rest of the deps

I think in the long run, one needs to update 4432e2cacd/aws/ami/windows/scripts/Installers/Install-Miniconda3.ps1 that currently pins Miniconda python to 3.9, but also figure out how CI can still create a new environment without having to download all the dependencies all the time
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162862
Approved by: https://github.com/wdvr, https://github.com/huydhn
2025-09-19 00:33:03 +00:00
f4eca0e3b3 Try updating ET pin in PT/PT (#159664)
Looking into resolving this: https://github.com/pytorch/pytorch/issues/159599

Test Plan: Wait for executorch CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159664
Approved by: https://github.com/malfet
2025-09-18 21:55:16 +00:00
1aeac304b8 Move prioritized text linker optimization code from setup.py to cmake (#160078)
Note. This is a replica PR of #155901 which will be closed. I had to create a new PR in order to add it into my ghstack as there are some later commits which depend on it.

### Summary

🚀 This PR moves the prioritized text linker optimization from setup.py to cmake ( and enables by default on Linux aarch64 systems )

This change consolidates what was previously manual CI logic into a single location (cmake), ensuring consistent behavior across local builds, CI pipelines, and developer environments.

### Motivation
Prioritized text layout has measurable performance benefits on Arm systems by reducing code padding and improving cache utilization. This optimization was previously triggered manually via CI scripts (.ci/aarch64_linux/aarch64_ci_build.sh) or user-set environment variables. By detecting the target architecture within setup.py, this change enables the optimization automatically where applicable, improving maintainability and usability.

Note:

Due to ninja/cmake graph generation issues we cannot apply the linker file globally to all targets to the targets must be manually defined. See CMakeLists.txt the main libraries torch_python, torch, torch_cpu, torch_cuda, torch_xpu have been targetted which should be enough to maintain the performance benefits outlined above.

Co-authored-by: Usamah Zaheer <usamah.zaheer@arm.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160078
Approved by: https://github.com/seemethere
2025-09-18 17:09:48 +00:00
8dbac62edb [CI] Update NVIDIA driver to 580.82.07 (#163111)
To make CI machines capable of running CUDA-13 tests. Unfortunately, this upgrade regresses NUMBA integration, so live patch it with 6e08c9d08e

This fix was suggested in https://github.com/pytorch/pytorch/issues/162878#issuecomment-3288635745

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163111
Approved by: https://github.com/huydhn
2025-09-17 17:37:06 +00:00
4ca3f435fb Revert "[CI] Update NVIDIA driver to 580.82.07 (#163111)"
This reverts commit 16475a829f7fe3b1dc3c74573740df09ffaec650.

Reverted https://github.com/pytorch/pytorch/pull/163111 on behalf of https://github.com/malfet due to It started to fail now, but worked just fine in PR CI ([comment](https://github.com/pytorch/pytorch/pull/163111#issuecomment-3303707671))
2025-09-17 16:20:31 +00:00
16475a829f [CI] Update NVIDIA driver to 580.82.07 (#163111)
To make CI machines capable of running CUDA-13 tests. Unfortunately, this upgrade regresses NUMBA integration, so live patch it with 6e08c9d08e

This fix was suggested in https://github.com/pytorch/pytorch/issues/162878#issuecomment-3288635745

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163111
Approved by: https://github.com/huydhn
2025-09-17 14:44:06 +00:00
89a6dbe73a Filter out local timer tests which are unimplemented in Python on AArch64 (#158342)
This stems from using a conda build of Python, which incorrectly detects this as unimplemented:
https://github.com/conda-forge/python-feedstock/issues/804

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158342
Approved by: https://github.com/malfet
2025-09-17 11:31:57 +00:00
c527292c43 [CI] Remove functorch doc build jobs (#163101)
As repo has been archived, there couldn't be any doc updates

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163101
Approved by: https://github.com/svekars, https://github.com/zou3519, https://github.com/ZainRizvi
2025-09-16 22:25:59 +00:00
c9e57d7e9f [CI] Move libtorch-cpu-shared-with-deps-release-build to python 3.10 (#162877)
Related to https://github.com/pytorch/pytorch/pull/162862

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162877
Approved by: https://github.com/malfet
2025-09-15 15:27:25 +00:00
cad052423b [triton] Update 3.5 pin to 5ae38bdb0dc066c5823e34dc9797afb9de42c866 (#162821)
Include @aakhundov's sam_fast patch, plus NVIDIA's sm88/sm110 patches (thanks @nWEIdia)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162821
Approved by: https://github.com/atalman
2025-09-12 18:34:22 +00:00
6c334885d4 [RELAND] Always build USE_DISTRIBUTED (#160449) and Make distributed modules importable even when backend not built (#159889) (#162594)
Summary:
Original: D81957844 and D81957923

Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well

#buildall

Test Plan:
sandcastle and oss ci

Rollback Plan:

Reviewed By: H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594
Approved by: https://github.com/H-Huang, https://github.com/dcci
2025-09-12 10:54:42 +00:00
6b59a19242 Revert "[RELAND] Always build USE_DISTRIBUTED (#160449) and Make distributed modules importable even when backend not built (#159889) (#162594)"
This reverts commit 6e8f17c58029e5fa6bc222b2445ebbc0cbdc17c7.

Reverted https://github.com/pytorch/pytorch/pull/162594 on behalf of https://github.com/huydhn due to Reverted internally ([comment](https://github.com/pytorch/pytorch/pull/162594#issuecomment-3283985880))
2025-09-12 06:52:03 +00:00
6e8f17c580 [RELAND] Always build USE_DISTRIBUTED (#160449) and Make distributed modules importable even when backend not built (#159889) (#162594)
Summary:
Original: D81957844 and D81957923

Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well

#buildall

Test Plan:
sandcastle and oss ci

Rollback Plan:

Reviewed By: H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594
Approved by: https://github.com/H-Huang, https://github.com/dcci
2025-09-12 03:56:18 +00:00
e8eeb06034 Move inductor jobs 3.9->3.10 (#162323)
Related to: https://github.com/pytorch/pytorch/issues/161167

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162323
Approved by: https://github.com/huydhn, https://github.com/Skylion007

Co-authored-by: Huy Do <huydhn@gmail.com>
2025-09-12 03:43:06 +00:00
3cd734584d bring back the old vllm's use_existing_torch.py (#162747)
vllm's pr will override our dependencies for torch.

quick fix to add the use_existing_torch.py. syncing with vllm now regarding the uv approach they have

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162747
Approved by: https://github.com/huydhn
2025-09-12 03:41:39 +00:00
84d8ec73f1 [CD] Build Mac wheels using setup-python action (#162136)
Biggest difference between both conda and homebrew CPython builds and one from python.org, is that later are universal binaries and they are always trying to build universal extension...

Workaround lots of universal binary build attempts by explicitly specifying both `_PYTHON_PLATFORM` and `--plat-name` as well as `ARCH_FLAGS`

Suppressed actionlint warning on use of `freethreaded` flag which is document in https://github.com/actions/setup-python/tree/v5

TODO: Remove lots of temporary workarounds when `3.14` is out in October 2025

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162136
Approved by: https://github.com/atalman, https://github.com/huydhn
ghstack dependencies: #162297, #162265
2025-09-12 00:16:31 +00:00
799471d92b [triton] Update 3.5 pin (AMD compilation fix + warp spec) (#162733)
Fixes #162390

Also adds warp spec (thanks @manman-ren!)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162733
Approved by: https://github.com/atalman
2025-09-11 18:19:16 +00:00
2f5a24c2a2 Smoke tests don't run nvshmem on Windows (#162646)
Only available for linux x86 and aarch64 :
https://pypi.org/project/nvidia-nvshmem-cu13/#files

nvshmem is available only on linux:
``
"nvidia-nvshmem-cu12==3.3.24; platform_system == 'Linux' and platform_machine == 'x86_64' | "
``
https://github.com/pytorch/pytorch/blob/main/.github/scripts/generate_binary_build_matrix.py#L57
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162646
Approved by: https://github.com/kwen2501
2025-09-11 16:09:20 +00:00
94db2ad51d Revert "Move prioritized text linker optimization code from setup.py to cmake (#160078)"
This reverts commit 26b3ae58908becbb03b28636f7384d2972a8c9a5.

Reverted https://github.com/pytorch/pytorch/pull/160078 on behalf of https://github.com/atalman due to Sorry reverting this broke linux aarch64 CUDA nightlies [pytorch/pytorch/actions/runs/17637486681/job/50146967503](https://github.com/pytorch/pytorch/actions/runs/17637486681/job/50146967503) ([comment](https://github.com/pytorch/pytorch/pull/160078#issuecomment-3281426631))
2025-09-11 15:29:29 +00:00
9f783e172d Revert "Build and Install Arm Compute Library in manylinux docker image (#159737)"
This reverts commit 582d278983b28a91ac0cedd035183f2495bb6887.

Reverted https://github.com/pytorch/pytorch/pull/159737 on behalf of https://github.com/atalman due to Sorry reverting this broke linux aarch64 CUDA nightlies [pytorch/pytorch/actions/runs/17637486681/job/50146967503](https://github.com/pytorch/pytorch/actions/runs/17637486681/job/50146967503) ([comment](https://github.com/pytorch/pytorch/pull/159737#issuecomment-3281398272))
2025-09-11 15:25:24 +00:00