pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 12:54:11 +08:00

Author	SHA1	Message	Date
Huy Do	4400c5d31e	Continue to build nightly CUDA 12.9 for internal (#163029 ) Revert part of https://github.com/pytorch/pytorch/pull/161916 to continue building CUDA 12.9 nightly Pull Request resolved: https://github.com/pytorch/pytorch/pull/163029 Approved by: https://github.com/malfet	2025-10-11 08:26:47 +00:00
Robert Hardwick	d9c80ef97d	Build and Install Arm Compute Library in manylinux docker image (#159737 ) ---- This PR is part of a series of PRs that aim to remove the `.ci/aarch64_linux` folder entirely, such that the Aarch64 manylinux build happens as part of `.ci/manywheel/build.sh`, the same as other platforms. In this PR: - We prebuild + install Arm Compute Library in the manylinux docker image ( at /acl ), instead of building it at build time for every pytorch build. Also updated jammy install path to be /acl too. - We can therefore remove build_ArmComputeLibrary functions from the ci build scripts. - There is also some refactoring of install_openblas.sh and install_acl.sh to align them together ( similar formatting, similar variable names, same place for version number update ) - We had 2 places to define openblas version, this has been reduced to 1 now ( install_openblas.sh ). - ACL_VERSION and OPENBLAS_VERSION can now be overridden at build.sh level for developers, but there is only 1 version of each hardcoded for ci. Pull Request resolved: https://github.com/pytorch/pytorch/pull/159737 Approved by: https://github.com/seemethere, https://github.com/aditew01	2025-10-01 11:33:51 +00:00
Wei Wang	3b4ad4a17d	[AARCH64][CD][CUDA13][Triton][PTXAS] Turn on BUILD_BUNDLE_PTXAS=1 (#163988 ) See also #163972, which was intended to be this PR. Triton (release/3.5.x) by default ships CUDA12.8 ptxas. This PR tries to bundle a ptxas version for cuda13, so that it can help https://github.com/pytorch/pytorch/issues/163801 when users run on new devices like THOR and Spark. Fixes https://github.com/pytorch/pytorch/issues/163801 Test Plan: Check binary size increase against nightly or v2.9RC Install the binary from into a working THOR and GB200/GH100 machine (reproduce the original issue first on THOR), then install the binary built from this PR and we expect the issue to be gone without any additional user setting. Testing on GB200 is to ensure no regression. Reference: https://github.com/pytorch/pytorch/pull/119750 and `5c814e2527` Note: with this PR, the pytorch world's torch.compile is supposed to find ptxas via "torch/_inductor/runtime/compile_tasks.py" and "_set_triton_ptxas_path". Use cases that do not go through "_set_triton_ptxas_path" may not be able to use the cuda13 ptxas binary. However, as is, the triton world does not know of the existence of this new cuda13 ptxas. So if a user thinks there is already a pytorch/bin/ptxas and deletes the ptxas from triton, then `c6ad34f7eb/python/triton/knobs.py (L216)` would still complain that ptxas is not found (if removed, it won't know this new one is available) Pull Request resolved: https://github.com/pytorch/pytorch/pull/163988 Approved by: https://github.com/atalman	2025-09-30 01:56:12 +00:00
Klaus Zimmermann	50d418f69f	Replace setup.py bdist_wheel with python -m build --wheel (#156712 ) Previously we already replaced most use of `python setup.py develop/install`. This PR also replaces the use of `setup.py bdist_wheel` with the modern `python -m build --wheel` alternative. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156712 Approved by: https://github.com/atalman ghstack dependencies: #156711	2025-09-29 21:51:32 +00:00
FFFrog	bf0747c6c6	[Code Clean] Remove deadcodes about Python3.9 [1/N] (#163626 ) As the title stated. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163626 Approved by: https://github.com/Skylion007, https://github.com/albanD	2025-09-24 07:30:50 +00:00
Nikita Shulga	ca35dc2fdd	[EZ] Fix UP041 violations (#163648 ) I.e. use `TimeoutError` instead of `socket.timeout` Pull Request resolved: https://github.com/pytorch/pytorch/pull/163648 Approved by: https://github.com/cyyever, https://github.com/Skylion007	2025-09-23 17:58:18 +00:00
Robert Hardwick	1aeac304b8	Move prioritized text linker optimization code from setup.py to cmake (#160078 ) Note. This is a replica PR of #155901 which will be closed. I had to create a new PR in order to add it into my ghstack as there are some later commits which depend on it. ### Summary 🚀 This PR moves the prioritized text linker optimization from setup.py to cmake ( and enables by default on Linux aarch64 systems ) This change consolidates what was previously manual CI logic into a single location (cmake), ensuring consistent behavior across local builds, CI pipelines, and developer environments. ### Motivation Prioritized text layout has measurable performance benefits on Arm systems by reducing code padding and improving cache utilization. This optimization was previously triggered manually via CI scripts (.ci/aarch64_linux/aarch64_ci_build.sh) or user-set environment variables. By detecting the target architecture within setup.py, this change enables the optimization automatically where applicable, improving maintainability and usability. Note: Due to ninja/cmake graph generation issues we cannot apply the linker file globally to all targets, so the targets must be manually defined. See CMakeLists.txt: the main libraries torch_python, torch, torch_cpu, torch_cuda, torch_xpu have been targeted, which should be enough to maintain the performance benefits outlined above. Co-authored-by: Usamah Zaheer <usamah.zaheer@arm.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160078 Approved by: https://github.com/seemethere	2025-09-18 17:09:48 +00:00
PyTorch MergeBot	94db2ad51d	Revert "Move prioritized text linker optimization code from setup.py to cmake (#160078 )" This reverts commit 26b3ae58908becbb03b28636f7384d2972a8c9a5. Reverted https://github.com/pytorch/pytorch/pull/160078 on behalf of https://github.com/atalman due to Sorry reverting this broke linux aarch64 CUDA nightlies [pytorch/pytorch/actions/runs/17637486681/job/50146967503](https://github.com/pytorch/pytorch/actions/runs/17637486681/job/50146967503) ([comment](https://github.com/pytorch/pytorch/pull/160078#issuecomment-3281426631))	2025-09-11 15:29:29 +00:00
PyTorch MergeBot	9f783e172d	Revert "Build and Install Arm Compute Library in manylinux docker image (#159737 )" This reverts commit 582d278983b28a91ac0cedd035183f2495bb6887. Reverted https://github.com/pytorch/pytorch/pull/159737 on behalf of https://github.com/atalman due to Sorry reverting this broke linux aarch64 CUDA nightlies [pytorch/pytorch/actions/runs/17637486681/job/50146967503](https://github.com/pytorch/pytorch/actions/runs/17637486681/job/50146967503) ([comment](https://github.com/pytorch/pytorch/pull/159737#issuecomment-3281398272))	2025-09-11 15:25:24 +00:00
Ting Lu	bb1d53bc47	[CD] CUDA 13 specific followup changes (#162455 ) Follow up for CUDA 13 bring up https://github.com/pytorch/pytorch/issues/159779 sm50-70 should not be added to sbsa build arch list, as previous archs had no support for arm. remove platform_machine from PYTORCH_EXTRA_INSTALL_REQUIREMENTS Pull Request resolved: https://github.com/pytorch/pytorch/pull/162455 Approved by: https://github.com/atalman	2025-09-11 00:03:47 +00:00
Robert Hardwick	582d278983	Build and Install Arm Compute Library in manylinux docker image (#159737 ) ---- This PR is part of a series of PRs that aim to remove the `.ci/aarch64_linux` folder entirely, such that the Aarch64 manylinux build happens as part of `.ci/manywheel/build.sh`, the same as other platforms. In this PR: - We prebuild + install Arm Compute Library in the manylinux docker image ( at /acl ), instead of building it at build time for every pytorch build. Also updated jammy install path to be /acl too. - We can therefore remove build_ArmComputeLibrary functions from the ci build scripts. - There is also some refactoring of install_openblas.sh and install_acl.sh to align them together ( similar formatting, similar variable names, same place for version number update ) - We had 2 places to define openblas version, this has been reduced to 1 now ( install_openblas.sh ). - ACL_VERSION and OPENBLAS_VERSION can now be overridden at build.sh level for developers, but there is only 1 version of each hardcoded for ci. Pull Request resolved: https://github.com/pytorch/pytorch/pull/159737 Approved by: https://github.com/seemethere ghstack dependencies: #160078	2025-09-10 15:39:38 +00:00
atalman	3d32bb114b	[CD] Aarch64 Fix packaging ``libarm_compute.so`` and other libraries to the aarch64 CUDA wheels (#162566 ) Fixes aarch64 linux packaging, following error: https://github.com/pytorch/vision/actions/runs/17612462583/job/50037380487#step:15:62 ``` Traceback (most recent call last): File "/__w/vision/vision/pytorch/vision/setup.py", line 13, in <module> import torch File "/__w/_temp/conda_environment_17612462583/lib/python3.11/site-packages/torch/__init__.py", line 415, in <module> from torch._C import * # noqa: F403 ^^^^^^^^^^^^^^^^^^^^^^ ImportError: libarm_compute.so: cannot open shared object file: No such file or directory ``` Due to missing dependencies. Current Error: File torch-2.10.0.dev20250910+cu130-cp310-cp310-linux_aarch64.whl is extracted File is repackaged as torch-2.10.0.dev20250910+cu130-cp310-cp310-manylinux_2_28_aarch64.whl File torch-2.10.0.dev20250910+cu130-cp310-cp310-linux_aarch64.whl renamed as torch-2.10.0.dev20250910+cu130-cp310-cp310-manylinux_2_28_aarch64.whl Hence the repackaging does not take any effect. This PR does following File torch-2.10.0.dev20250910+cu130-cp310-cp310-linux_aarch64.whl is extracted File torch-2.10.0.dev20250910+cu130-cp310-cp310-linux_aarch64.whl deleted File is repackaged as torch-2.10.0.dev20250910+cu130-cp310-cp310-manylinux_2_28_aarch64.whl Looks like after migrating from zipping the wheel to wheel pack renaming the wheel is no longer necessary. Hence removing renaming and deleting old file. 
``` 2025-09-10T10:10:05.9652454Z Using nvidia libs from pypi - skipping CUDA library bundling 2025-09-10T10:10:05.9656595Z Copying to /pytorch/dist/tmp/torch/lib/libgomp.so.1 2025-09-10T10:10:05.9873843Z Copying to /pytorch/dist/tmp/torch/lib/libgfortran.so.5 2025-09-10T10:10:06.0410041Z Copying to /pytorch/dist/tmp/torch/lib/libarm_compute.so 2025-09-10T10:10:06.2869242Z Copying to /pytorch/dist/tmp/torch/lib/libarm_compute_graph.so 2025-09-10T10:10:06.4385740Z Copying to /pytorch/dist/tmp/torch/lib/libnvpl_lapack_lp64_gomp.so.0 2025-09-10T10:10:06.5461372Z Copying to /pytorch/dist/tmp/torch/lib/libnvpl_blas_lp64_gomp.so.0 2025-09-10T10:10:06.5728970Z Copying to /pytorch/dist/tmp/torch/lib/libnvpl_lapack_core.so.0 2025-09-10T10:10:06.6231872Z Copying to /pytorch/dist/tmp/torch/lib/libnvpl_blas_core.so.0 2025-09-10T10:10:14.1503110Z Updated tag from Tag: cp310-cp310-linux_aarch64 2025-09-10T10:10:14.1503482Z to Tag: cp310-cp310-manylinux_2_28_aarch64 2025-09-10T10:10:14.1503682Z 2025-09-10T10:10:41.6498892Z Repacking wheel as /pytorch/dist/torch-2.10.0.dev20250910+cu130-cp310-cp310-manylinux_2_28_aarch64.whl...OK 2025-09-10T10:10:41.9394460Z Renaming torch-2.10.0.dev20250910+cu130-cp310-cp310-linux_aarch64.whl wheel to torch-2.10.0.dev20250910+cu130-cp310-cp310-manylinux_2_28_aarch64.whl ``` Test Plan, Executed on local file: ``` inflating: ubuntu/dist/tmp/torch-2.9.0.dev20250909+cu130.dist-info/WHEEL inflating: ubuntu/dist/tmp/torch-2.9.0.dev20250909+cu130.dist-info/entry_points.txt inflating: ubuntu/dist/tmp/torch-2.9.0.dev20250909+cu130.dist-info/top_level.txt inflating: ubuntu/dist/tmp/torch-2.9.0.dev20250909+cu130.dist-info/RECORD Bundling CUDA libraries with wheel Updated tag from Tag: cp310-cp310-manylinux_2_28_aarch64 to Tag: cp310-cp310-manylinux_2_28_aarch64 Repacking wheel as ubuntu/dist/torch-2.9.0.dev20250909+cu130-cp310-cp310-manylinux_2_28_aarch64.whl...OK Copying torch-2.9.0.dev20250909+cu130-cp310-cp310-manylinux_2_28_aarch64.whl to artifacts Build 
Complete. Created torch-2.9.0.dev20250909+cu130-cp310-cp310-manylinux_2_28_aarch64.whl.. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/162566 Approved by: https://github.com/jeanschmidt, https://github.com/NicolasHug	2025-09-10 14:22:41 +00:00
Robert Hardwick	26b3ae5890	Move prioritized text linker optimization code from setup.py to cmake (#160078 ) Note. This is a replica PR of #155901 which will be closed. I had to create a new PR in order to add it into my ghstack as there are some later commits which depend on it. ### Summary 🚀 This PR moves the prioritized text linker optimization from setup.py to cmake ( and enables by default on Linux aarch64 systems ) This change consolidates what was previously manual CI logic into a single location (cmake), ensuring consistent behavior across local builds, CI pipelines, and developer environments. ### Motivation Prioritized text layout has measurable performance benefits on Arm systems by reducing code padding and improving cache utilization. This optimization was previously triggered manually via CI scripts (.ci/aarch64_linux/aarch64_ci_build.sh) or user-set environment variables. By detecting the target architecture within setup.py, this change enables the optimization automatically where applicable, improving maintainability and usability. Note: Due to ninja/cmake graph generation issues we cannot apply the linker file globally to all targets, so the targets must be manually defined. See CMakeLists.txt: the main libraries torch_python, torch, torch_cpu, torch_cuda, torch_xpu have been targeted, which should be enough to maintain the performance benefits outlined above. Co-authored-by: Usamah Zaheer <usamah.zaheer@arm.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160078 Approved by: https://github.com/seemethere	2025-09-10 09:21:53 +00:00
Ting Lu	897c4e70a7	Move to small wheel approach for CUDA SBSA wheel (#160720 ) https://github.com/pytorch/pytorch/issues/160673 Use download.pytorch.org's dependencies like x86 build instead of bundling libs into the wheel Pull Request resolved: https://github.com/pytorch/pytorch/pull/160720 Approved by: https://github.com/atalman	2025-09-09 00:18:43 +00:00
atalman	c0fc86b511	Fix aarch64 wheel pack (#159481 ) PR that introduced the change: https://github.com/pytorch/builder/pull/1775 Use wheel pack instead of zip to repack the wheel. It should regenerate the RECORD file and update all the hashes correctly. TODO: Apply wheel pack instead of zip to Rest of builds Add validation test to make sure wheel contents matches RECORD file Pull Request resolved: https://github.com/pytorch/pytorch/pull/159481 Approved by: https://github.com/malfet	2025-09-08 23:36:50 +00:00
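The commit above notes that `wheel pack` regenerates the RECORD file and its hashes, and leaves a TODO for validating wheel contents against RECORD. A minimal sketch of what one RECORD row looks like, assuming the standard wheel format (`path,sha256=<urlsafe base64 digest without padding>,<size>`); the function names `record_entry` and `matches_record` are hypothetical and are not part of the PyTorch scripts.

```python
import base64
import hashlib


def record_entry(path: str, data: bytes) -> str:
    """Build one RECORD row for a file inside a wheel (illustrative helper)."""
    digest = hashlib.sha256(data).digest()
    # The wheel format uses URL-safe base64 with the trailing '=' padding removed.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


def matches_record(row: str, data: bytes) -> bool:
    """Check file contents against an existing RECORD row (illustrative helper)."""
    path = row.rsplit(",", maxsplit=2)[0]
    return row == record_entry(path, data)


if __name__ == "__main__":
    row = record_entry("torch/version.py", b"__version__ = '2.10.0'\n")
    print(row)
    print(matches_record(row, b"tampered"))
```

Repacking with plain `zip` leaves stale rows of this kind behind whenever bundled libraries change, which is the failure mode a validation test along these lines would catch.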
Ting Lu	9c991b63ff	[CD] [aarch64] Add CUDA 12.6 and 12.8 to build matrix, remove 12.9 build (#162364 ) https://github.com/pytorch/pytorch/issues/159779 Add the full CUDA support matrix to sbsa build (12.6, 12.8) Same arch support as x86 build Remove 12.9 sbsa build Pull Request resolved: https://github.com/pytorch/pytorch/pull/162364 Approved by: https://github.com/atalman	2025-09-08 20:00:25 +00:00
Ting Lu	9632f4ea9f	[CD] [aarch64] Add CUDA 13.0 sbsa nightly build (#161257 ) https://github.com/pytorch/pytorch/issues/159779 CUDA SBSA build for CUDA 13.0 1. Supported archs: sm_80 to sm_120. Including support for Thor (sm_110), SPARK (sm_121), GB300 (sm_103). "This release adds support of SM110 GPUs for arm64-sbsa on Linux." from 13.0 release notes https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html 2. Use -compress-mode=size for binary size reduction, 13.0 wheel is 2.18 GB, when compared with 12.9 3.28 GB, that is 1.1 GB of savings and ~33.5% smaller. 3. Refactored the libs_to_copy list with common libs, and version_specific_libs. TODO: add the other CUDA archs in the existing support matrix of x86 to SBSA build as well Pull Request resolved: https://github.com/pytorch/pytorch/pull/161257 Approved by: https://github.com/nWEIdia, https://github.com/atalman	2025-08-27 14:38:07 +00:00
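The size figures quoted in the commit above (3.28 GB for the 12.9 wheel, 2.18 GB for the 13.0 wheel) can be checked with a short calculation. The function name `size_savings` is illustrative only.

```python
def size_savings(old_gb: float, new_gb: float) -> tuple[float, float]:
    """Return (absolute savings in GB, relative savings in percent)."""
    saved = old_gb - new_gb
    return saved, saved / old_gb * 100


if __name__ == "__main__":
    saved, percent = size_savings(3.28, 2.18)
    # Rounded, this reproduces the 1.1 GB and ~33.5% quoted in the commit.
    print(f"{saved:.2f} GB saved, {percent:.1f}% smaller")
```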
Nikita Shulga	7bd4cfaef4	[BE] Update nvshem dependency to 3.3.20 (#160458 ) Which is manylinux2_28 compatible, even on aarch64 platform archive contents and URL pattern changed quite drastically between 3.3.9 and 3.3.20, but hopefully it still works. Package `libnvshmem_host.so.3` into gigantic aarch64+CUDA wheel Should fix https://github.com/pytorch/pytorch/issues/160425 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160458 Approved by: https://github.com/Skylion007, https://github.com/kwen2501, https://github.com/nWEIdia, https://github.com/atalman, https://github.com/tinglvv	2025-08-16 02:00:57 +00:00
PyTorch MergeBot	c015e53d37	Revert "[BE] Update nvshem dependency to 3.3.20 (#160458 )" This reverts commit e0488d9f00865fb56c931580c80e099771c6285e. Reverted https://github.com/pytorch/pytorch/pull/160458 on behalf of https://github.com/wdvr due to need to rerun workflow generation (failing workflow-checks) ([comment](https://github.com/pytorch/pytorch/pull/160458#issuecomment-3193133706))	2025-08-16 01:47:42 +00:00
Nikita Shulga	e0488d9f00	[BE] Update nvshem dependency to 3.3.20 (#160458 ) Which is manylinux2_28 compatible, even on aarch64 platform archive contents and URL pattern changed quite drastically between 3.3.9 and 3.3.20, but hopefully it still works. Package `libnvshmem_host.so.3` into gigantic aarch64+CUDA wheel Should fix https://github.com/pytorch/pytorch/issues/160425 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160458 Approved by: https://github.com/Skylion007, https://github.com/kwen2501, https://github.com/nWEIdia, https://github.com/atalman, https://github.com/tinglvv	2025-08-16 00:50:13 +00:00
Nikita Shulga	3008d985a8	[CD] Do not build pytorch with nvshem on ARM (#160465 ) As nvshmem binary from 3.3.9 is not compatible with manylinux2_28, and 3.3.20 is not available for download yet Also, package nvshmem binary into full wheel Fixes https://github.com/pytorch/pytorch/issues/160425 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160465 Approved by: https://github.com/atalman, https://github.com/huydhn	2025-08-13 04:10:43 +00:00
Aaron Gokaslan	beb4d7816d	[BE]: ruff PLC0207 - use maxsplit kwarg (#160107 ) Automatically replaces split with rsplit when relevant and only performs the split up to the first ( or last value). This allows early return of the split function and improve efficiency. Pull Request resolved: https://github.com/pytorch/pytorch/pull/160107 Approved by: https://github.com/albanD	2025-08-08 03:14:59 +00:00
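A minimal sketch of the pattern PLC0207 encourages: when only the first or last field is needed, pass `maxsplit=1` (and use `rsplit` for the last field) so the scan stops after one separator. The helper names and sample strings are illustrative, not taken from the PR.

```python
def first_field(text: str, sep: str = ".") -> str:
    """Return the part before the first separator, splitting at most once."""
    return text.split(sep, maxsplit=1)[0]


def last_field(text: str, sep: str = ".") -> str:
    """Return the part after the last separator, splitting at most once from the right."""
    return text.rsplit(sep, maxsplit=1)[-1]


if __name__ == "__main__":
    print(first_field("torch-2.9.0-cp310", sep="-"))
    print(last_field("libarm_compute.so"))
```

Both helpers return the whole string when the separator is absent, which matches the behavior of the unbounded `split(sep)[0]` and `split(sep)[-1]` forms they replace.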
PyTorch MergeBot	feaa02f9ad	Revert "[build] pin `setuptools>=77` to enable PEP 639 (#158104 )" This reverts commit a78fb63dbdf98a1db219095293de1a11005e0390. Reverted https://github.com/pytorch/pytorch/pull/158104 on behalf of https://github.com/malfet due to It still breaks inductor-perf-nightly, see https://github.com/pytorch/pytorch/actions/runs/16425364208/job/46417088208, I'm going to dismiss all previous reviews ([comment](https://github.com/pytorch/pytorch/pull/158104#issuecomment-3099706457))	2025-07-21 22:46:53 +00:00
Xuehai Pan	a78fb63dbd	[build] pin `setuptools>=77` to enable PEP 639 (#158104 ) For reference here is the link PEP 639: [peps.python.org/pep-0639](https://peps.python.org/pep-0639/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/158104 Approved by: https://github.com/rgommers, https://github.com/Skylion007, https://github.com/atalman	2025-07-21 17:46:40 +00:00
Ting Lu	297daa1d30	[aarch64] Add sm_80 to CUDA SBSA build (#157843 ) related to https://github.com/pytorch/pytorch/issues/152690 This adds sm_80 to CUDA SBSA builds (12.9), so that we will be able to support Ampere family (e.g: sm_86) and Ada family (e.g: sm_89) on CUDA SBSA builds. Pull Request resolved: https://github.com/pytorch/pytorch/pull/157843 Approved by: https://github.com/Skylion007, https://github.com/atalman	2025-07-09 11:46:34 +00:00
cyy	30d2648a4a	Install nvperf_host together with cupti (#156668 ) Because cupti depends on nvperf_host, as discussed in https://github.com/pytorch/pytorch/pull/154595 Pull Request resolved: https://github.com/pytorch/pytorch/pull/156668 Approved by: https://github.com/Skylion007	2025-06-28 04:26:36 +00:00
Ting Lu	de45c5f673	[aarch64] Add back NCCL lib to cuda arm wheel (#156888 ) We discovered that when importing latest 12.9 arm nightly wheel, it is missing the NCCL lib. With the use of USE_SYSTEM_NCCL=1, we need to copy the libnccl.so lib into our big wheel environment, so that it can be dynamically linked at runtime. https://github.com/pytorch/pytorch/pull/152835 enabled USE_SYSTEM_NCCL=1, which would use the system NCCL by default, and it would no longer use the one built from libtorch_cuda.so. With this PR, we add back the libnccl.so to be used at runtime. In this way, we also provide the flexibility to use different versions of NCCL from what came with the original pytorch build. related - https://github.com/pytorch/pytorch/issues/144768 ``` Python 3.12.3 (main, Jun 18 2025, 17:59:45) [GCC 13.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import torch Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/local/lib/python3.12/dist-packages/torch/__init__.py", line 417, in <module> from torch._C import * # noqa: F403 ^^^^^^^^^^^^^^^^^^^^^^ ImportError: libnccl.so.2: cannot open shared object file: No such file or directory ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/156888 Approved by: https://github.com/atalman	2025-06-26 10:24:18 +00:00
PyTorch MergeBot	b1d62febd0	Revert "Use official CUDAToolkit module in CMake (#154595 )" This reverts commit 08dae945ae380d80efbaf140a95abfc5d96e5100. Reverted https://github.com/pytorch/pytorch/pull/154595 on behalf of https://github.com/malfet due to It breaks on some local setup with no clear diagnostic, but looks like it fails to find cuFile ([comment](https://github.com/pytorch/pytorch/pull/154595#issuecomment-2997959344))	2025-06-23 21:15:31 +00:00
cyy	08dae945ae	Use official CUDAToolkit module in CMake (#154595 ) Use CUDA language in CMake and remove forked FindCUDAToolkit.cmake. Some CUDA targets are also renamed with `torch::` prefix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154595 Approved by: https://github.com/albanD	2025-06-22 05:44:29 +00:00
Ting Lu	344731fb25	Add CUDA 12.9.1 sbsa nightly binaries (#155819 ) https://github.com/pytorch/pytorch/issues/155196 Pull Request resolved: https://github.com/pytorch/pytorch/pull/155819 Approved by: https://github.com/atalman	2025-06-13 18:52:41 +00:00
atalman	22641f42b6	[Binary-builds]Use System NCCL by default in CI/CD. (#152835 ) Use system NCCL by default. The correct NCCL version is already built into the Manylinux docker image. Will follow up with a PR on detecting whether the user has NCCL installed and enabling USE_SYSTEM_NCCL by default in this case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/152835 Approved by: https://github.com/malfet	2025-05-30 18:51:48 +00:00
Jonathan Deakin	2225231a14	Enable AArch64 CI scripts to be used for local dev (#143190 ) - Allow user to specify custom ComputeLibrary directory, which is then built rather than checking out a clean copy - Remove `setup.py clean` in build. The CI environment should be clean already, removing this enables incremental rebuilds - Use all cores for building ComputeLibrary Mostly a port of https://github.com/pytorch/builder/pull/2028 with the conda part removed, because aarch64_ci_setup.sh has changed and can now handle being called twice. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143190 Approved by: https://github.com/aditew01, https://github.com/fadara01, https://github.com/malfet Co-authored-by: David Svantesson-Yeung <David.Svantesson-Yeung@arm.com>	2025-05-23 12:09:59 +00:00
atalman	4277907d02	[binary builds] Linux aarch64 CUDA builds. Make sure tag is set correctly (#154045 ) 1. This should set the Manylinux 2.28 tag correctly for CUDA Aarch builds. I believe we used to have something similar in the old script: https://github.com/pytorch/pytorch/blob/main/.ci/aarch64_linux/build_aarch64_wheel.py#L811 ``Tag: cp311-cp311-linux_aarch64`` -> ``Tag: cp311-cp311-manylinux_2_28_aarch64`` 2. Remove section for CUDA 12.6, since we are no longer building CUDA 12.6 aarch64 builds Pull Request resolved: https://github.com/pytorch/pytorch/pull/154045 Approved by: https://github.com/Camyll, https://github.com/malfet	2025-05-22 18:36:13 +00:00
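The tag rewrite described above can be sketched as plain string handling. This is an illustration under stated assumptions, not the code in the PyTorch CI scripts: the function names are hypothetical, and the sketch assumes every `Tag:` line in the `WHEEL` metadata has the usual `python-abi-platform` shape.

```python
OLD_PLATFORM = "linux_aarch64"
NEW_PLATFORM = "manylinux_2_28_aarch64"


def retag_wheel_metadata(wheel_text: str) -> str:
    """Rewrite the platform part of each 'Tag:' line in WHEEL metadata."""
    out = []
    for line in wheel_text.splitlines():
        if line.startswith("Tag:"):
            head, platform = line.rsplit("-", maxsplit=1)
            # Compare the whole platform field so an already-retagged
            # manylinux tag is left untouched.
            if platform.strip() == OLD_PLATFORM:
                line = f"{head}-{NEW_PLATFORM}"
        out.append(line)
    return "\n".join(out) + "\n"


def retag_wheel_name(filename: str) -> str:
    """Apply the same platform swap to a wheel filename."""
    stem, ext = filename.rsplit(".", maxsplit=1)
    prefix, platform = stem.rsplit("-", maxsplit=1)
    if platform != OLD_PLATFORM:
        return filename
    return f"{prefix}-{NEW_PLATFORM}.{ext}"


if __name__ == "__main__":
    print(retag_wheel_metadata("Wheel-Version: 1.0\nTag: cp311-cp311-linux_aarch64\n"))
    print(retag_wheel_name("torch-2.8.0-cp311-cp311-linux_aarch64.whl"))
```

Matching on the full platform field rather than a substring keeps the operation idempotent, so running it on a wheel that already carries the manylinux tag changes nothing.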
atalman	836955bdbd	[Manylinux 2.28] Correct Linux aarch64 cuda binaries wheel name (#150786 ) Related to: https://github.com/pytorch/pytorch/issues/149044#issuecomment-2784044555 For CPU binaries we run auditwheel; however, for CUDA binaries auditwheel produces invalid results. Hence we need to rename the file. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150786 Approved by: https://github.com/malfet	2025-04-08 02:58:28 +00:00
cyy	9367f8f6f1	Remove outdated instructions from CI scripts (#149795 ) Some instructions about Python 3.8 and CUDA 11.3 are removed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149795 Approved by: https://github.com/malfet	2025-03-22 18:37:07 +00:00
cyy	29c4f2c07a	Remove Ubuntu 18.04 scripts (#149479 ) Ubuntu 18.04 reached end of life on May 31, 2023. This code is no longer used. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149479 Approved by: https://github.com/malfet	2025-03-20 00:13:40 +00:00
Andrey Talman	4df66e0b7f	Pin auditwheel to 6.2.0 (#149471 ) Observing aarch64 failure in nightly: https://github.com/pytorch/pytorch/actions/runs/13917778961/job/38943911228 Similar to: https://github.com/pytorch/vision/pull/8982 ``` 2025-03-18T08:44:58.4128744Z Repairing Wheel with AuditWheel 2025-03-18T08:44:58.5440988Z INFO:auditwheel.main_repair:Repairing torch-2.8.0.dev20250318+cpu-cp39-cp39-linux_aarch64.whl 2025-03-18T08:45:20.3393288Z Traceback (most recent call last): 2025-03-18T08:45:20.3393732Z File "/opt/python/cp39-cp39/bin/auditwheel", line 8, in <module> 2025-03-18T08:45:20.3394115Z sys.exit(main()) 2025-03-18T08:45:20.3394559Z File "/opt/_internal/cpython-3.9.21/lib/python3.9/site-packages/auditwheel/main.py", line 53, in main 2025-03-18T08:45:20.3395064Z result: int \| None = args.func(args, p) 2025-03-18T08:45:20.3395626Z File "/opt/_internal/cpython-3.9.21/lib/python3.9/site-packages/auditwheel/main_repair.py", line 203, in execute 2025-03-18T08:45:20.3396163Z out_wheel = repair_wheel( 2025-03-18T08:45:20.3396657Z File "/opt/_internal/cpython-3.9.21/lib/python3.9/site-packages/auditwheel/repair.py", line 84, in repair_wheel 2025-03-18T08:45:20.3397184Z raise ValueError(msg) 2025-03-18T08:45:20.3397620Z ValueError: Cannot repair wheel, because required library "libarm_compute.so" could not be located 2025-03-18T08:45:20.3678843Z Traceback (most recent call last): 2025-03-18T08:45:20.3679267Z File "/pytorch/.ci/aarch64_linux/aarch64_wheel_ci_build.py", line 236, in <module> 2025-03-18T08:45:20.3680988Z pytorch_wheel_name = complete_wheel("/pytorch/") 2025-03-18T08:45:20.3681449Z File "/pytorch/.ci/aarch64_linux/aarch64_wheel_ci_build.py", line 141, in complete_wheel 2025-03-18T08:45:20.3681976Z check_call(["auditwheel", "repair", f"dist/{wheel_name}"], cwd=folder) 2025-03-18T08:45:20.3682860Z File "/opt/python/cp39-cp39/lib/python3.9/subprocess.py", line 373, in check_call 2025-03-18T08:45:20.3683308Z raise CalledProcessError(retcode, cmd) 
2025-03-18T08:45:20.3684034Z subprocess.CalledProcessError: Command '['auditwheel', 'repair', 'dist/torch-2.8.0.dev20250318+cpu-cp39-cp39-linux_aarch64.whl']' returned non-zero exit status 1. 2025-03-18T08:45:20.3790063Z ##[error]Process completed with exit code 1. 2025-03-18T08:45:20.3862012Z ##[group]Run pytorch/test-infra/.github/actions/teardown-linux@main 2025-03-18T08:45:20.3862448Z with: ``` Please note aarch64 CUDA failures are related to: https://github.com/pytorch/pytorch/pull/149351 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149471 Approved by: https://github.com/malfet	2025-03-19 15:55:05 +00:00
Ting Lu	5a5ac98918	[aarch64] add libcufile for cu126 and cu128 (#148465 ) seeing ` File "/usr/local/lib/python3.12/site-packages/torch/__init__.py", line 411, in <module> from torch._C import * # noqa: F403 ^^^^^^^^^^^^^^^^^^^^^^ ImportError: libcufile.so.0: cannot open shared object file: No such file or directory` with arm cu128 nightly. related to https://github.com/pytorch/pytorch/pull/148137 need to copy the dependency for arm build as well Pull Request resolved: https://github.com/pytorch/pytorch/pull/148465 Approved by: https://github.com/atalman, https://github.com/abhilash1910	2025-03-06 21:39:43 +00:00
Xuehai Pan	754fb834db	[BE][CI] bump `ruff` to 0.9.0: string quote styles (#144569 ) Reference: https://docs.astral.sh/ruff/formatter/#f-string-formatting - Change the outer quotes to double quotes for nested f-strings ```diff - f'{", ".join(args)}' + f"{', '.join(args)}" ``` - Change the inner quotes to double quotes for triple f-strings ```diff string = """ - {', '.join(args)} + {", ".join(args)} """ ``` - Join implicitly concatenated strings ```diff - string = "short string " "short string " f"{var}" + string = f"short string short string {var}" ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/144569 Approved by: https://github.com/Skylion007 ghstack dependencies: #146509	2025-02-24 19:56:09 +00:00
Fadi Arafeh	5a3a50c791	Update Arm Compute Library (ACL) to v25.02 (#147454 ) Among many things, this version of ACL fixes the redundant declaration warning that we're blocked on in (#145942, #146620, #147337) and introduces better scheduling heuristics for GEMMs Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/147454 Approved by: https://github.com/malfet	2025-02-19 18:51:08 +00:00
Ting Lu	9e45bc82e9	[aarch64] CUDA 12.8 aarch64 builds to nightly binaries (#146378 ) https://github.com/pytorch/pytorch/issues/145570 Adding Cuda 12.8 and keeping 12.6 for the sbsa build, supported CUDA_ARCH: 9.0, 10.0, 12.0 Refactor the binaries matrix for cuda sbsa build. Previously cuda-aarch64 was hardcoded to cuda 12.6. Now reads 12.6 and 12.8, new build naming example [manywheel-py3_9-cuda-aarch64-12_8-build](https://github.com/pytorch/pytorch/actions/runs/13132625006/job/36640885079?pr=146378#logs) TODO: once 12.8 is stable, remove 12.6 in sbsa Pull Request resolved: https://github.com/pytorch/pytorch/pull/146378 Approved by: https://github.com/atalman	2025-02-05 02:55:21 +00:00
Aaron Orenstein	b5655d9821	PEP585 update - .ci android aten (#145177 ) See #145101 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145177 Approved by: https://github.com/Skylion007	2025-01-21 06:31:26 +00:00
Nikita Shulga	6053242890	[CD] Enable python3.13t builds for aarch64 (#144698 ) But make sure that right numpy version is picked (2.0.2 does not support 3.13) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144698 Approved by: https://github.com/atalman ghstack dependencies: #144696, #144697, #144716	2025-01-14 02:29:01 +00:00
Nikita Shulga	eaa8a97b39	[RelEng] Add `--ami` option to build_aarch64 (#144685 ) Which should be mutually-exclusive with OS For example, one can use the following to alloc one-off instance ``` ./build_aarch64_wheel.py --alloc-instance --instance-type g5.4xlarge --key-name nshulga-key --ami ami-0f51103893c02957c --ebs-size 200 ``` TODO: - Figure out EBS volume name depending on the AMI (for `ami-05576a079321f21f8`(al2023) it's `/dev/xvda`, but for `ami-0f51103893c02957c`(deep learning container) it's `/dev/sda1` Pull Request resolved: https://github.com/pytorch/pytorch/pull/144685 Approved by: https://github.com/atalman	2025-01-14 01:48:27 +00:00
Nikita Shulga	58302c4eaa	[BE] [CD] Remove pygit2 dep for aarch64_wheel build (#144716 ) As it's incompatible with 3.13t and only used to fetch the branch name, which could be done by running ``` git rev-parse --abbrev-ref HEAD ``` Also, remove yet another reference to long gone `master` branch. Test plan: Download `manywheel-py3_11-cpu-aarch64.zip` produced by this PR, install it inside docker container and check its version ``` # pip install torch-2.7.0.dev20250113+cpu-cp311-cp311-manylinux_2_28_aarch64.whl ... Installing collected packages: mpmath, typing-extensions, sympy, networkx, MarkupSafe, fsspec, filelock, jinja2, torch Successfully installed MarkupSafe-3.0.2 filelock-3.16.1 fsspec-2024.12.0 jinja2-3.1.5 mpmath-1.3.0 networkx-3.4.2 sympy-1.13.1 torch-2.7.0.dev20250113+cpu typing-extensions-4.12.2 root@434f2540345e:/# python Python 3.11.9 (main, Aug 1 2024, 23:33:10) [GCC 12.2.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import torch >>> torch.__version__ '2.7.0.dev20250113+cpu' ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/144716 Approved by: https://github.com/atalman ghstack dependencies: #144696, #144697	2025-01-14 00:43:46 +00:00
Ting Lu	b7bef1ca84	[aarch64] fix TORCH_CUDA_ARCH_LIST for cuda arm build (#144436 ) Fixes #144037 Root cause is CUDA ARM build did not call `.ci/manywheel/build_cuda.sh`, but calls `.ci/aarch64_linux/aarch64_ci_build.sh `instead. Therefore, https://github.com/pytorch/pytorch/blob/main/.ci/manywheel/build_cuda.sh#L56 was not called for CUDA ARM build. Adding the equivalent of the code to `.ci/aarch64_linux/aarch64_ci_build.sh` as a WAR. In the future, we should target to integrate the files in .ci/aarch64_linux/aarch64_ci_build.sh back to .ci/manywheel/build_cuda.sh. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144436 Approved by: https://github.com/atalman	2025-01-11 09:00:46 +00:00
atalman	8d35333498	[CD] Aarch64 builds should not override `OVERRIDE_PACKAGE_VERSION` envvar (#144285 ) Currently our nightly aarch64 binaries have correct suffixes +cpu or +cu126. But release binaries are missing these suffixes. Hence, to correct this and make sure our nightly and release binaries are consistent, I propose this change. I see that override is already set correctly in release workflow: https://github.com/pytorch/pytorch/actions/runs/12383179841/job/34565381200 For CPU: ``` OVERRIDE_PACKAGE_VERSION="2.6.0+cpu" ``` For CUDA: ``` OVERRIDE_PACKAGE_VERSION="2.6.0+cu126" ``` The removed code will set: OVERRIDE_PACKAGE_VERSION="2.6.0" for both cuda and cpu builds for release binaries. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144285 Approved by: https://github.com/malfet, https://github.com/tinglvv	2025-01-07 12:50:54 +00:00
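The behavior this commit aims for, honoring an `OVERRIDE_PACKAGE_VERSION` that the workflow already set instead of overwriting it, can be sketched as follows. The function name `resolve_package_version` and the fallback of `base+suffix` are assumptions made for illustration; the actual build scripts may compute the default differently.

```python
import os
from collections.abc import Mapping


def resolve_package_version(base: str, suffix: str, env: Mapping[str, str]) -> str:
    """Prefer a version supplied by the workflow over a locally computed one."""
    override = env.get("OVERRIDE_PACKAGE_VERSION", "")
    if override:
        # The workflow already chose the full version, e.g. "2.6.0+cu126".
        return override
    # Hypothetical fallback: base version plus the platform suffix.
    return f"{base}+{suffix}"


if __name__ == "__main__":
    print(resolve_package_version("2.6.0", "cpu", os.environ))
```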
Xuehai Pan	b77406a9ec	[BE][CI] bump `ruff` to 0.8.4 (#143753 ) Changes: 1. Bump `ruff` from 0.7.4 to 0.8.4 2. Change `%`-formatted strings to f-string 3. Change arguments with the `__`-prefix to positional-only arguments with the `/` separator in function signature. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143753 Approved by: https://github.com/Skylion007	2024-12-24 12:24:10 +00:00
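Changes 2 and 3 in the commit above can be illustrated with a small before/after sketch. The function names are invented for this example and do not appear in the PyTorch sources.

```python
def summary_percent(name: str, count: int) -> str:
    """Old style: %-formatting."""
    return "%s built %d wheels" % (name, count)


def summary_fstring(name: str, count: int) -> str:
    """New style: the equivalent f-string."""
    return f"{name} built {count} wheels"


def clamp_old(__value: int, __low: int, __high: int) -> int:
    """Old convention: double-underscore prefixes hint at positional-only use."""
    return max(__low, min(__value, __high))


def clamp(value: int, low: int, high: int, /) -> int:
    """New convention: '/' makes the parameters positional-only in the language itself."""
    return max(low, min(value, high))


if __name__ == "__main__":
    print(summary_fstring("ci", 3))
    print(clamp(5, 0, 3))
```

With the `/` separator, calling `clamp(value=5, low=0, high=3)` raises `TypeError`, whereas the double-underscore convention is only enforced by type checkers.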
Ting Lu	f26b75b7ac	[aarch64] add CUDA 12.6 sbsa nightly binary (#142335 ) related to #138440 Pull Request resolved: https://github.com/pytorch/pytorch/pull/142335 Approved by: https://github.com/atalman	2024-12-10 06:19:28 +00:00
Nikita Shulga	f3cbf67686	[CD] Build aarch64 wheels without conda (#140093 ) As manylinuxaarch64-builder already comes pre-built with all versions of python runtime Refactor logic for setting path to DESIRED_PYTHON from `manywheel/build_common` into `set_desired_python.sh` and call it from aarch64_ci_setup.sh In followup PRs move scons and ninja installation into base docker image Pull Request resolved: https://github.com/pytorch/pytorch/pull/140093 Approved by: https://github.com/atalman	2024-11-08 22:24:28 +00:00

52 Commits