33 Commits

Author SHA1 Message Date
132ae8e6dd Don't link with libnvToolsExt when building for 12.9 (#165465)
This restores the logic removed in https://github.com/pytorch/pytorch/pull/161916/files#diff-bf46b4a09ca67e50622bf84fefc0d11b584ffcc24ee6cc5019cf0fc7565d81a8L170. Without it, building libtorch on CUDA 12.9 fails (https://github.com/pytorch/pytorch/actions/runs/18458531395/job/52610761895):

```
cp: cannot stat '/usr/local/cuda/lib64/libnvToolsExt.so.1': No such file or directory
```
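
Per its title, the commit conditions on the CUDA version. As a more general illustration of avoiding this kind of `cp` failure, here is a minimal sketch that lists a library only when the file exists. The helper name `dep_if_present` is hypothetical and is not taken from the actual build script; the path is the one from the error message above.

```shell
# Hypothetical helper: print the path only if the library file exists,
# so a later `cp` is never handed a missing file.
dep_if_present() {
    if [ -f "$1" ]; then
        printf '%s\n' "$1"
    fi
}

# Path copied from the error message; whether it exists depends on the toolkit image.
nvtx_lib=$(dep_if_present /usr/local/cuda/lib64/libnvToolsExt.so.1)
if [ -n "$nvtx_lib" ]; then
    echo "bundling $nvtx_lib"
else
    echo "libnvToolsExt.so.1 not found, skipping"
fi
```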

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165465
Approved by: https://github.com/atalman, https://github.com/malfet
2025-10-15 01:45:37 +00:00
bffc7dd1f3 [CD] Add cuda 13.0 libtorch builds, remove CUDA 12.9 builds (#161916)
Related to https://github.com/pytorch/pytorch/issues/159779

Adds CUDA 13.0 libtorch builds as a follow-up to https://github.com/pytorch/pytorch/pull/160956
Removes CUDA 12.9 builds; see https://github.com/pytorch/pytorch/issues/159980

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161916
Approved by: https://github.com/jeanschmidt, https://github.com/Skylion007

Co-authored-by: Ting Lu <tingl@nvidia.com>
2025-09-05 07:47:54 +00:00
1de4540449 Use -compress-mode=size for CUDA 13 build for binary size reduction (#161316)
https://github.com/pytorch/pytorch/issues/159779

CUDA 13 added support for the nvcc --compress-mode flag across all drivers of CUDA 13.X toolkits, making it possible to use --compress-mode=size for a significant size reduction (~71% less for the CUDA Math APIs, for example). https://developer.nvidia.com/blog/whats-new-and-important-in-cuda-toolkit-13-0/

Why this is added for CUDA 13 only, quoting @ptrblck: Any usage of --compress-mode=size/balance will drop the support of older CUDA drivers and will bump the min. driver requirement to CUDA 12.4. https://github.com/pytorch/pytorch/pull/157791#issuecomment-3058027353

The default for CUDA 13 is --compress-mode=balance, which gives smaller binaries than the LZ4 speed mode used in previous CUDA versions.
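
The version gating described above can be sketched as a small shell helper. This is only an illustration under assumptions: the function name `compress_mode_flag` is hypothetical, and how the flag is actually handed to nvcc in the build script is not shown here.

```shell
# Hypothetical helper: pick the nvcc compression flag from the CUDA version.
# Only CUDA 13.x gets -compress-mode=size; older toolkits get no extra flag,
# which keeps the minimum driver requirement unchanged for them.
compress_mode_flag() {
    case "$1" in
        13.*) printf '%s\n' "-compress-mode=size" ;;
        *)    : ;;  # print nothing for other versions
    esac
}

echo "CUDA 13.0 flag: '$(compress_mode_flag 13.0)'"
echo "CUDA 12.8 flag: '$(compress_mode_flag 12.8)'"
```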

Related - https://github.com/pytorch/pytorch/pull/157791

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161316
Approved by: https://github.com/nWEIdia, https://github.com/Skylion007
2025-08-24 03:28:29 +00:00
49ff884b1e Add CUDA 13.0 x86 builds (#160956)
https://github.com/pytorch/pytorch/issues/159779

CUDA 13.0.0
NVSHMEM 3.3.20
CUDNN 9.12.0.46

Adding x86 linux builds for CUDA 13.
Adding libtorch docker.
Package naming changed for CUDA 13 (removed postfix -cu13 for some packages).

Preparation checklist:
1. Update index https://download.pytorch.org/whl/nightly/cu130 with pypi packages
2. Update packaging name based on https://pypi.org/project/cuda-toolkit/ metadata

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160956
Approved by: https://github.com/atalman

Co-authored-by: atalman <atalman@fb.com>
2025-08-22 11:31:09 +00:00
0d28d12b11 Fix typo packing libnvshmem into libtorch (#160778)
Fix typo after https://github.com/pytorch/pytorch/pull/160465
Fixes: https://github.com/pytorch/pytorch/issues/160762

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160778
Approved by: https://github.com/Camyll, https://github.com/malfet, https://github.com/ZainRizvi, https://github.com/Skylion007
2025-08-15 23:43:02 +00:00
3008d985a8 [CD] Do not build pytorch with nvshem on ARM (#160465)
The nvshmem binary from 3.3.9 is not compatible with manylinux2_28, and 3.3.20 is not yet available for download.
Also packages the nvshmem binary into the full wheel.

Fixes https://github.com/pytorch/pytorch/issues/160425
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160465
Approved by: https://github.com/atalman, https://github.com/huydhn
2025-08-13 04:10:43 +00:00
493bd625e2 Revert "[BE]: Reduce binary size 40% using aggressive fatbin compression. (#157791)"
This reverts commit 9bdf87e8918b9a3f78d7bcb8a770c19f7c82ac15.

Reverted https://github.com/pytorch/pytorch/pull/157791 on behalf of https://github.com/albanD due to Reverting to avoid regressing on the driver supported ([comment](https://github.com/pytorch/pytorch/pull/157791#issuecomment-3058091176))
2025-07-10 16:14:06 +00:00
9bdf87e891 [BE]: Reduce binary size 40% using aggressive fatbin compression. (#157791)
Since CUDA 12.4, NVCC has had a [compress-mode flag](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#compress-mode-default-size-speed-balance-none-compress-mode) that controls how the fatbinary is compressed. It defaults to speed (a low compression level that loads the file quickly). Since we are running into PyPI size issues, this will allow us to upload smaller wheel files.

From: https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#compress-mode-default-size-speed-balance-none-compress-mode
```
size
Uses a compression mode more focused on reduced binary size, at the cost of compression and decompression time.
```

Up to 37.2% reduction in binary size with virtually no drawback (except potentially a little slower loading of the .so at PyTorch startup).

694 MB for CUDA 12.9 builds with 6.0;7.0;7.5;8.0;8.6;9.0;10.0;12.0+PTX
vs
1.08GB for CUDA 12.9 builds with 7.5;8.0;8.6;9.0;10.0;12.0+PTX

CUDA 12.9 ***694MB*** vs ***1.08GB***

CUDA 12.8 ***604MB*** vs ***845MB***
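
The percentages can be reproduced from the sizes quoted above. The sketch below assumes 1.08GB means 1.08 x 1024 MB, which is what makes the CUDA 12.9 case come out at 37.2%. The helper name `reduction_pct` is hypothetical. Note that the two CUDA 12.9 figures use different architecture lists, so the comparison is not strictly like for like.

```shell
# Percentage reduction from `before` to `after` (same unit), one decimal place.
reduction_pct() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (1 - after / before) * 100 }'
}

# CUDA 12.9: 1.08 GB taken as 1.08 * 1024 = 1105.92 MB (assumption about the unit).
reduction_pct 1105.92 694   # prints 37.2
# CUDA 12.8: both sizes are already in MB.
reduction_pct 845 604       # prints 28.5
```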

This ends up saving PyPI approximately 19.6 PiB of bandwidth per month for the CUDA 12.9 case.

This will also allow us to add back 12.0+PTX for CUDA 12.8, which makes the package forward compatible with newer GPUs and removes the need for PRs https://github.com/pytorch/pytorch/pull/157516 and https://github.com/pytorch/pytorch/pull/157634


More details can be found in NVIDIA's technical blog for CUDA 12.4: https://developer.nvidia.com/blog/runtime-fatbin-creation-using-the-nvidia-cuda-toolkit-12-4-compiler/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157791
Approved by: https://github.com/malfet, https://github.com/atalman
2025-07-10 15:51:04 +00:00
179dcc10e4 Add sm_70 arch for linux cuda 12.8 and 12.9 builds (#157558)
Please see: https://github.com/pytorch/pytorch/issues/157517
We would like to keep Volta architectures by default for release 2.8

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157558
Approved by: https://github.com/Skylion007, https://github.com/Camyll, https://github.com/seemethere, https://github.com/malfet
2025-07-08 23:02:10 +00:00
8408522976 Remove +PTX from CUDA 12.8 builds (#157516)
Remove +PTX from CUDA 12.8 builds and do a small refactor of build_cuda.sh.
Removing +PTX reduces the binary size, which is required to be able to upload binaries to PyPI.
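
As a rough illustration of what dropping +PTX means for an architecture list, here is a minimal sketch. The helper name `strip_ptx` is hypothetical, and the sample list is borrowed from another entry in this log; it is not necessarily the list used for CUDA 12.8.

```shell
# Hypothetical helper: remove every "+PTX" suffix from a
# semicolon-separated architecture list.
strip_ptx() {
    printf '%s\n' "$1" | sed 's/+PTX//g'
}

strip_ptx "7.5;8.0;8.6;9.0;10.0;12.0+PTX"   # prints 7.5;8.0;8.6;9.0;10.0;12.0
```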

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157516
Approved by: https://github.com/malfet, https://github.com/ptrblck, https://github.com/tinglvv
2025-07-03 13:19:19 +00:00
cyy
30d2648a4a Install nvperf_host together with cupti (#156668)
Because cupti depends on nvperf_host, as discussed in https://github.com/pytorch/pytorch/pull/154595

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156668
Approved by: https://github.com/Skylion007
2025-06-28 04:26:36 +00:00
19f851ce10 Revert "Simplify nvtx3 CMake handling, always use nvtx3 (#153784)"
This reverts commit 099d0d6121125062ebc05771c8330cb7cd8d053a.

Reverted https://github.com/pytorch/pytorch/pull/153784 on behalf of https://github.com/Camyll due to breaking internal tests and cuda 12.4 builds still used in CI ([comment](https://github.com/pytorch/pytorch/pull/153784#issuecomment-3001702310))
2025-06-24 20:02:07 +00:00
b1d62febd0 Revert "Use official CUDAToolkit module in CMake (#154595)"
This reverts commit 08dae945ae380d80efbaf140a95abfc5d96e5100.

Reverted https://github.com/pytorch/pytorch/pull/154595 on behalf of https://github.com/malfet due to It breaks on some local setup with no clear diagnostic, but looks like it fails to find cuFile ([comment](https://github.com/pytorch/pytorch/pull/154595#issuecomment-2997959344))
2025-06-23 21:15:31 +00:00
cyy
099d0d6121 Simplify nvtx3 CMake handling, always use nvtx3 (#153784)
Fall back to third-party NVTX3 if system NVTX3 doesn't exist. We also reuse the `CUDA::nvtx3` target for better interoperability.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153784
Approved by: https://github.com/ezyang
2025-06-23 06:12:46 +00:00
cyy
08dae945ae Use official CUDAToolkit module in CMake (#154595)
Use CUDA language in CMake and remove forked FindCUDAToolkit.cmake.
Some CUDA targets are also renamed with `torch::` prefix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154595
Approved by: https://github.com/albanD
2025-06-22 05:44:29 +00:00
0504480f37 Add CUDA 12.9 libtorch nightly (#155895)
https://github.com/pytorch/pytorch/issues/155196

With the libtorch docker image added, we can add the build script.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155895
Approved by: https://github.com/atalman
2025-06-19 13:15:42 +00:00
4c3da611c2 Add CUDA 12.9.1 x86 nightly binaries (#154980)
Adds CUDA 12.9.1 to the nightly binaries matrix for linux (x86) builds.
Adds sbsa and libtorch build docker images; adding the builds themselves will come in follow-up PRs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154980
Approved by: https://github.com/eqy, https://github.com/atalman
2025-06-11 13:43:17 +00:00
7a03b0d2ca [BE] Remove CUDA 11 artifacts. Fix Check Binary workflow (#155555)
Please see: https://github.com/pytorch/pytorch/issues/147383

1. Remove CUDA 11 build and test artifacts. One place CUDA 12.4
2. Fix the Check Binary workflow to use the stable CUDA version variable rather than a hardcoded one

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155555
Approved by: https://github.com/malfet, https://github.com/Skylion007
2025-06-10 21:32:08 +00:00
3863bbb55b [BE]: Update cusparselt to 0.7.1 (#155232)
Needed to support sparse operations on Blackwell, and implements new features for the library. Also optimizes library sizes vs 0.7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155232
Approved by: https://github.com/nWEIdia, https://github.com/malfet
2025-06-09 18:01:23 +00:00
34c6371d24 Add NVSHMEM to PYTORCH_EXTRA_INSTALL_REQUIREMENTS (#154568)
NVSHMEM 3.2.5 (released Mar 2025) has both cu11 and cu12 builds.
See:
https://pypi.nvidia.com/nvidia-nvshmem-cu12/
https://pypi.nvidia.com/nvidia-nvshmem-cu11/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154568
Approved by: https://github.com/atalman
ghstack dependencies: #154538
2025-06-04 17:43:24 +00:00
22641f42b6 [Binary-builds]Use System NCCL by default in CI/CD. (#152835)
Use system NCCL by default. The correct NCCL version is already built into the Manylinux docker image.

Will follow up with a PR that detects whether the user has NCCL installed and enables USE_SYSTEM_NCCL by default in that case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152835
Approved by: https://github.com/malfet
2025-05-30 18:51:48 +00:00
6006352ed3 [BE] Refactor manywheel build scripts (#154372)
1. Remove `CentOS Linux` cases, since it is deprecated
2. Remove logic for old CUDA versions
3. Remove logic for `CUDA_VERSION=12.4` since we deprecated CUDA 12.4 support
4. Simplify setting `USE_CUFILE=1` - only supported on CUDA 12.6 and 12.8 builds
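
Item 4 can be pictured as a single version switch. This is a sketch under assumptions: the helper name `use_cufile_for` is hypothetical, and the real script may set the variable differently.

```shell
# Hypothetical helper: return 1 only for the CUDA versions where the
# list above says cuFile is supported (12.6 and 12.8), else 0.
use_cufile_for() {
    case "$1" in
        12.6*|12.8*) echo 1 ;;
        *)           echo 0 ;;
    esac
}

USE_CUFILE=$(use_cufile_for "12.8")
echo "USE_CUFILE=$USE_CUFILE"
```
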
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154372
Approved by: https://github.com/malfet, https://github.com/huydhn
2025-05-26 23:17:23 +00:00
9dd46a9233 Deprecate sm70 for cuda 12.8 binary (#147607)
Follow-up to https://github.com/pytorch/pytorch/pull/146265/files, dropping sm_70 as well, since "Architecture support for Maxwell, Pascal, and Volta is considered feature-complete and will be frozen in an upcoming release."

https://github.com/pytorch/pytorch/issues/145570

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147607
Approved by: https://github.com/atalman
2025-03-05 18:54:17 +00:00
784902983e Remove +PTX from cuda 12.6 builds (#148000)
Similar to: https://github.com/pytorch/pytorch/pull/141142

Ahead of the 2.7 release, I see the following validation failure: https://github.com/pytorch/test-infra/actions/runs/13552433445/job/37879041739?pr=6339
```
RuntimeError: Binary size of torch-2.7.0.dev20250226+cu126-cp310-cp310-manylinux_2_28_x86_64.whl 1076.45 MB exceeds the threshold 750 MB
```
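
A size gate of the kind that produced this error can be sketched as follows. This is not the actual validation code: the helper name `check_size_mb` is hypothetical, and the sketch assumes 1 MB = 1024 * 1024 bytes, which the real check may not.

```shell
# Hypothetical check: return non-zero when a file is larger than a
# threshold given in MB (1 MB taken as 1024 * 1024 bytes here).
check_size_mb() {
    file=$1
    threshold_mb=$2
    bytes=$(wc -c < "$file")
    awk -v b="$bytes" -v t="$threshold_mb" -v f="$file" 'BEGIN {
        mb = b / (1024 * 1024)
        if (mb > t) {
            printf "Binary size of %s %.2f MB exceeds the threshold %s MB\n", f, mb, t
            exit 1
        }
    }'
}

# Example call (file name is a placeholder):
# check_size_mb some_wheel.whl 750
```
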
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148000
Approved by: https://github.com/clee2000, https://github.com/ngimel, https://github.com/tinglvv
2025-02-27 02:02:11 +00:00
fe100c3c5b Add libtorch nightly build for CUDA 12.8 (#146265)
Try removing sm50 and sm60 to shrink binary size, and resolve the ld --relink error

"Architecture support for Maxwell, Pascal, and Volta is considered feature-complete and will be frozen in an upcoming release." from 12.8 release note.

Also updating the runner for cuda 12.8 test to g4dn (T4, sm75) due to the drop in sm50/60 support.

https://github.com/pytorch/pytorch/issues/145570

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146265
Approved by: https://github.com/atalman
2025-02-21 03:04:06 +00:00
861bf892fb Set USE_CUFILE=1 by default and add pypi package to binary build matrix (#145748)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145748
Approved by: https://github.com/atalman
2025-02-11 15:49:01 +00:00
9232355bb0 Add CUDA 12.8 manywheel x86 Builds to Binaries Matrix (#145792)
https://github.com/pytorch/pytorch/issues/145570

Adding cuda 12.8.0 x86 builds first
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145792
Approved by: https://github.com/nWEIdia, https://github.com/malfet, https://github.com/atalman
2025-01-31 16:12:02 +00:00
b7bef1ca84 [aarch64] fix TORCH_CUDA_ARCH_LIST for cuda arm build (#144436)
Fixes #144037

The root cause is that the CUDA ARM build does not call `.ci/manywheel/build_cuda.sh` but calls `.ci/aarch64_linux/aarch64_ci_build.sh` instead. Therefore, https://github.com/pytorch/pytorch/blob/main/.ci/manywheel/build_cuda.sh#L56 was not executed for the CUDA ARM build.

Adding the equivalent code to `.ci/aarch64_linux/aarch64_ci_build.sh` as a workaround (WAR).

In the future, we should aim to integrate `.ci/aarch64_linux/aarch64_ci_build.sh` back into `.ci/manywheel/build_cuda.sh`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144436
Approved by: https://github.com/atalman
2025-01-11 09:00:46 +00:00
6c32ef4c5b Remove builder repo from workflows and scripts (#143776)
Part of https://github.com/pytorch/builder/issues/2054
The builder repo is no longer used, so remove any references to it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143776
Approved by: https://github.com/huydhn
2024-12-24 14:11:51 +00:00
dbe6fce185 [CUDA][Nightly Binary] Remove PTX from cuda 12.4 Nightly (#141142)
Separate cuda 12.4 | 12.6 logic
Remove PTX from cuda 12.4
Remove deprecated cuda 11.[6/7]

Discussed in https://github.com/pytorch/pytorch/issues/137374#issuecomment-2489200733

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141142
Approved by: https://github.com/atalman
2024-11-22 02:34:59 +00:00
14bb49fe98 Add CUDA 12.6 Linux Builds to Binaries Matrix (#138899)
Related to #138440

Issue tracker: https://github.com/pytorch/pytorch/issues/138609

Version based on https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138899
Approved by: https://github.com/atalman

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2024-11-12 19:52:31 +00:00
53f164cae5 [CUDA][CI][cusparselt] Only CUDA 11.8 ships the libcusparseLt.so.0, CUDA 12 would use PYPI libcusparselt (#138547)
since nvidia-cusparselt-cu12 is available and
nvidia-cusparselt-cu11 is not available

Related: #138175
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138547
Approved by: https://github.com/atalman
2024-11-05 15:12:41 +00:00
912ea5601b Move manywheel binary scripts to pytorch (#138103)
PR to remove Manywheel Scripts:
https://github.com/pytorch/builder/pull/2017

Test PR : https://github.com/pytorch/pytorch/pull/138325

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138103
Approved by: https://github.com/malfet
2024-10-18 17:11:28 +00:00