4400c5d31e
Continue to build nightly CUDA 12.9 for internal ( #163029 )
...
Revert part of https://github.com/pytorch/pytorch/pull/161916 to continue building CUDA 12.9 nightly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163029
Approved by: https://github.com/malfet
2025-10-11 08:26:47 +00:00
0ec0120b19
Move aws OIDC credentials steps into setup-rocm.yml ( #164769 )
...
The AWS ECR login step needs `id-token: write` permissions. We move the steps to get OIDC-based credentials from `_rocm-test.yml` to `setup-rocm.yml`. This lays the groundwork to enable access to AWS ECR in workflows in other repos such as torchtitan that use [linux_job_v2.yml](https://github.com/pytorch/test-infra/blob/main/.github/workflows/linux_job_v2.yml ), which also uses [setup-rocm.yml](335f4f80a0/.github/workflows/linux_job_v2.yml (L168)).
Any caller workflow that eventually executes the `setup-rocm` action will thus need to provide the `id-token: write` permission.
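As a rough illustration of the requirement above, the sketch below checks whether a caller workflow, already parsed into a Python dict, grants `id-token: write` to a given job. The helper name `grants_id_token_write` and the sample workflow are hypothetical and not part of the PR, and the permission resolution is deliberately simplified compared to what GitHub Actions actually does.

```python
def grants_id_token_write(workflow: dict, job_name: str) -> bool:
    """Return True if the job would run with `id-token: write`.

    Simplified model (an assumption for illustration): job-level
    `permissions` replace workflow-level ones when present.
    """
    job = workflow.get("jobs", {}).get(job_name, {})
    permissions = job.get("permissions", workflow.get("permissions"))
    if permissions == "write-all":
        return True
    if isinstance(permissions, dict):
        return permissions.get("id-token") == "write"
    return False


# Hypothetical caller workflow, as it might look after YAML parsing.
caller = {
    "permissions": {"contents": "read"},
    "jobs": {
        "rocm-test": {"permissions": {"id-token": "write", "contents": "read"}},
        "lint": {},
    },
}

print(grants_id_token_write(caller, "rocm-test"))  # True
print(grants_id_token_write(caller, "lint"))       # False
```

A check like this could run in a lint step, so a missing permission is caught before the ECR login fails at runtime.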
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164769
Approved by: https://github.com/huydhn
2025-10-10 21:24:29 +00:00
773c6762b8
[CD][CUDA13][NCCL] Fix nccl version typo for cu13 ( #164383 )
...
https://pypi.org/project/nvidia-nccl-cu13/#history does not have 2.27.5 but 2.27.7+.
Companion PR: https://github.com/pytorch/pytorch/pull/164352
Fixes a potential binary breakage caused by referencing an NCCL cu13 version that does not exist.
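The kind of check that would catch such a typo can be sketched as follows. This is a minimal illustration, not the project's tooling: `nearest_published` is a hypothetical helper, and the list of published versions is hard-coded sample data rather than fetched from PyPI.

```python
from typing import Optional


def parse_version(text: str) -> tuple:
    """Turn '2.27.7' into (2, 27, 7) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def nearest_published(pinned: str, published: list) -> Optional[str]:
    """Return `pinned` if it is published, otherwise the smallest
    published version that is newer, or None if there is none."""
    if pinned in published:
        return pinned
    newer = [v for v in published if parse_version(v) > parse_version(pinned)]
    return min(newer, key=parse_version) if newer else None


# Illustrative sample only; not queried from the package index.
published_cu13 = ["2.27.7", "2.28.3"]

print(nearest_published("2.27.5", published_cu13))  # 2.27.7
print(nearest_published("2.27.7", published_cu13))  # 2.27.7
```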
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164383
Approved by: https://github.com/tinglvv , https://github.com/Skylion007 , https://github.com/atalman
2025-10-01 21:32:25 +00:00
2610746375
Revert nccl upgrade back to 2.27.5 ( #164352 )
...
Revert https://github.com/pytorch/pytorch/pull/162351 as it breaks H100
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164352
Approved by: https://github.com/atalman , https://github.com/malfet
2025-10-01 15:27:40 +00:00
5504a06e01
[BE]: Update NCCL to 2.28.3 ( #162351 )
...
@eqy The new NCCL has a bunch of bugfixes for features, including reducing the number of SMs needed by NVLINK collectives, as well as some very useful new APIs for SymmetricMemory. It also allows FP8 support for non-reductive operations on pre-sm90 devices.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162351
Approved by: https://github.com/ezyang , https://github.com/malfet , https://github.com/atalman
2025-09-28 01:38:59 +00:00
f1260c9b9a
[ROCm][CI/CD] upgrade nightly wheels to ROCm 7.0 ( #163937 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163937
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com >
2025-09-26 21:42:09 +00:00
0ec946a052
[ROCm] Increase binary build timeout to 5 hours (300 minutes) ( #163776 )
...
Despite narrowing down the [FBGEMM_GENAI build to gfx942](https://github.com/pytorch/pytorch/pull/162648 ), the nightly builds still timed out because they [didn't get enough time to finish the post-PyTorch-build steps](https://github.com/pytorch/pytorch/actions/runs/17969771026/job/51109432897 ).
This PR increases the timeout for ROCm builds for both [libtorch](https://github.com/pytorch/pytorch/actions/runs/17969771026 ) and [manywheel](https://github.com/pytorch/pytorch/actions/runs/17969771041 ), because both of those are currently close to the 4hr mark.
This PR is a more ROCm-targeted version of https://github.com/pytorch/pytorch/pull/162880 (which is for release/2.9 branch).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163776
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com >
2025-09-24 23:02:08 +00:00
bb1d53bc47
[CD] CUDA 13 specific followup changes ( #162455 )
...
Follow up for CUDA 13 bring up https://github.com/pytorch/pytorch/issues/159779
sm50-70 should not be added to the sbsa build arch list, as these older archs had no support for Arm.
Remove platform_machine from PYTORCH_EXTRA_INSTALL_REQUIREMENTS.
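To make the `platform_machine` change concrete, here is a small sketch that strips an `and platform_machine == '...'` clause from PEP 508 environment markers. It assumes the variable holds requirement specifiers separated by `|`. The helper `drop_platform_machine` and the sample entries are hypothetical and are not taken from the PR.

```python
import re

# Matches an `and platform_machine == '...'` clause inside a PEP 508 marker.
_MACHINE_CLAUSE = re.compile(
    r"\s+and\s+platform_machine\s*==\s*(['\"])[^'\"]*\1"
)


def drop_platform_machine(requirements: str) -> str:
    """Remove the platform_machine clause from every `|`-separated entry."""
    entries = [entry.strip() for entry in requirements.split("|")]
    return " | ".join(_MACHINE_CLAUSE.sub("", entry) for entry in entries)


# Illustrative entries only; package pins are sample values.
before = (
    "nvidia-nccl-cu13==2.27.7; platform_system == 'Linux' "
    "and platform_machine == 'x86_64' | "
    "nvidia-nvshmem-cu13==3.3.24; platform_system == 'Linux' "
    "and platform_machine == 'x86_64'"
)

after = drop_platform_machine(before)
print(after)
```

Without the clause, each requirement applies on Linux regardless of CPU architecture.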
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162455
Approved by: https://github.com/atalman
2025-09-11 00:03:47 +00:00
8922bbcaab
Use same NVSHMEM version across CUDA builds ( #162206 )
...
#161321 bumped NVSHMEM version to 3.3.24 for CUDA 13, leaving CUDA 12 with 3.3.20.
This PR bumps the NVSHMEM version to 3.3.24 for CUDA 12 as well.
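A drift like the one described above can be detected mechanically. The sketch below assumes the pins are available as a simple mapping from CUDA major version to NVSHMEM version. The mapping layout and the helper name `versions_consistent` are hypothetical.

```python
def versions_consistent(pins: dict) -> bool:
    """True when every CUDA build pins the same NVSHMEM version."""
    return len(set(pins.values())) <= 1


# State after #161321, as described in the text above.
before = {"cuda12": "3.3.20", "cuda13": "3.3.24"}
# State after this PR.
after = {"cuda12": "3.3.24", "cuda13": "3.3.24"}

print(versions_consistent(before))  # False
print(versions_consistent(after))   # True
```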
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162206
Approved by: https://github.com/tinglvv , https://github.com/Skylion007
2025-09-09 20:59:50 +00:00
5ccf3ca3ec
Revert "Use same NVSHMEM version across CUDA builds ( #162206 )"
...
This reverts commit 0d9c95cd7ee299e2e8c09df26d395be8775b506b.
Reverted https://github.com/pytorch/pytorch/pull/162206 on behalf of https://github.com/malfet due to Broke lint, see 4dd73e659a/1
([comment](https://github.com/pytorch/pytorch/pull/162206#issuecomment-3271040521 ))
2025-09-09 14:40:45 +00:00
0d9c95cd7e
Use same NVSHMEM version across CUDA builds ( #162206 )
...
#161321 bumped NVSHMEM version to 3.3.24 for CUDA 13, leaving CUDA 12 with 3.3.20.
This PR bumps the NVSHMEM version to 3.3.24 for CUDA 12 as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162206
Approved by: https://github.com/tinglvv , https://github.com/Skylion007
2025-09-09 08:52:27 +00:00
145a3a7bda
[CUDA 13][cuDNN] Bump CUDA 13 to cuDNN 9.13.0 ( #162268 )
...
Fixes some `d_qk` != `d_v` cases on Hopper that are broken by cuDNN 9.11-9.12
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162268
Approved by: https://github.com/drisspg , https://github.com/Skylion007
2025-09-06 01:59:03 +00:00
bffc7dd1f3
[CD] Add cuda 13.0 libtorch builds, remove CUDA 12.9 builds ( #161916 )
...
Related to https://github.com/pytorch/pytorch/issues/159779
Adding CUDA 13.0 libtorch builds, followup after https://github.com/pytorch/pytorch/pull/160956
Removing CUDA 12.9 builds, See https://github.com/pytorch/pytorch/issues/159980
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161916
Approved by: https://github.com/jeanschmidt , https://github.com/Skylion007
Co-authored-by: Ting Lu <tingl@nvidia.com >
2025-09-05 07:47:54 +00:00
6b8b3ac440
Revert "[ROCm] Use MI325 (gfx942) runners for binary smoke testing ( #162044 )"
...
This reverts commit cd529b686d54bbaa443f5b310140de48422d96c7.
Reverted https://github.com/pytorch/pytorch/pull/162044 on behalf of https://github.com/jeffdaily due to mi200 backlog is purged, and mi300 runners are failing in GHA download ([comment](https://github.com/pytorch/pytorch/pull/162044#issuecomment-3254427869 ))
2025-09-04 16:06:30 +00:00
cd529b686d
[ROCm] Use MI325 (gfx942) runners for binary smoke testing ( #162044 )
...
### Motivation
* MI250 Cirrascale runners are currently experiencing network timeouts, leading to a huge queue of binary smoke test jobs:
<img width="483" height="133" alt="image" src="https://github.com/user-attachments/assets/17293002-78ad-4fc9-954f-ddd518bf0a43 " />
* MI210 Hollywood runners (with runner names such as `pytorch-rocm-hw-*`) are not suitable for these jobs, because they seem to take much longer to download artifacts: https://github.com/pytorch/pytorch/pull/153287#issuecomment-2918420345 (this is why these jobs were specifically targeting Cirrascale runners). However, it doesn't seem like Cirrascale runners are necessarily doing much better either e.g. [this recent build](https://github.com/pytorch/pytorch/actions/runs/17332256791/job/49231006755 ).
* Moving to MI325 runners should address the stability part at least, while also reducing load on limited MI2xx runner capacity.
* However, I'm not sure if the MI325 runners will do any better on the artifact download part (this may need to be investigated more) cc @amdfaa
* Also removing `ciflow/binaries` and `ciflow/binaries_wheel` label/tag triggers for `generated-linux-binary-manywheel-rocm-main.yml` because we already trigger ROCm binary build/test jobs via these labels/tags in `generated-linux-binary-manywheel-nightly.yml`. And for developers who want to trigger ROCm binary build/test jobs on their PRs, they can use the `ciflow/rocm-mi300` label/tag as per this PR.
### TODOs (cc @amdfaa):
* Check that the workflow runs successfully on the MI325 runners in this PR. Note how long the test jobs take esp. the "Download Build Artifacts" step
* Once this PR is merged, clear the queue of jobs targeting `linux.rocm.gpu.mi250`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162044
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com >
2025-09-03 18:34:07 +00:00
793fc12aff
[CD] Fix setup-xpu action issue ( #161934 )
...
Fixes the XPU CD test failure; see https://github.com/pytorch/pytorch/actions/runs/17370923627/job/49315624191
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161934
Approved by: https://github.com/atalman
2025-09-02 16:03:44 +00:00
06c7516994
[BE] Upgrade XPU support package to 2025.2 ( #158733 )
...
Includes the following changes:
- Add XPU support package 2025.2 build and test in CI for both Linux and Windows
- Keep XPU support package 2025.1 build in CI to ensure nothing breaks until the PyTorch 2.9 release
- Upgrade XPU support package from 2025.1 to 2025.2 in CD for both Linux and Windows
- Rename Linux CI job name & image name to n & n-1
- Update XPU runtime pypi packages dependencies of CD wheels
- Remove deprecated support package version docker image build
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158733
Approved by: https://github.com/EikanWang , https://github.com/atalman
2025-08-27 19:33:38 +00:00
ae8d319fd4
Update NVSHMEM to 3.3.24 and fix download link ( #161321 )
...
https://github.com/pytorch/pytorch/issues/159779
Update NVSHMEM to 3.3.24 for [PyTorch CUDA13 Binary Cannot Be Built with SM_75 with NVSHMEM](https://github.com/pytorch/pytorch/issues/160980 )
Re-enabled sm_75 for NVSHMEM
Fixed the NVSHMEM download link to address the 3.3.20 download problem reported in [[CD] nvshem-3.3.9 wheels for aarch64 is not manylinux2_28 compliant](https://github.com/pytorch/pytorch/issues/160425 )
TODO: Also re-enable the ARM build with NVSHMEM, since it is compatible with manylinux2_28
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161321
Approved by: https://github.com/Skylion007 , https://github.com/atalman
2025-08-26 13:26:18 +00:00
1a566c4909
Remove Python 3.9 nightly builds ( #161427 )
...
Please see https://github.com/pytorch/pytorch/issues/161167
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161427
Approved by: https://github.com/huydhn
2025-08-25 22:05:40 +00:00
49ff884b1e
Add CUDA 13.0 x86 builds ( #160956 )
...
https://github.com/pytorch/pytorch/issues/159779
CUDA 13.0.0
NVSHMEM 3.3.20
CUDNN 9.12.0.46
Adding x86 linux builds for CUDA 13.
Adding libtorch docker.
Package naming changed for CUDA 13 (removed postfix -cu13 for some packages).
Preparation checklist:
1. Update index https://download.pytorch.org/whl/nightly/cu130 with pypi packages
2. Update packaging name based on https://pypi.org/project/cuda-toolkit/ metadata
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160956
Approved by: https://github.com/atalman
Co-authored-by: atalman <atalman@fb.com >
2025-08-22 11:31:09 +00:00
e1a64b75ff
[CD] Delete full builds ( #161075 )
...
As they are no longer needed for Colab, see https://github.com/googlecolab/colabtools/issues/5508#issuecomment-3200871941 and
[<img width="896" height="128" alt="image" src="https://github.com/user-attachments/assets/a287393c-bde7-4e10-99bf-2e0d66346efe " />
](https://colab.research.google.com/drive/1YJ5Y0xsApXSewM1cQwWQ_AS3A77vytgq )
Fixes https://github.com/pytorch/pytorch/issues/160972
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161075
Approved by: https://github.com/atalman
2025-08-20 19:40:15 +00:00
7bd4cfaef4
[BE] Update nvshem dependency to 3.3.20 ( #160458 )
...
Which is manylinux2_28 compatible, even on aarch64 platform
Archive contents and URL pattern changed quite drastically between 3.3.9 and 3.3.20, but hopefully it still works.
Package `libnvshmem_host.so.3` into gigantic aarch64+CUDA wheel
Should fix https://github.com/pytorch/pytorch/issues/160425
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160458
Approved by: https://github.com/Skylion007 , https://github.com/kwen2501 , https://github.com/nWEIdia , https://github.com/atalman , https://github.com/tinglvv
2025-08-16 02:00:57 +00:00
c015e53d37
Revert "[BE] Update nvshem dependency to 3.3.20 ( #160458 )"
...
This reverts commit e0488d9f00865fb56c931580c80e099771c6285e.
Reverted https://github.com/pytorch/pytorch/pull/160458 on behalf of https://github.com/wdvr due to need to rerun workflow generation (failing workflow-checks) ([comment](https://github.com/pytorch/pytorch/pull/160458#issuecomment-3193133706 ))
2025-08-16 01:47:42 +00:00
e0488d9f00
[BE] Update nvshem dependency to 3.3.20 ( #160458 )
...
Which is manylinux2_28 compatible, even on aarch64 platform
Archive contents and URL pattern changed quite drastically between 3.3.9 and 3.3.20, but hopefully it still works.
Package `libnvshmem_host.so.3` into gigantic aarch64+CUDA wheel
Should fix https://github.com/pytorch/pytorch/issues/160425
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160458
Approved by: https://github.com/Skylion007 , https://github.com/kwen2501 , https://github.com/nWEIdia , https://github.com/atalman , https://github.com/tinglvv
2025-08-16 00:50:13 +00:00
d0226719a9
[BE][EZ] Delete remains of split-build logic ( #159990 )
...
Hopefully last piece of https://github.com/pytorch/pytorch/issues/138750
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159990
Approved by: https://github.com/atalman
ghstack dependencies: #159986
2025-08-07 01:59:30 +00:00
26d045bb60
Linux py 3.14 wheel builds ( #157559 )
...
Related to https://github.com/pytorch/pytorch/issues/156856
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157559
Approved by: https://github.com/malfet , https://github.com/albanD
2025-08-04 20:55:19 +00:00
476874b37f
[BE]: Update NCCL to 2.27.5 ( #157108 )
...
Update NCCL to 2.27.5. Minor version; improves Blackwell and Symmem FP8 support, and fixes a bug with MNNVL.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157108
Approved by: https://github.com/atalman
2025-07-08 15:40:54 +00:00
7275f28045
Fix cuda 12.9 aarch64 GPU builds. Update CUDA_STABLE variable. ( #157630 )
...
This contains 2 fixes that are required in main and will need to be cherry-picked to the Release 2.8 branch:
1. The PR https://github.com/pytorch/pytorch/pull/155819 did not include the triton change.
2. The CUDA_STABLE variable needs to be set to 12.8. Updating CUDA_STABLE also updates the full static build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157630
Approved by: https://github.com/Skylion007 , https://github.com/jeanschmidt
2025-07-04 18:08:31 +00:00
a6fab82b16
[BE]: Fix NVSHMEM builds, add missing 12.9 dependency and update to latest for 2.8RC ( #157453 )
...
Fixed our bad builds of NVSHMEM (we were not building or testing before) and also updated to the latest version. The newest versions have critical support for things that would actually make it useful, like bfloat16 and float16 support.
This is a proper fix for: https://github.com/pytorch/pytorch/pull/157411
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157453
Approved by: https://github.com/kwen2501 , https://github.com/atalman
2025-07-03 22:55:18 +00:00
a317c63d1b
[BE]: Update NCCL to 2.27.3 ( #155233 )
...
Fixes: https://github.com/pytorch/pytorch/issues/155052 and https://github.com/pytorch/pytorch/issues/153517
This upgrade is needed to effectively use those symmetric memory kernels anyway. Also fixes some nasty NCCL bugs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155233
Approved by: https://github.com/nWEIdia , https://github.com/kwen2501 , https://github.com/atalman , https://github.com/eqy
2025-06-14 19:20:31 +00:00
794ef6c9b8
Enable manywheel build and smoke test on main branch for ROCm ( #153287 )
...
Fixes issue of not discovering breakage of ROCm wheel builds until the nightly job runs e.g. https://github.com/pytorch/pytorch/pull/153253
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153287
Approved by: https://github.com/jeffdaily
2025-06-14 19:14:31 +00:00
4574b39aa4
Revert "[BE]: Sync cusparselt 12.9 with static build and other cuda 12 ( #155709 )"
...
This reverts commit bbbced94a43cf764ddfe719e7d4c161a3992830c.
Reverted https://github.com/pytorch/pytorch/pull/155709 on behalf of https://github.com/clee2000 due to broke lint [GH job link](https://github.com/pytorch/pytorch/actions/runs/15645591737/job/44082402642 ) [HUD commit link](bbbced94a4) landrace with 155819? easy forward fix but its the end of the week so idk when id get a review ([comment](https://github.com/pytorch/pytorch/pull/155709#issuecomment-2972094849 ))
2025-06-14 01:43:16 +00:00
d7e3c9ce82
Revert "Enable manywheel build and smoke test on main branch for ROCm ( #153287 )"
...
This reverts commit 3b6569b1ef4b9ff25f5b75fe0a216d6d084d573f.
Reverted https://github.com/pytorch/pytorch/pull/153287 on behalf of https://github.com/clee2000 due to broke lint [GH job link](https://github.com/pytorch/pytorch/actions/runs/15646152483/job/44083912145 ) [HUD commit link](3b6569b1ef) ([comment](https://github.com/pytorch/pytorch/pull/153287#issuecomment-2972088294 ))
2025-06-14 01:32:27 +00:00
3b6569b1ef
Enable manywheel build and smoke test on main branch for ROCm ( #153287 )
...
Fixes issue of not discovering breakage of ROCm wheel builds until the nightly job runs e.g. https://github.com/pytorch/pytorch/pull/153253
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153287
Approved by: https://github.com/jeffdaily
2025-06-14 00:05:57 +00:00
bbbced94a4
[BE]: Sync cusparselt 12.9 with static build and other cuda 12 ( #155709 )
...
followup for https://github.com/pytorch/pytorch/pull/154980
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155709
Approved by: https://github.com/tinglvv , https://github.com/atalman , https://github.com/nWEIdia , https://github.com/cyyever
2025-06-13 23:10:01 +00:00
9cced33c7c
[BE]: Update cudnn to 9.10.2.21 ( #155576 )
...
Update to CUDNN 9.10.2.21
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155576
Approved by: https://github.com/eqy , https://github.com/atalman
2025-06-12 12:50:36 +00:00
f59c76b549
Revert "[BE]: Update cudnn to 9.10.2.21 ( #155576 )"
...
This reverts commit 2d3615f577894c7a117a55e85bb8371bb598ec50.
Reverted https://github.com/pytorch/pytorch/pull/155576 on behalf of https://github.com/malfet due to breaks the same test again (I remember there were a version that adjusted tolerances), see bc3972b80a/1
([comment](https://github.com/pytorch/pytorch/pull/155576#issuecomment-2964404710 ))
2025-06-11 22:03:45 +00:00
2d3615f577
[BE]: Update cudnn to 9.10.2.21 ( #155576 )
...
Update to CUDNN 9.10.2.21
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155576
Approved by: https://github.com/eqy , https://github.com/atalman
2025-06-11 20:32:07 +00:00
4c3da611c2
Add CUDA 12.9.1 x86 nightly binaries ( #154980 )
...
Adding CUDA 12.9.1 to nightly binaries matrix for linux (x86) builds.
Adds sbsa and libtorch build docker images; the builds themselves will be added in follow-up PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154980
Approved by: https://github.com/eqy , https://github.com/atalman
2025-06-11 13:43:17 +00:00
eaceb243df
[BE] Update the XPU support package to 2025.1.3 ( #154346 )
...
Fixes #153632
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154346
Approved by: https://github.com/EikanWang , https://github.com/atalman
2025-06-11 09:46:18 +00:00
8153340d10
[CI/CD] Remove CUDA 11.8 builds ( #155509 )
...
This removes CUDA 11.8 from CI/CD
Please see: https://github.com/pytorch/pytorch/issues/147383
TODO: Follow up by cleaning the CUDA 11.8 config out of the scripts
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155509
Approved by: https://github.com/cyyever , https://github.com/huydhn , https://github.com/malfet
2025-06-10 05:16:41 +00:00
3863bbb55b
[BE]: Update cusparselt to 0.7.1 ( #155232 )
...
Needed to support sparse operations on Blackwell, and implements new features for the library. Also optimizes library sizes vs 0.7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155232
Approved by: https://github.com/nWEIdia , https://github.com/malfet
2025-06-09 18:01:23 +00:00
9656251bb1
Revert "[BE] Update cudnn to 9.10.1.4 ( #155122 )"
...
This reverts commit a14f427db68e54500ef4cd9ed34cb9537263bb74.
Reverted https://github.com/pytorch/pytorch/pull/155122 on behalf of https://github.com/malfet due to Looks like it breaks a bunch of tests, see 36a722e20d/1
([comment](https://github.com/pytorch/pytorch/pull/155122#issuecomment-2949209801 ))
2025-06-06 13:03:49 +00:00
a14f427db6
[BE] Update cudnn to 9.10.1.4 ( #155122 )
...
Follow up to #152782
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155122
Approved by: https://github.com/malfet , https://github.com/atalman
2025-06-05 16:07:25 +00:00
34c6371d24
Add NVSHMEM to PYTORCH_EXTRA_INSTALL_REQUIREMENTS ( #154568 )
...
NVSHMEM 3.2.5 (released Mar 2025) has both cu11 and cu12 builds.
See:
https://pypi.nvidia.com/nvidia-nvshmem-cu12/
https://pypi.nvidia.com/nvidia-nvshmem-cu11/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154568
Approved by: https://github.com/atalman
ghstack dependencies: #154538
2025-06-04 17:43:24 +00:00
bab59d3c28
Upgrade to CUDA 12.8.1 for nightly binaries ( #152923 )
...
Upgrade current CUDA 12.8 builds to 12.8.1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152923
Approved by: https://github.com/atalman
2025-05-23 22:37:05 +00:00
c92ea3bc98
[BE] Upgrade XPU support package to 2025.1 in CICD ( #151899 )
...
Addresses #151097 . Includes the following changes:
- Add XPU support package 2025.1 build and test in CI for both Linux and Windows
- Keep XPU support package 2025.0 build in CI to ensure nothing breaks until the PyTorch 2.8 release
- Upgrade XPU support package from 2025.0 to 2025.1 in CD for both Linux and Windows
- Enable XCCL in Linux CD wheel and oneMKL integration in both Linux and Windows
- Update XPU runtime pypi packages of CD wheels
- Remove deprecated support package version docker image build
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151899
Approved by: https://github.com/EikanWang , https://github.com/atalman
2025-05-14 20:21:09 +00:00
7f79222992
Upgrade to NCCL 2.26.5 for CUDA 12 ( #152810 )
...
Upgrade NCCL to latest 2.26.5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152810
Approved by: https://github.com/eqy , https://github.com/albanD , https://github.com/nWEIdia , https://github.com/atalman , https://github.com/cyyever
2025-05-14 00:52:50 +00:00
e05ac9b794
Use folder tagged docker images for binary builds ( #151706 )
...
Should be the last part of https://github.com/pytorch/pytorch/pull/150558 , except for maybe the s390x stuff, where I'm still not sure what's going on
For binary builds, do what we already do in CI: tag each image with a hash of the .ci/docker folder to ensure that a docker image built from that commit gets used. Previously it would use imagename:arch-main, which could be a version of the image based on an older commit
After this, changing a docker image and then tagging with ciflow/binaries on the same PR should use the new docker images
Release and main builds should still pull from docker io
Cons:
* if someone rebuilds the image from main or a PR where the hash is the same (e.g. the folder is unchanged, but the docker build is retriggered for some reason), the release would use that image instead of one built on the release branch
* spin wait for docker build to finish
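The folder-hash tagging idea can be sketched as below. This is only an illustration: it hashes file contents directly with SHA-256, whereas the actual workflows may derive the hash differently (for example from git). The `imagename:arch-<hash>` layout is an assumption modelled on the old `imagename:arch-main` form, and `folder_tag`, `image_reference`, and the image name are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def folder_tag(folder: Path) -> str:
    """Deterministic hash over relative file paths and file contents."""
    digest = hashlib.sha256()
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        digest.update(path.relative_to(folder).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def image_reference(image: str, arch: str, folder: Path) -> str:
    """Build an image reference tagged with the folder hash (assumed layout)."""
    return f"{image}:{arch}-{folder_tag(folder)}"


# Demo: editing a file in the folder changes the resulting tag.
with tempfile.TemporaryDirectory() as tmp:
    docker_dir = Path(tmp)
    (docker_dir / "Dockerfile").write_text("FROM ubuntu:22.04\n")
    ref_before = image_reference("example-builder", "cpu", docker_dir)
    same_again = image_reference("example-builder", "cpu", docker_dir)
    (docker_dir / "Dockerfile").write_text("FROM ubuntu:24.04\n")
    ref_after = image_reference("example-builder", "cpu", docker_dir)

print(ref_before == same_again)  # True
print(ref_before != ref_after)   # True
```

Because the tag depends only on the folder contents, an unchanged folder always maps to the same image, which is also the source of the first con listed above.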
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151706
Approved by: https://github.com/atalman
2025-04-22 21:50:10 +00:00
b4550541ea
[ROCm] upgrade nightly wheels to rocm6.4 ( #151355 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151355
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com >
2025-04-17 17:29:07 +00:00