9095a9dfae
[CD] Apply the fix from #162455 to aarch64+cu129 build ( #165794 )
...
When trying to bring cu129 back in https://github.com/pytorch/pytorch/pull/163029 , I mainly looked at https://github.com/pytorch/pytorch/pull/163029 and missed another tweak coming from https://github.com/pytorch/pytorch/pull/162455
I discovered this issue when testing aarch64+cu129 builds in https://github.com/pytorch/test-infra/actions/runs/18603342105/job/53046883322?pr=7373 . Surprisingly, from what I see in 79a37055e7, no tests run for the aarch64 CUDA build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165794
Approved by: https://github.com/malfet
2025-10-18 04:16:24 +00:00
4400c5d31e
Continue to build nightly CUDA 12.9 for internal ( #163029 )
...
Revert part of https://github.com/pytorch/pytorch/pull/161916 to continue building CUDA 12.9 nightly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163029
Approved by: https://github.com/malfet
2025-10-11 08:26:47 +00:00
773c6762b8
[CD][CUDA13][NCCL] Fix nccl version typo for cu13 ( #164383 )
...
https://pypi.org/project/nvidia-nccl-cu13/#history does not have 2.27.5 but 2.27.7+.
Companion PR: https://github.com/pytorch/pytorch/pull/164352
Fixes a potential binary breakage due to non-existence of referenced NCCL cu13 version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164383
Approved by: https://github.com/tinglvv , https://github.com/Skylion007 , https://github.com/atalman
2025-10-01 21:32:25 +00:00
2610746375
Revert nccl upgrade back to 2.27.5 ( #164352 )
...
Revert https://github.com/pytorch/pytorch/pull/162351 as it breaks H100
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164352
Approved by: https://github.com/atalman , https://github.com/malfet
2025-10-01 15:27:40 +00:00
5504a06e01
[BE]: Update NCCL to 2.28.3 ( #162351 )
...
@eqy New NCCL has a bunch of bugfixes for features, including reducing the number of SMs needed by NVLINK collectives, as well as some very useful new APIs for SymmetricMemory. It also allows FP8 support for non-reductive operations on pre-sm90 devices.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162351
Approved by: https://github.com/ezyang , https://github.com/malfet , https://github.com/atalman
2025-09-28 01:38:59 +00:00
42e9902a0f
cd: Move arm64 to linux.arm64.r7g.12xlarge.memory ( #163681 )
...
This should reduce our build time considerably by simply
throwing more hardware at the problem.
Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163681
Approved by: https://github.com/huydhn , https://github.com/atalman , https://github.com/malfet
2025-09-24 04:06:09 +00:00
bb1d53bc47
[CD] CUDA 13 specific followup changes ( #162455 )
...
Follow up for CUDA 13 bring up https://github.com/pytorch/pytorch/issues/159779
sm50-70 should not be added to the sbsa build arch list, as those earlier archs had no support for Arm.
Remove platform_machine from PYTORCH_EXTRA_INSTALL_REQUIREMENTS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162455
Approved by: https://github.com/atalman
2025-09-11 00:03:47 +00:00
8922bbcaab
Use same NVSHMEM version across CUDA builds ( #162206 )
...
#161321 bumped NVSHMEM version to 3.3.24 for CUDA 13, leaving CUDA 12 with 3.3.20.
This PR bumps the NVSHMEM version to 3.3.24 for CUDA 12 as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162206
Approved by: https://github.com/tinglvv , https://github.com/Skylion007
2025-09-09 20:59:50 +00:00
9c991b63ff
[CD] [aarch64] Add CUDA 12.6 and 12.8 to build matrix, remove 12.9 build ( #162364 )
...
https://github.com/pytorch/pytorch/issues/159779
Add the full CUDA support matrix to sbsa build (12.6, 12.8)
Same arch support as x86 build
Remove 12.9 sbsa build
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162364
Approved by: https://github.com/atalman
2025-09-08 20:00:25 +00:00
145a3a7bda
[CUDA 13][cuDNN] Bump CUDA 13 to cuDNN 9.13.0 ( #162268 )
...
Fixes some `d_qk` != `d_v` cases on Hopper that are broken by cuDNN 9.11-9.12
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162268
Approved by: https://github.com/drisspg , https://github.com/Skylion007
2025-09-06 01:59:03 +00:00
bffc7dd1f3
[CD] Add cuda 13.0 libtorch builds, remove CUDA 12.9 builds ( #161916 )
...
Related to https://github.com/pytorch/pytorch/issues/159779
Adding CUDA 13.0 libtorch builds, followup after https://github.com/pytorch/pytorch/pull/160956
Removing CUDA 12.9 builds, See https://github.com/pytorch/pytorch/issues/159980
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161916
Approved by: https://github.com/jeanschmidt , https://github.com/Skylion007
Co-authored-by: Ting Lu <tingl@nvidia.com>
2025-09-05 07:47:54 +00:00
9632f4ea9f
[CD] [aarch64] Add CUDA 13.0 sbsa nightly build ( #161257 )
...
https://github.com/pytorch/pytorch/issues/159779
CUDA SBSA build for CUDA 13.0
1. Supported archs: sm_80 to sm_120. Including support for Thor (sm_110), SPARK (sm_121), GB300 (sm_103).
"This release adds support of SM110 GPUs for arm64-sbsa on Linux." from 13.0 release notes https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
2. Use -compress-mode=size for binary size reduction: the 13.0 wheel is 2.18 GB compared with 3.28 GB for 12.9, a saving of 1.1 GB (~33.5% smaller).
3. Refactored the libs_to_copy list with common libs, and version_specific_libs.
TODO: add the other CUDA archs in the existing support matrix of x86 to SBSA build as well
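The size figures in item 2 can be checked with a few lines of arithmetic. This is only a worked check using the two wheel sizes quoted in the commit message; the variable names are illustrative and not taken from the build scripts.

```python
# Wheel sizes in GB, as quoted in the commit message.
cu129_wheel_gb = 3.28
cu130_wheel_gb = 2.18

# Absolute saving and saving relative to the larger (12.9) wheel.
saved_gb = cu129_wheel_gb - cu130_wheel_gb
saved_pct = saved_gb / cu129_wheel_gb * 100

print(f"saved {saved_gb:.2f} GB ({saved_pct:.1f}% smaller)")
# prints "saved 1.10 GB (33.5% smaller)"
```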
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161257
Approved by: https://github.com/nWEIdia , https://github.com/atalman
2025-08-27 14:38:07 +00:00
1a566c4909
Remove Python 3.9 nightly builds ( #161427 )
...
Please see https://github.com/pytorch/pytorch/issues/161167
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161427
Approved by: https://github.com/huydhn
2025-08-25 22:05:40 +00:00
49ff884b1e
Add CUDA 13.0 x86 builds ( #160956 )
...
https://github.com/pytorch/pytorch/issues/159779
CUDA 13.0.0
NVSHMEM 3.3.20
CUDNN 9.12.0.46
Adding x86 linux builds for CUDA 13.
Adding libtorch docker.
Package naming changed for CUDA 13 (removed postfix -cu13 for some packages).
Preparation checklist:
1. Update index https://download.pytorch.org/whl/nightly/cu130 with pypi packages
2. Update packaging name based on https://pypi.org/project/cuda-toolkit/ metadata
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160956
Approved by: https://github.com/atalman
Co-authored-by: atalman <atalman@fb.com>
2025-08-22 11:31:09 +00:00
7bd4cfaef4
[BE] Update nvshem dependency to 3.3.20 ( #160458 )
...
Which is manylinux2_28 compatible, even on aarch64 platform
archive contents and URL pattern changed quite drastically between 3.3.9 and 3.3.20, but hopefully it still works.
Package `libnvshmem_host.so.3` into gigantic aarch64+CUDA wheel
Should fix https://github.com/pytorch/pytorch/issues/160425
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160458
Approved by: https://github.com/Skylion007 , https://github.com/kwen2501 , https://github.com/nWEIdia , https://github.com/atalman , https://github.com/tinglvv
2025-08-16 02:00:57 +00:00
c015e53d37
Revert "[BE] Update nvshem dependency to 3.3.20 ( #160458 )"
...
This reverts commit e0488d9f00865fb56c931580c80e099771c6285e.
Reverted https://github.com/pytorch/pytorch/pull/160458 on behalf of https://github.com/wdvr due to need to rerun workflow generation (failing workflow-checks) ([comment](https://github.com/pytorch/pytorch/pull/160458#issuecomment-3193133706 ))
2025-08-16 01:47:42 +00:00
e0488d9f00
[BE] Update nvshem dependency to 3.3.20 ( #160458 )
...
Which is manylinux2_28 compatible, even on aarch64 platform
archive contents and URL pattern changed quite drastically between 3.3.9 and 3.3.20, but hopefully it still works.
Package `libnvshmem_host.so.3` into gigantic aarch64+CUDA wheel
Should fix https://github.com/pytorch/pytorch/issues/160425
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160458
Approved by: https://github.com/Skylion007 , https://github.com/kwen2501 , https://github.com/nWEIdia , https://github.com/atalman , https://github.com/tinglvv
2025-08-16 00:50:13 +00:00
16ce2c15fa
Add python 3.14 support to linux aarch64 builds ( #160788 )
...
Related to https://github.com/pytorch/pytorch/issues/156856
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160788
Approved by: https://github.com/malfet
2025-08-16 00:03:21 +00:00
d0226719a9
[BE][EZ] Delete remains of split-build logic ( #159990 )
...
Hopefully last piece of https://github.com/pytorch/pytorch/issues/138750
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159990
Approved by: https://github.com/atalman
ghstack dependencies: #159986
2025-08-07 01:59:30 +00:00
476874b37f
[BE]: Update NCCL to 2.27.5 ( #157108 )
...
Update NCCL to 2.27.5. Minor version, improves Blackwell, Symmem FP8 support, and fixes a bug with MNNVL.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157108
Approved by: https://github.com/atalman
2025-07-08 15:40:54 +00:00
a6fab82b16
[BE]: Fix NVSHMEM builds, add missing 12.9 dependency and update to latest for 2.8RC ( #157453 )
...
Fixed our bad builds of nvshmem (we were not building or testing before) and also updates to the latest version. The newest version has critical support for things that would actually make it useful, like bfloat16 and float16 support.
This is a proper fix for: https://github.com/pytorch/pytorch/pull/157411
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157453
Approved by: https://github.com/kwen2501 , https://github.com/atalman
2025-07-03 22:55:18 +00:00
a317c63d1b
[BE]: Update NCCL to 2.27.3 ( #155233 )
...
Fixes: https://github.com/pytorch/pytorch/issues/155052 and https://github.com/pytorch/pytorch/issues/153517
This upgrade is needed to effectively use those symmetric memory kernels anyway. Also fixes some nasty NCCL bugs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155233
Approved by: https://github.com/nWEIdia , https://github.com/kwen2501 , https://github.com/atalman , https://github.com/eqy
2025-06-14 19:20:31 +00:00
344731fb25
Add CUDA 12.9.1 sbsa nightly binaries ( #155819 )
...
https://github.com/pytorch/pytorch/issues/155196
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155819
Approved by: https://github.com/atalman
2025-06-13 18:52:41 +00:00
9cced33c7c
[BE]: Update cudnn to 9.10.2.21 ( #155576 )
...
Update to CUDNN 9.10.2.21
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155576
Approved by: https://github.com/eqy , https://github.com/atalman
2025-06-12 12:50:36 +00:00
f59c76b549
Revert "[BE]: Update cudnn to 9.10.2.21 ( #155576 )"
...
This reverts commit 2d3615f577894c7a117a55e85bb8371bb598ec50.
Reverted https://github.com/pytorch/pytorch/pull/155576 on behalf of https://github.com/malfet due to breaks the same test again (I remember there was a version that adjusted tolerances), see bc3972b80a/1 ([comment](https://github.com/pytorch/pytorch/pull/155576#issuecomment-2964404710 ))
2025-06-11 22:03:45 +00:00
2d3615f577
[BE]: Update cudnn to 9.10.2.21 ( #155576 )
...
Update to CUDNN 9.10.2.21
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155576
Approved by: https://github.com/eqy , https://github.com/atalman
2025-06-11 20:32:07 +00:00
3863bbb55b
[BE]: Update cusparselt to 0.7.1 ( #155232 )
...
Needed to support sparse operations on Blackwell, and implements new features for the library. Also optimizes library sizes vs 0.7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155232
Approved by: https://github.com/nWEIdia , https://github.com/malfet
2025-06-09 18:01:23 +00:00
9656251bb1
Revert "[BE] Update cudnn to 9.10.1.4 ( #155122 )"
...
This reverts commit a14f427db68e54500ef4cd9ed34cb9537263bb74.
Reverted https://github.com/pytorch/pytorch/pull/155122 on behalf of https://github.com/malfet due to Looks like it breaks a bunch of tests, see 36a722e20d/1 ([comment](https://github.com/pytorch/pytorch/pull/155122#issuecomment-2949209801 ))
2025-06-06 13:03:49 +00:00
a14f427db6
[BE] Update cudnn to 9.10.1.4 ( #155122 )
...
Follow up to #152782
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155122
Approved by: https://github.com/malfet , https://github.com/atalman
2025-06-05 16:07:25 +00:00
34c6371d24
Add NVSHMEM to PYTORCH_EXTRA_INSTALL_REQUIREMENTS ( #154568 )
...
NVSHMEM 3.2.5 (released Mar 2025) has both cu11 and cu12 builds.
See:
https://pypi.nvidia.com/nvidia-nvshmem-cu12/
https://pypi.nvidia.com/nvidia-nvshmem-cu11/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154568
Approved by: https://github.com/atalman
ghstack dependencies: #154538
2025-06-04 17:43:24 +00:00
bab59d3c28
Upgrade to CUDA 12.8.1 for nightly binaries ( #152923 )
...
Upgrade current CUDA 12.8 builds to 12.8.1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152923
Approved by: https://github.com/atalman
2025-05-23 22:37:05 +00:00
7f79222992
Upgrade to NCCL 2.26.5 for CUDA 12 ( #152810 )
...
Upgrade NCCL to latest 2.26.5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152810
Approved by: https://github.com/eqy , https://github.com/albanD , https://github.com/nWEIdia , https://github.com/atalman , https://github.com/cyyever
2025-05-14 00:52:50 +00:00
c869862875
Remove cuda dependencies from non cuda buids ( #152333 )
...
These dependencies were added to fix a poetry issue on PyPI. However, including them creates an issue with poetry on download.pytorch.org, because poetry reads the first available wheel on the index for METADATA requirements. Hence the metadata requirements for CPU wheels can't list any CUDA dependencies.
Injecting these dependencies via prep for pypi will need to be done via:
https://github.com/pytorch/test-infra/blob/main/release/pypi/prep_binary_for_pypi.sh
Ref: https://github.com/pytorch/pytorch/issues/152121
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152333
Approved by: https://github.com/jeanschmidt , https://github.com/malfet
2025-04-28 16:46:44 +00:00
e05ac9b794
Use folder tagged docker images for binary builds ( #151706 )
...
Should be the last part of https://github.com/pytorch/pytorch/pull/150558 , except for maybe s390x stuff, which I'm still not sure what's going on there
For binary builds, do the thing like we do in CI where we tag each image with a hash of the .ci/docker folder to ensure a docker image built from that commit gets used. Previously it would use imagename:arch-main, which could be a version of the image based on an older commit
After this, changing a docker image and then tagging with ciflow/binaries on the same PR should use the new docker images
Release and main builds should still pull from docker io
Cons:
* if someone rebuilds the image from main or a PR where the hash is the same (e.g. the folder is unchanged, but the docker build is retriggered for some reason), the release would use that image instead of one built on the release branch
* spin wait for docker build to finish
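The tagging scheme described above can be sketched as follows. This is a minimal illustration that hashes file paths and contents with SHA-256; the actual workflow may derive the hash differently (for example from a git tree hash), and `folder_hash` and `image_tag` are hypothetical names, not functions from the PyTorch repository.

```python
import hashlib
from pathlib import Path


def folder_hash(folder: str) -> str:
    """Return a hex digest that changes when any file path or content under `folder` changes."""
    digest = hashlib.sha256()
    root = Path(folder)
    # Sort so the digest does not depend on filesystem traversal order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def image_tag(image_name: str, folder: str) -> str:
    """Build an `image:hash` reference, replacing a moving tag such as `imagename:arch-main`."""
    return f"{image_name}:{folder_hash(folder)}"
```

Because the tag is a pure function of the folder contents, two commits with an identical folder resolve to the same image, which is exactly the first con listed above.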
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151706
Approved by: https://github.com/atalman
2025-04-22 21:50:10 +00:00
b238e36fd9
Revert "[BE][Ez]: Update CU126 to CUDNN 12.8 too ( #149254 )"
...
This reverts commit b0a5d55c584792a504ec18600180e3d1200dfea6.
Reverted https://github.com/pytorch/pytorch/pull/149254 on behalf of https://github.com/izaitsevfb due to seems to be causing multiple test failures ([comment](https://github.com/pytorch/pytorch/pull/149254#issuecomment-2744686862 ))
2025-03-21 23:44:09 +00:00
b0a5d55c58
[BE][Ez]: Update CU126 to CUDNN 12.8 too ( #149254 )
...
Use the same CUDNN version for 12.6 and 12.8 for better performance and consistency. We can't do this for CU12.1 because it's not supported, and CU12.4 isn't updated because of manywheel Linux compatibility reasons and because support for it is being dropped.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149254
Approved by: https://github.com/jansel , https://github.com/atalman , https://github.com/tinglvv
2025-03-21 18:20:44 +00:00
1d9401befc
ci: Remove mentions and usages of DESIRED_DEVTOOLSET and cxx11 ( #149443 )
...
This is a remnant of our migration to manylinux2_28. We should remove
these since all of our binary builds are now built with cxx11_abi.
Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149443
Approved by: https://github.com/izaitsevfb , https://github.com/atalman
2025-03-20 16:49:46 +00:00
826e790696
Revert "ci: Remove mentions and usages of DESIRED_DEVTOOLSET ( #149443 )"
...
This reverts commit 95a633c45304755ebdbc08396d9948d34243ddb3.
Reverted https://github.com/pytorch/pytorch/pull/149443 on behalf of https://github.com/izaitsevfb due to fails lint ([comment](https://github.com/pytorch/pytorch/pull/149443#issuecomment-2738709561 ))
2025-03-20 00:59:41 +00:00
95a633c453
ci: Remove mentions and usages of DESIRED_DEVTOOLSET ( #149443 )
...
This is a remnant of our migration to manylinux2_28. We should remove
these since all of our binary builds are now built with cxx11_abi.
Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149443
Approved by: https://github.com/izaitsevfb , https://github.com/atalman
2025-03-20 00:39:02 +00:00
dea7157160
nccl: upgrade to 2.26.2 to avoid hang on ncclCommAbort ( #149351 )
...
Fixes #149153
Yaml generated from:
```
python .github/scripts/generate_ci_workflows.py
```
Test plan:
Repro in https://gist.github.com/d4l3k/16a19b475952bc40ddd7f2febcc297b7
```
rm -rf third_party/nccl
python setup.py develop
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149351
Approved by: https://github.com/kwen2501 , https://github.com/atalman , https://github.com/malfet
2025-03-18 05:23:18 +00:00
6856d81c60
[BE]: Update CU128 cudnn to 9.8.0.87 ( #148963 )
...
Also, cu12.6 is on an old CUDNN version; we may want to upgrade it for all the performance reasons, as I don't see a manywheel linux reason to stay back on the old 9.5 release. I might split that into its own PR. This one just updates CU126 to the latest and greatest.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148963
Approved by: https://github.com/jansel , https://github.com/eqy , https://github.com/nWEIdia , https://github.com/tinglvv , https://github.com/atalman
2025-03-13 16:59:12 +00:00
2a1eeaeed8
Remove 12.4 x86 builds and 12.6 sbsa builds from nightly ( #148895 )
...
https://github.com/pytorch/pytorch/issues/145570
redo https://github.com/pytorch/pytorch/pull/148625
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148895
Approved by: https://github.com/atalman
2025-03-10 20:55:09 +00:00
a81751d8b7
[CD] Annotate linux/arm64 cuda wheels with consistent nvidia dependencies ( #145021 )
...
This resolves issues installing torch nightly wheels into a `uv sync`-generated `.venv`
The root cause is that the x64 and arm64 cuda nightly wheels have inconsistent metadata. This can be seen comparing `generated-linux-aarch64-binary-manywheel-nightly.yml` and `generated-linux-binary-manywheel-nightly.yml`
`uv` expects consistency:
https://github.com/astral-sh/uv/issues/10693
>Frankly, it's really not ideal that they change their dependencies from wheel to wheel.
>They could still put the dependencies there with the same platform markers they're using in the other wheel though... 🤷‍♀️
https://github.com/astral-sh/uv/issues/10119#issuecomment-2559898792
>I think this is something that basically has to be solved by PyTorch. The issue is that the wheels for `2.6.0.dev20241222+cu126` don't have consistent metadata, and it's a fundamental assumption of uv that the metadata for a given version _is_ consistent.
To resolve this, I modified the arm64 nightly build workflow to add two new `PYTORCH_EXTRA_INSTALL_REQUIREMENTS` entries, under `manywheel-py3_11-cuda-aarch64-build` and `manywheel-py3_12-cuda-aarch64-build`. These are based on their equivalents in the x64 workflow for the corresponding python versions.
I used the cuda 12.6 dependencies versions for the nvidia packages, to match the `DOCKER_IMAGE: pytorch/manylinuxaarch64-builder:cuda12.6-main` being used by these jobs.
(The arm64 workflow file already had several `PYTORCH_EXTRA_INSTALL_REQUIREMENTS` entries, under various cpu wheels. I'm not sure why these are there, but I left them as-is.)
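The consistency requirement that `uv` relies on can be illustrated with a small check. This is a sketch under stated assumptions: `missing_requirements` is a hypothetical helper, and the requirement string below is made up for illustration rather than copied from the generated workflow files.

```python
def missing_requirements(wheels: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each wheel, list requirements that another wheel of the same version declares but it lacks."""
    all_reqs = set().union(*wheels.values())
    return {
        name: all_reqs - reqs
        for name, reqs in wheels.items()
        if all_reqs - reqs
    }


# Made-up metadata: the x86_64 wheel declares a marker-guarded dependency
# that the aarch64 wheel omits entirely.
wheels = {
    "x86_64": {"example-nvidia-lib==1.0; platform_system == 'Linux' and platform_machine == 'x86_64'"},
    "aarch64": set(),
}
print(missing_requirements(wheels))
```

An empty result means every wheel lists the same requirements, with platform markers deciding which ones actually apply at install time.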
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145021
Approved by: https://github.com/seemethere , https://github.com/atalman
Co-authored-by: Eli Uriegas <eliuriegas@meta.com>
Co-authored-by: Andrey Talman <atalman@fb.com>
2025-03-10 14:39:39 +00:00
99da439d10
Revert "Remove Cuda 12.4 from nightly Binaries ( #148625 )"
...
This reverts commit 1239176fe717839ca5612ac03a4806051225f381.
Reverted https://github.com/pytorch/pytorch/pull/148625 on behalf of https://github.com/malfet due to Broke lint ([comment](https://github.com/pytorch/pytorch/pull/148625#issuecomment-2707415005 ))
2025-03-07 20:47:45 +00:00
1239176fe7
Remove Cuda 12.4 from nightly Binaries ( #148625 )
...
https://github.com/pytorch/pytorch/issues/145570
removes cuda 12.4 nightly builds
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148625
Approved by: https://github.com/atalman
2025-03-07 18:56:04 +00:00
4ece056791
Nccl update to 2.25.1 for cuda 12.4-12.8 ( #146073 )
...
Should resolve: https://github.com/pytorch/pytorch/issues/144768
We use one common nccl version for cuda builds 12.4-12.8 : ``NCCL_VERSION=v2.25.1-1``
For CUDA 11.8 we use legacy ``NCCL_VERSION=v2.21.1-1``
We use a pinned version of NCCL rather than a submodule.
Move nccl location from ``third_party/nccl/nccl`` to ``third_party/nccl``
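The version split described above can be written as a small lookup. This is only a sketch of the rule as stated in this message; `nccl_version_for` is a hypothetical name, and how the repository actually stores and reads the pin is not shown here.

```python
def nccl_version_for(cuda_version: str) -> str:
    """Map a CUDA version such as "12.6" to the NCCL tag named in this commit message."""
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    if (major, minor) == (11, 8):
        return "v2.21.1-1"  # legacy pin for CUDA 11.8
    if (12, 4) <= (major, minor) <= (12, 8):
        return "v2.25.1-1"  # common pin for CUDA 12.4-12.8
    raise ValueError(f"no NCCL pin described for CUDA {cuda_version}")


print(nccl_version_for("12.6"))  # prints "v2.25.1-1"
```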
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146073
Approved by: https://github.com/Skylion007 , https://github.com/malfet , https://github.com/kwen2501 , https://github.com/fduwjj
2025-02-19 03:52:26 +00:00
7622e29a37
Revert "Nccl update to 2.25.1 for cuda 12.4-12.8 ( #146073 )"
...
This reverts commit eecee5863e698d19458b33df7bfecbda0a04557a.
Reverted https://github.com/pytorch/pytorch/pull/146073 on behalf of https://github.com/atalman due to breaks Locally building benchmarks ([comment](https://github.com/pytorch/pytorch/pull/146073#issuecomment-2667054179 ))
2025-02-18 22:23:35 +00:00
eecee5863e
Nccl update to 2.25.1 for cuda 12.4-12.8 ( #146073 )
...
Should resolve: https://github.com/pytorch/pytorch/issues/144768
We use one common nccl version for cuda builds 12.4-12.8 : ``NCCL_VERSION=v2.25.1-1``
For CUDA 11.8 we use legacy ``NCCL_VERSION=v2.21.1-1``
We use a pinned version of NCCL rather than a submodule.
Move nccl location from ``third_party/nccl/nccl`` to ``third_party/nccl``
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146073
Approved by: https://github.com/Skylion007 , https://github.com/malfet , https://github.com/kwen2501 , https://github.com/fduwjj
2025-02-14 21:23:19 +00:00
e06ee4aa9f
Revert "Nccl update to 2.25.1 for cuda 12.4-12.8 ( #146073 )"
...
This reverts commit 06f4a5c0e578d7da10ebdf14edcd24e5dcef78d6.
Reverted https://github.com/pytorch/pytorch/pull/146073 on behalf of https://github.com/atalman due to breaks macos builds: ModuleNotFoundError: No module named 'torch._C._distributed_c10d'; 'torch._C' is not a package ([comment](https://github.com/pytorch/pytorch/pull/146073#issuecomment-2659802389 ))
2025-02-14 16:44:46 +00:00
06f4a5c0e5
Nccl update to 2.25.1 for cuda 12.4-12.8 ( #146073 )
...
Should resolve: https://github.com/pytorch/pytorch/issues/144768
We use one common nccl version for cuda builds 12.4-12.8 : ``NCCL_VERSION=v2.25.1-1``
For CUDA 11.8 we use legacy ``NCCL_VERSION=v2.21.1-1``
We use a pinned version of NCCL rather than a submodule.
Move nccl location from ``third_party/nccl/nccl`` to ``third_party/nccl``
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146073
Approved by: https://github.com/Skylion007 , https://github.com/malfet , https://github.com/kwen2501 , https://github.com/fduwjj
2025-02-14 15:29:59 +00:00