52 Commits

Author SHA1 Message Date
4ece056791 Nccl update to 2.25.1 for cuda 12.4-12.8 (#146073)
Should resolve: https://github.com/pytorch/pytorch/issues/144768
We use one common nccl version for cuda builds 12.4-12.8 : ``NCCL_VERSION=v2.25.1-1``
For CUDA 11.8 we use legacy ``NCCL_VERSION=v2.21.1-1``
We use a pinned version of NCCL rather than a submodule.
Move nccl location from ``third_party/nccl/nccl`` to ``third_party/nccl``
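
The version pinning described above can be sketched as a small lookup. This is only an illustration under stated assumptions: the helper name `nccl_version_for_cuda` is hypothetical and not part of the PyTorch build scripts; only the two version tags and the CUDA ranges come from the message.

```python
# Hypothetical helper (not from the PyTorch repo): maps a CUDA toolkit
# version string to the NCCL tag named in the commit message.
def nccl_version_for_cuda(cuda_version):
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    if (major, minor) == (11, 8):
        return "v2.21.1-1"  # legacy pin for CUDA 11.8
    if (12, 4) <= (major, minor) <= (12, 8):
        return "v2.25.1-1"  # common pin for CUDA 12.4-12.8
    # Versions outside the ranges in the message are deliberately rejected.
    raise ValueError(f"no NCCL pin listed for CUDA {cuda_version}")
```

For example, `nccl_version_for_cuda("12.6")` returns the common 2.25.1 tag, while an unlisted version raises instead of guessing.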

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146073
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/kwen2501, https://github.com/fduwjj
2025-02-19 03:52:26 +00:00
7622e29a37 Revert "Nccl update to 2.25.1 for cuda 12.4-12.8 (#146073)"
This reverts commit eecee5863e698d19458b33df7bfecbda0a04557a.

Reverted https://github.com/pytorch/pytorch/pull/146073 on behalf of https://github.com/atalman due to breaks Locally building benchmarks ([comment](https://github.com/pytorch/pytorch/pull/146073#issuecomment-2667054179))
2025-02-18 22:23:35 +00:00
eecee5863e Nccl update to 2.25.1 for cuda 12.4-12.8 (#146073)
Should resolve: https://github.com/pytorch/pytorch/issues/144768
We use one common nccl version for cuda builds 12.4-12.8 : ``NCCL_VERSION=v2.25.1-1``
For CUDA 11.8 we use legacy ``NCCL_VERSION=v2.21.1-1``
We use a pinned version of NCCL rather than a submodule.
Move nccl location from ``third_party/nccl/nccl`` to ``third_party/nccl``

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146073
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/kwen2501, https://github.com/fduwjj
2025-02-14 21:23:19 +00:00
e06ee4aa9f Revert "Nccl update to 2.25.1 for cuda 12.4-12.8 (#146073)"
This reverts commit 06f4a5c0e578d7da10ebdf14edcd24e5dcef78d6.

Reverted https://github.com/pytorch/pytorch/pull/146073 on behalf of https://github.com/atalman due to breaks macos builds: ModuleNotFoundError: No module named 'torch._C._distributed_c10d'; 'torch._C' is not a package ([comment](https://github.com/pytorch/pytorch/pull/146073#issuecomment-2659802389))
2025-02-14 16:44:46 +00:00
06f4a5c0e5 Nccl update to 2.25.1 for cuda 12.4-12.8 (#146073)
Should resolve: https://github.com/pytorch/pytorch/issues/144768
We use one common nccl version for cuda builds 12.4-12.8 : ``NCCL_VERSION=v2.25.1-1``
For CUDA 11.8 we use legacy ``NCCL_VERSION=v2.21.1-1``
We use a pinned version of NCCL rather than a submodule.
Move nccl location from ``third_party/nccl/nccl`` to ``third_party/nccl``

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146073
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/kwen2501, https://github.com/fduwjj
2025-02-14 15:29:59 +00:00
df5e232563 [BE] Delete NCCL slimming (#146943)
It was added by https://github.com/pytorch/pytorch/pull/35843 and served its purpose when everything was linked statically in libtorch_cuda.so, but for all our releases it's no longer relevant as nccl is now a dynamic dependency of libtorch_cuda.so

Besides, it does not work with the CXX11 ABI anyway, and it creates problems with newer versions of NCCL, when two `collectives.o` are packaged into the library archive.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146943
Approved by: https://github.com/Skylion007, https://github.com/atalman
2025-02-12 00:35:55 +00:00
93f2a64d4d Update submodule NCCL to v2.18.3 (#104993)
Update NCCL submodule to v2.18.3 which fixes numerous bugs and performance issues, particularly on newer GPUs: https://docs.nvidia.com/deeplearning/nccl/release-notes/rel_2-18-3.html#rel_2-18-3

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104993
Approved by: https://github.com/malfet
2023-08-18 23:43:01 +00:00
ee2ce3fef6 Set make max load when building libtorch (#89237)
The nccl build is still OOM sometimes when using `$(MAKE)`:

```
virtual memory exhausted: Cannot allocate memory
Makefile:73: recipe for target '/var/lib/jenkins/cpp-build/caffe2/build/nccl/obj/collectives/device/devlink.o' failed
make[5]: *** [/var/lib/jenkins/cpp-build/caffe2/build/nccl/obj/collectives/device/devlink.o] Error 1
make[5]: Leaving directory '/var/lib/jenkins/workspace/third_party/nccl/nccl/src/collectives/device'
```

* https://github.com/pytorch/pytorch/actions/runs/3476485191/jobs/5811758058
* https://github.com/pytorch/pytorch/actions/runs/3422228421/jobs/5702153639

So trying to set the same limit here as when building with ninja

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89237
Approved by: https://github.com/malfet
2022-11-18 18:55:33 +00:00
9a81da7ad1 Update NCCL to current master and remove patch step (#85367)
The patch from #84245 has been upstreamed into NCCL, so the patch step is no longer required.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85367
Approved by: https://github.com/ezyang
2022-09-21 19:23:49 +00:00
fa86874bbd Fix intermittent link errors in NCCL build (#84245)
Should fix #13362 and fix #83790

I think I've discovered the root cause of the intermittent nccl link
failures. If we look at the variable name in the redefinition error:
```
_02021d91_11_sendrecv_cu_0bc7b9c8_11152
```

this is the name of the file being compiled + some form of unique ID.
As part of NCCL's build process, the same file is compiled multiple
times with different macro definitions depending on which operator and
dtype are being compiled, e.g.
```
nvcc -DNCCL_OP=0 -DNCCL_TYPE=0 -dc sendrecv.cu -o sendrecv_sum_i8.o
```

Since the filename parts are the same, then if the unique IDs also
happen to collide then the entire identifier will collide and the link
fails. So the fix here is to generate a unique `.cu` file for each
object file. I've implemented this as a `.patch` file that gets
applied from our cmake code, but if we instead fork nccl that would be
cleaner.
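
The idea of one unique `.cu` file per object file can be sketched as a build plan. This is a rough illustration, not the actual `.patch`: the function name `compile_plan` and the op/dtype lists passed to it are hypothetical, and the sketch only builds command lists without invoking `nvcc`.

```python
import itertools

# Hypothetical planner: for every (operator, dtype) pair, use a uniquely
# named copy of the source so the file-name part of generated identifiers
# can no longer collide between objects.
def compile_plan(source, ops, dtypes):
    stem = source.rsplit(".", 1)[0]
    plan = []
    pairs = itertools.product(enumerate(ops), enumerate(dtypes))
    for (op_id, op), (type_id, dtype) in pairs:
        unique_src = f"{stem}_{op}_{dtype}.cu"  # would be a copy of `source`
        cmd = ["nvcc", f"-DNCCL_OP={op_id}", f"-DNCCL_TYPE={type_id}",
               "-dc", unique_src, "-o", f"{stem}_{op}_{dtype}.o"]
        plan.append((unique_src, cmd))
    return plan
```

Each entry pairs the per-object source name with the command that would compile it, so no two objects share an input file name.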
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84245
Approved by: https://github.com/janeyx99, https://github.com/malfet
2022-09-13 19:55:52 +00:00
56a37ea1a6 Set default value for nccl make MAX_JOBS if ProcessorCount returns 0 (#84231)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84231
Approved by: https://github.com/malfet, https://github.com/rohan-varma
2022-08-30 16:06:34 +00:00
2000eba454 NCCL: Re-enable parallel builds (#83696)
Since #83173 was merged I have noticed some CI being slowed down by
the nccl building step. e.g. if there are no C++ changes then sccache
compiles everything else very quickly and nccl becomes the limiting
factor.

This re-enables parallel builds with some safeguards to protect
against oversubscription. When `make` is the parent build system, we
can use `$(MAKE)` and the `make` jobserver will coordinate job
allocation with the sub-process. For other build systems, this calls
`make` with the `-l` flag which should prevent it launching jobs when
the system load average is already too high.
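
The selection logic described above can be sketched roughly as follows. The function name and the choice to pass `-j` alongside `-l` are assumptions made for illustration; the message itself only states that `$(MAKE)` is used under `make` and that `-l` is passed otherwise.

```python
# Hypothetical sketch of choosing the NCCL build command.
def nccl_make_command(parent_is_make, max_jobs):
    if parent_is_make:
        # `$(MAKE)` is expanded by the parent make, which then shares its
        # jobserver with the sub-make and coordinates job allocation.
        return ["$(MAKE)"]
    # Other generators (e.g. ninja): cap the job count (assumption) and use
    # -l so make stops launching jobs while the load average is too high.
    return ["make", f"-j{max_jobs}", f"-l{max_jobs}"]
```

With a non-make parent and `max_jobs=8`, this yields `make -j8 -l8`.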
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83696
Approved by: https://github.com/malfet
2022-08-25 05:16:01 +00:00
37d3db7579 Deletes CCACHE_DISABLE and SCCACHE_DISABLE from nccl.cmake (#84007)
Looking through the code and online, it does not look like these variables actually change anything. Regardless, this change was instituted to fix https://github.com/pytorch/pytorch/issues/13362, but we are again running into similar issues even with the workaround: see https://github.com/pytorch/pytorch/issues/83790.

Thus, since
1. this change isn't preventing flakiness
2. these variables do not seem used anywhere in pytorch/pytorch nor mozilla/sccache
we should remove this confusion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84007
Approved by: https://github.com/huydhn, https://github.com/malfet, https://github.com/ZainRizvi
2022-08-24 21:43:12 +00:00
3a9ae518f2 Skip NCCL slimming for cxx11 libtorch builds (#83959)
Fixes https://github.com/pytorch/pytorch/issues/83887

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83959
Approved by: https://github.com/atalman
2022-08-24 18:31:27 +00:00
1c83ec8f61 Build nccl single-threaded (#83173)
Closes #82888

This is a tentative fix. make is called by ninja so should be run in
parallel with other jobs already.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83173
Approved by: https://github.com/malfet
2022-08-10 21:40:46 +00:00
c08092fdf2 Update NCCL to v2.13.4-1 (#82775)
Also, update slimming script to include two instances of net.o that new library generates
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82775
Approved by: https://github.com/ngimel
2022-08-04 19:36:45 +00:00
7c298b8244 Fix objcopy version detection (#82774)
By extending the regex to match any characters, not just `version`

On Ubuntu version string looks as follows:
```
$ objcopy --version
GNU objcopy (GNU Binutils for Ubuntu) 2.30
```
And on some CentOSes it looks as
```
$ objcopy --version
GNU objcopy (GNU Binutils) 2.37

```
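
A minimal sketch of version detection that tolerates both banners shown above. The pattern here is an illustrative assumption, not the regex used in the CMake code, and `objcopy_version` is a hypothetical name.

```python
import re

# Accept any text inside the parentheses, then capture major.minor.
_BANNER = re.compile(r"GNU objcopy \([^)]*\) (\d+)\.(\d+)")

def objcopy_version(banner):
    """Return (major, minor) from the first line of `objcopy --version`."""
    match = _BANNER.match(banner)
    if match is None:
        return None  # unrecognized banner format
    return int(match.group(1)), int(match.group(2))
```

Both the Ubuntu and the CentOS banner parse to a `(major, minor)` tuple, which can then be compared against a minimum supported version.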
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82774
Approved by: https://github.com/ngimel
2022-08-04 16:26:31 +00:00
2eef1f27f8 Disable ccache for nccl builds (#62208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62208

reverts
https://github.com/pytorch/pytorch/pull/55814
which removed a workaround for:
https://github.com/pytorch/pytorch/issues/13362

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29935472

Pulled By: nairbv

fbshipit-source-id: 7ce9cde1408f17153632036fd128814032739746
2021-07-27 08:07:26 -07:00
b98f011cd4 cmake: Enable (s)ccache for nccl builds (#55814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55814

I don't really know if the original issue is resolved but let's just
check and see if this passes CI so that we can potentially get some
speed up on our builds

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D27715734

Pulled By: seemethere

fbshipit-source-id: a8f90774dfd25b0abf8e57283fe3591a8d8f3c4b
2021-04-13 14:49:25 -07:00
8a574c7104 [Cmake] Drop quotation marks around $ENV{MAX_JOBS} (#44557)
Summary:
Solves `the '-j' option requires a positive integer argument` error on some systems when MAX_JOBS is not defined
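
One plausible reading of this fix, sketched in Python rather than CMake: a quoted empty value still produces an argument (an empty string after `-j`), while an unquoted empty value produces no argument at all. The helper `jobs_args` is hypothetical and only models the argument list.

```python
# Hypothetical model of how the -j arguments end up in make's argv.
def jobs_args(max_jobs, quoted):
    if quoted:
        # A quoted value is always passed as exactly one argument, even
        # when it is empty, so make sees `-j ""`.
        return ["-j", max_jobs]
    # An unquoted empty value expands to nothing, leaving a bare `-j`.
    return ["-j"] + max_jobs.split()
```

With `MAX_JOBS` unset, the quoted form yields `["-j", ""]`, which make rejects, whereas the unquoted form yields `["-j"]`.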

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44557

Reviewed By: vkuzo

Differential Revision: D23653511

Pulled By: malfet

fbshipit-source-id: 7d86fb7fb6c946c34afdc81bf2c3168a74d00a1f
2020-09-11 12:57:11 -07:00
4d431881d1 Control NCCL build parallelism via MAX_JOBS environment var (#44167)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44167

Reviewed By: walterddr, ngimel

Differential Revision: D23522419

Pulled By: malfet

fbshipit-source-id: 31b25a71fef3e470bdf382eb3698e267326fa354
2020-09-04 10:02:53 -07:00
cf7e7909d5 NCCL must depend on librt (#41978)
Summary:
Since NCCL makes calls to shm_open/shm_close it must depend on librt on Linux

This should fix `DSO missing from command line` error on some platforms

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41978

Reviewed By: colesbury

Differential Revision: D22721430

Pulled By: malfet

fbshipit-source-id: d2ae08ce9da3979daaae599e677d5e4519b080f0
2020-07-24 16:47:19 -07:00
6a45584272 Remove __nv_relfatbin section from nccl_static library (#35843)
Summary:
The NCCL library is built using [CUDA separate compilation](https://devblogs.nvidia.com/separate-compilation-linking-cuda-device-code/), which consists of building intermediate CUDA binaries and then linking them into GPU code that can be executed on the device. The intermediate CUDA code is stored in the `__nv_relfatbin` section, and code that can be launched is stored in `.nv_fatbin`. When `nvcc` is used to link an executable or shared library, it removes those intermediate binaries, but the default host linker is not aware of them and therefore keeps them inside the host executable. Help the compiler by removing the `__nv_relfatbin` sections from the object files inside `libnccl_static.a`.
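
The section removal can be sketched as a sequence of `ar` and `objcopy` invocations. This is an assumption about the general shape of such a step, not the script added by this PR; `strip_relfatbin_commands` is a hypothetical name, and the sketch only builds the command lists.

```python
# Hypothetical plan: unpack the archive, drop the section from each
# object, then write the objects back into the archive.
def strip_relfatbin_commands(archive, objects):
    commands = [["ar", "x", archive]]  # extract members into the cwd
    for obj in objects:
        commands.append(["objcopy", "--remove-section=__nv_relfatbin", obj])
    commands.append(["ar", "rcs", archive] + list(objects))
    return commands
```

Note that a plan like this assumes member names inside the archive are unique, since extraction writes each member to a file of the same name.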
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35843

Test Plan: Build pytorch with CUDA and run `test_distributed.py`

Differential Revision: D20882224

Pulled By: malfet

fbshipit-source-id: f23dd4aa416518324cb38b9bd6846e73a1c7dd21
2020-04-06 18:23:08 -07:00
45c9ed825a Formatting cmake (to lowercase without space for if/elseif/else/endif) (#35521)
Summary:
Running commands:
```bash
shopt -s globstar

sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i caffe2/**/CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i torch/**/CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i c10/**/CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i cmake/**/*.cmake
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i cmake/**/*.cmake.in
```
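
To see why the later rules target `ENDif(` and `ELSEif(`, the `sed` chain above can be replayed on single lines. The sketch below mirrors the eight substitutions in order as literal replacements; the function name is hypothetical.

```python
# The eight sed expressions, applied in the same order as literal
# (non-regex) replacements.
RULES = [
    ("IF (", "if("), ("IF(", "if("), ("if (", "if("),
    ("ELSE (", "else("), ("ELSE(", "else("), ("else (", "else("),
    ("ENDif(", "endif("), ("ELSEif(", "elseif("),
]

def lowercase_conditionals(line):
    for old, new in RULES:
        line = line.replace(old, new)
    return line
```

An `ENDIF (` first becomes `ENDif(` through the `IF (` rule, which is why the chain finishes with the mixed-case cleanups.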
We may further convert all the commands into lowercase according to the following issue: 77543bde41.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35521

Differential Revision: D20704382

Pulled By: malfet

fbshipit-source-id: 42186b9b1660c34428ab7ceb8d3f7a0ced5d2e80
2020-03-27 14:25:17 -07:00
60c46dd4df Let CMake handle NCCL detection instead of our handcrafted Python script. (#22930)
Summary:
 ---

How does the current code subsume all detections in the deleted `nccl.py`?

- The dependency of `USE_NCCL` on the OS and `USE_CUDA` is handled as dependency options in `CMakeLists.txt`.

- The main NCCL detection happens in [FindNCCL.cmake](8377d4b32c/cmake/Modules/FindNCCL.cmake), which is called by [nccl.cmake](8377d4b32c/cmake/External/nccl.cmake). When `USE_SYSTEM_NCCL` is false, the previous Python code deferred the detection to `find_package(NCCL)`. The change in `nccl.cmake` retains this.

- `USE_STATIC_NCCL` in the previous Python code simply changes the name of the detected library. This is done in `IF (USE_STATIC_NCCL)`.

- Now we only need to look at how the lines below line 20 in `nccl.cmake` are subsumed. These lines list paths to header and library directories that NCCL headers and libraries may reside in and try to search these directories for the key header and library files in turn. These are done by `find_path` for headers and `find_library` for the library files in `FindNCCL.cmake`.
  * The call of [find_path](https://cmake.org/cmake/help/v3.8/command/find_path.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for headers in `<prefix>/include` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. Like the Python code, this commit sets `CMAKE_PREFIX_PATH` to search for `<prefix>` in `NCCL_ROOT_DIR` and home to CUDA.  `CMAKE_SYSTEM_PREFIX_PATH` includes the standard directories such as `/usr/local` and `/usr`. `NCCL_INCLUDE_DIR` is also specifically handled.

  * Similarly, the call of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for libraries in directories including `<prefix>/lib` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. But it also handles the edge cases intended to be solved in the Python code more properly:
     - It only searches for `<prefix>/lib64` (and `<prefix>/lib32`) if it is appropriate on the system.
     - It only searches for `<prefix>/lib/<arch>` for the right `<arch>`, unlike the Python code, which searches for `lib/<arch>` in a generic way (e.g., the Python code searches for `/usr/lib/x86_64-linux-gnu` but in reality systems have `/usr/lib/x86_64-some-customized-name-linux-gnu`, see https://unix.stackexchange.com/a/226180/38242 ).
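
The prefix-based header search described above can be approximated in a few lines. This is a simplified model of the behavior, not CMake's implementation: `find_header` is a hypothetical name, and real `find_path` consults more locations than the two prefix lists shown here.

```python
import os
import tempfile

# Simplified model: check <prefix>/include for each user-supplied prefix
# first, then for each system prefix, and return the first directory
# that contains the header.
def find_header(name, prefixes, system_prefixes=("/usr/local", "/usr")):
    for prefix in list(prefixes) + list(system_prefixes):
        include_dir = os.path.join(prefix, "include")
        if os.path.isfile(os.path.join(include_dir, name)):
            return include_dir
    return None  # header not found under any prefix
```

Passing the value of `NCCL_ROOT_DIR` as one of `prefixes` models how a user-specified installation is found before the standard directories.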

 ---

Regarding relevant issues:

- https://github.com/pytorch/pytorch/issues/12063 and https://github.com/pytorch/pytorch/issues/2877: These are properly handled, as explained in the updated comment.
- https://github.com/pytorch/pytorch/issues/2941 does not change NCCL detection specifically for Windows (it changed CUDA detection).
- b7e258f81ef61d19b884194cdbcd6c7089636d46 A versioned library detection is added, but the order is reversed: the unversioned library becomes preferred. This is because unversioned libraries are normally linked to versioned libraries and preferred by users, and local installations by users are often unversioned. As the documentation of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) suggests:

> When using this to specify names with and without a version suffix, we recommend specifying the unversioned name first so that locally-built packages can be found before those provided by distributions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22930

Differential Revision: D16440275

Pulled By: ezyang

fbshipit-source-id: 11fe80743d4fe89b1ed6f96d5d996496e8ec01aa
2019-07-23 08:45:51 -07:00
798d5d9771 Revert D16281714: Add sanity checks for NCCL detection.
Differential Revision:
D16281714

Original commit changeset: 396bcbf099bd

fbshipit-source-id: a22cc112d1b6a62d689f9d8a7f93e8be3abe2a44
2019-07-16 13:58:27 -07:00
e2046f8c1d Add sanity checks for NCCL detection.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22819

Test Plan: Imported from OSS

Differential Revision: D16281714

Pulled By: ezyang

fbshipit-source-id: 396bcbf099bd07b996cf779c6b43092096b52d90
2019-07-16 11:32:32 -07:00
8711df89cc fix nccl compilation to make sure it compiles for architectures that pytorch compiles for (#18739)
Summary:
resubmit of https://github.com/pytorch/pytorch/pull/18704 with additional fixes

Fixes https://github.com/pytorch/pytorch/issues/18359
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18739

Differential Revision: D14737274

Pulled By: soumith

fbshipit-source-id: cfbbbf68b098594bd045861d1b2c085da693ea51
2019-04-03 12:52:50 -07:00
a799751e33 Revert D14717015: [pytorch][PR] fix nccl compilation to make sure it compiles for architectures that pytorch compiles for
Differential Revision:
D14717015

Original commit changeset: 4aac036f57e5

fbshipit-source-id: c820b8dfb27564271e6b80e133fe655658a7c25c
2019-04-02 09:39:03 -07:00
fc6296d777 fix nccl compilation to make sure it compiles for architectures that pytorch compiles for (#18704)
Summary:
cc: t-vi gchanan zou3519

This fixes https://github.com/pytorch/pytorch/issues/18359
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18704

Differential Revision: D14717015

Pulled By: soumith

fbshipit-source-id: 4aac036f57e564b05d759662e8ad7a80170901c0
2019-04-01 17:10:42 -07:00
37627a182b fix USE_SYSTEM_NCCL build (#14606)
Summary:
fixes https://github.com/pytorch/pytorch/issues/14537
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14606

Differential Revision: D13274156

Pulled By: soumith

fbshipit-source-id: f834715e8e17dacf60be459b0efffba1d4df40ae
2018-11-29 23:36:17 -08:00
fb7e40b7eb nccl fixes (#14195)
Summary:
This has 4 changes

1) propagate USE_SYSTEM_NCCL. Previously it was ignored and cmake always did a FindPackage
2) respect SCCACHE_DISABLE in our caffe2 sccache wrapper for circleci
3) use SCCACHE_DISABLE when building nccl, because it triggers the same bug as when using CCACHE (already tracked in https://github.com/pytorch/pytorch/issues/13362). This was hidden because we weren't respecting USE_SYSTEM_NCCL, and were never building nccl ourselves in CI
4) In one particular CI configuration (caffe2, cuda 8, cudnn 7), force USE_SYSTEM_NCCL=1. Building the bundled nccl triggers a bug in nvlink. I've done some investigation, but this looks like a tricky, preexisting bug, so rather than hold up this diff I'm tracking it separately in https://github.com/pytorch/pytorch/issues/14486
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14195

Differential Revision: D13237502

Pulled By: anderspapitto

fbshipit-source-id: 1100ac1269c7cd39e2e0b3ba12a56a3ce8977c55
2018-11-28 14:43:06 -08:00
44d2ca660a Disable CCACHE while building NCCL (#13340)
Summary:
I don't have a full analysis, but ccache appears to fail often while
building nccl. To work around this, run the NCCL build with CCACHE_DISABLE.
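
A minimal sketch of the workaround, assuming the build is launched as a subprocess: copy the environment and set `CCACHE_DISABLE` before invoking the NCCL build. The helper name `nccl_build_env` is hypothetical.

```python
import os

# Hypothetical helper: returns a copy of the environment in which ccache
# passes compilations straight through to the real compiler.
def nccl_build_env(base=None):
    env = dict(os.environ if base is None else base)
    env["CCACHE_DISABLE"] = "1"  # ccache treats this variable as "disabled"
    return env
```

The returned mapping could be passed as `env=` to `subprocess.run` so that only the NCCL build is affected and the rest of the build keeps its cache.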
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13340

Differential Revision: D12855467

Pulled By: anderspapitto

fbshipit-source-id: 63eb12183ab9d03dd22090f084688ae6390fe8bd
2018-10-30 22:19:21 -07:00
c68b82ebc8 don't expand cmake variable in IF
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13331

Differential Revision: D12849306

Pulled By: anderspapitto

fbshipit-source-id: 2f1f72a44ed3a176be8c7490652e49771c3fadbf
2018-10-30 15:20:43 -07:00
380d2dfb27 absorb nccl (#13150)
Summary:
always build nccl from within the main cmake build, rather than via a separate invocation in build_pytorch_libs.sh. Use the existing caffe2 codepaths
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13150

Differential Revision: D12815674

Pulled By: anderspapitto

fbshipit-source-id: a710b6f242d159b9816911a25ee2c4b8c3f855aa
2018-10-29 12:04:32 -07:00
c5d7494ca1 Use open-source NCCL2 in PyTorch (#12359)
Summary:
- Removed the old nccl file
- Make open-source NCCL a submodule
- CMake to make NCCL itself

NCCL2 now is in the default build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12359

Reviewed By: orionr, yns88

Differential Revision: D10219665

Pulled By: teng-li

fbshipit-source-id: 134ff47057512ba617b48bf390c1c816fff3f881
2018-10-08 15:39:07 -07:00
895994a7c3 Back out "[pytorch][PR] [Build] Use open-source NCCL2 in PyTorch"
Reviewed By: The controller you requested could not be found.

fbshipit-source-id: a13075339d3a7b970e81be0b1a32a7c4c3a6c68d
2018-10-04 14:12:04 -07:00
ae7a7fb398 Use open-source NCCL2 in PyTorch (#12312)
Summary:
- Removed the old nccl file
- Make open-source NCCL a submodule
- CMake to make NCCL itself

NCCL2 now is in the default build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12312

Differential Revision: D10190845

Pulled By: teng-li

fbshipit-source-id: 08d42253b774149a66919d194f88b34628c39bae
2018-10-04 11:42:17 -07:00
4bf0202cac [build] Have PyTorch depend on minimal libcaffe2.so instead of libATen.so (#7399)
* Have PyTorch depend on minimal libcaffe2.so instead of libATen.so

* Build ATen tests as a part of Caffe2 build

* Hopefully cufft and nvcc fPIC fixes

* Make ATen install components optional

* Add tests back for ATen and fix TH build

* Fixes for test_install.sh script

* Fixes for cpp_build/build_all.sh

* Fixes for aten/tools/run_tests.sh

* Switch ATen cmake calls to USE_CUDA instead of NO_CUDA

* Attempt at fix for aten/tools/run_tests.sh

* Fix typo in last commit

* Fix valgrind call after pushd

* Be forgiving about USE_CUDA disable like PyTorch

* More fixes on the install side

* Link all libcaffe2 during test run

* Make cuDNN optional for ATen right now

* Potential fix for non-CUDA builds

* Use NCCL_ROOT_DIR environment variable

* Pass -fPIC through nvcc to base compiler/linker

* Remove THCUNN.h requirement for libtorch gen

* Add Mac test for -Wmaybe-uninitialized

* Potential Windows and Mac fixes

* Move MSVC target props to shared function

* Disable cpp_build/libtorch tests on Mac

* Disable sleef for Windows builds

* Move protos under BUILD_CAFFE2

* Remove space from linker flags passed with -Wl

* Remove ATen from Caffe2 dep libs since directly included

* Potential Windows fixes

* Preserve options while sleef builds

* Force BUILD_SHARED_LIBS flag for Caffe2 builds

* Set DYLD_LIBRARY_PATH and LD_LIBRARY_PATH for Mac testing

* Pass TORCH_CUDA_ARCH_LIST directly in cuda.cmake

* Fixes for the last two changes

* Potential fix for Mac build failure

* Switch Caffe2 to build_caffe2 dir to not conflict

* Cleanup FindMKL.cmake

* Another attempt at Mac cpp_build fix

* Clear cpp-build directory for Mac builds

* Disable test in Mac build/test to match cmake
2018-05-24 07:47:27 -07:00
c9cc514df4 Bump minimum CMake version to 3.2
CMake 3.2 is required to properly track dependencies in projects imported as ExternalProject_Add (BUILD_BYPRODUCTS parameter).
Users on Ubuntu 14.04 LTS would need to install and use cmake3 package for configurations. Users of other popular distributions generally have a recent enough CMake package.
2018-03-06 19:57:48 -08:00
80430501c9 Remove the use of EXTERNAL_DEPENDENCIES (#2045)
* [cmake] Move nccl to modern cmake, and avoid using EXTERNAL_DEPENDENCIES

* [cmake] Move nnpack to modern cmake and avoid using EXTERNAL_DEPENDENCIES.

* [cmake] Move ATen to modern cmake and avoid using EXTERNAL_DEPENDENCIES.

* Move cpufeatures to modern cmake, and avoid using EXTERNAL_DEPENDENCIES

* Finally remove EXTERNAL_DEPENDENCIES.

* Maratyszcza's comments
2018-02-24 16:15:28 -08:00
bd22b83d62 Fix nccl cmake files
Summary: Closes https://github.com/caffe2/caffe2/pull/1963

Differential Revision: D6994392

Pulled By: bddppq

fbshipit-source-id: 4ab6a8f7dcb4469bdd3e152559ff3474984776fc
2018-02-14 16:04:11 -08:00
2c10b13eeb Pass CUDA_NVCC_EXECUTABLE to NCCL build
Summary:
If this variable is set to a ccache symlink then the NCCL build will
also use the cache. The NCCL build is the slowest component of a cached
build without this change
Closes https://github.com/caffe2/caffe2/pull/1416

Reviewed By: Yangqing

Differential Revision: D6214008

Pulled By: pietern

fbshipit-source-id: e0a90e27de9b1c5a1fdc0e5bad5fb61f9fa924c3
2017-11-01 15:32:22 -07:00
45e6e71198 Tidy up CMake for NCCL
Summary:
Use HINTS instead of PATHS for find_library so that you can specify
-DNCCL_ROOT_DIR and it will use this NCCL installation regardless of
what else is installed on your system. Also add a path hint to include
the default base path for NCCL 2 libraries.
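
The difference between `HINTS` and `PATHS` comes down to search order: hints are consulted before the system locations, while paths are consulted last. The sketch below is a simplified model of that ordering with hypothetical names; it takes the set of directories containing the library as input instead of touching the file system.

```python
# Simplified model of find_library ordering: HINTS, then system
# directories, then PATHS. `dirs_with_lib` is the set of directories
# that actually contain the library.
def first_match(dirs_with_lib, hints=(), system=(), paths=()):
    for directory in [*hints, *system, *paths]:
        if directory in dirs_with_lib:
            return directory
    return None
```

With NCCL present both in `/usr/lib` and in a custom root, listing the custom root under `hints` selects it, while listing it under `paths` would select the system copy.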
Closes https://github.com/caffe2/caffe2/pull/1152

Reviewed By: Yangqing

Differential Revision: D5740053

Pulled By: pietern

fbshipit-source-id: 43f0908a63e8a9b90320dece0bbb558827433b48
2017-08-30 15:39:56 -07:00
49c89d6664 Use add_dependencies() for ExternalProjects
Summary:
I closed https://github.com/caffe2/caffe2/pull/736 because one of these variables should be used after all.

Here's how C1 uses this variable: https://github.com/BVLC/caffe/blob/rc5/cmake/Targets.cmake#L116

Without this fix, there is a race condition in the parallel build leading to this error:
```
make[2]: *** No rule to make target `../third_party/NNPACK/lib/libnnpack.a', needed by `caffe2/libCaffe2_CPU.so'.
```
Closes https://github.com/caffe2/caffe2/pull/737

Differential Revision: D5211794

Pulled By: Yangqing

fbshipit-source-id: 9e368f09b01edaf86252727adc6f6cc40d244e29
2017-06-08 13:34:42 -07:00
6490d58a75 Add cuda path to nccl build
Summary:
This was found necessary on some CentOS. aaronmarkham
Closes https://github.com/caffe2/caffe2/pull/240

Differential Revision: D4819591

Pulled By: Yangqing

fbshipit-source-id: 40161cd484a2c8d43f26077919ad2762440dde13
2017-04-03 10:05:26 -07:00
d5880b128e CMake support for Gloo dependency
Summary:
This also requires a change to cmake/External/nccl.cmake to use the
static NCCL binary instead of the shared object. When the Caffe2/Gloo
build uses the bundled NCCL version it should be packaged up in the
resulting libraries and not cause another runtime dependency on a
library that has to be installed separately.
Closes https://github.com/caffe2/caffe2/pull/218

Differential Revision: D4769926

Pulled By: pietern

fbshipit-source-id: 5c85559992c200d874f4218724823815ffb5adb5
2017-03-24 08:32:24 -07:00
f449af378d Explicitly pass CXX to NCCL Makefile
Summary:
Necessary if CXX isn't set when cmake is called. The CXX variable will then be
empty which prevents make from using its own default.
Closes https://github.com/caffe2/caffe2/pull/202

Differential Revision: D4711113

Pulled By: Yangqing

fbshipit-source-id: 895c07044b263ba9b5440453978248506d7ac225
2017-03-14 18:33:36 -07:00
59d263280e fix directory reference in cmake for inclusion as library
Summary: This fixes builds that include caffe2 and change the value of CMAKE_BINARY_DIR to their own binary dir. In particular, it allows the generation of protobuf headers/files.

Reviewed By: Yangqing

Differential Revision: D4466126

fbshipit-source-id: eba264094dd2bff07a7f050b95fd2d5525462b09
2017-01-26 14:44:37 -08:00
324ef09e01 fix typo
2017-01-03 23:16:36 -08:00