Should fix#13362 and fix#83790
I think I've discovered the root cause of the intermittent nccl link
failures. If we look at the variable name in the redefinition error:
```
_02021d91_11_sendrecv_cu_0bc7b9c8_11152
```
this is the name of the file being compiled + some form of unique ID.
As part of NCCL's build process, the same file is compiled multiple
times with different macro definitions depending on which operator and
dtype are being compiled, e.g.
```
nvcc -DNCCL_OP=0 -DNCCL_TYPE=0 -dc sendrecv.cu -o sendrecv_sum_i8.o
```
Since the filename parts are the same, then if the unique IDs also
happen to collide then the entire identifier will collide and the link
fails. So the fix here is to generate a unique `.cu` file for each
object file. I've implemented this as a `.patch` file that gets
applied from our cmake code, but if we instead fork nccl that would be
cleaner.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84245
Approved by: https://github.com/janeyx99, https://github.com/malfet
Since #83173 was merged I have noticed some CI being slowed down by
the nccl building step. e.g. if there are no C++ changes then sccache
compiles everything else very quickly and nccl becomes the limiting
factor.
This re-enables parallel builds with some safeguards to protect
against oversubscription. When `make` is the parent build system, we
can use `$(MAKE)` and the `make` jobserver will coordinate job
allocation with the sub-process. For other build systems, this calls
`make` with the `-l` flag which should prevent it launching jobs when
the system load average is already too high.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83696
Approved by: https://github.com/malfet
By extending regex to match any character other than not just version
On Ubuntu version string looks as follows:
```
$ objcopy --version
GNU objcopy (GNU Binutils for Ubuntu) 2.30
```
And on some CentOSes it looks as
```
$ objcopy --version
GNU objcopy (GNU Binutils) 2.37
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82774
Approved by: https://github.com/ngimel
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55814
I don't really know if the original issue is resolved but let's just
check and see if this passes CI so that we can potentially get some
speed up on our builds
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: walterddr
Differential Revision: D27715734
Pulled By: seemethere
fbshipit-source-id: a8f90774dfd25b0abf8e57283fe3591a8d8f3c4b
Summary:
Solves `the '-j' option requires a positive integer argument` error on some systems when MAX_JOBS is not defined
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44557
Reviewed By: vkuzo
Differential Revision: D23653511
Pulled By: malfet
fbshipit-source-id: 7d86fb7fb6c946c34afdc81bf2c3168a74d00a1f
Summary:
Since NCCL makes calls to shm_open/shm_close it must depend on librt on Linux
This should fix `DSO missing from command line` error on some platforms
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41978
Reviewed By: colesbury
Differential Revision: D22721430
Pulled By: malfet
fbshipit-source-id: d2ae08ce9da3979daaae599e677d5e4519b080f0
Summary:
NCCL library is built using [CUDA separate compilation](https://devblogs.nvidia.com/separate-compilation-linking-cuda-device-code/), which consists of building intermediate CUDA binaries and then linking them into GPU code that could be executed on device. Intermediate CUDA code is stored in `__nv_relfatbin` section, and code that can be launched is stored in `.nv_fatbin`. When `nvcc` is used to link executable/shared library, it removes those intermediate binaries, but default host linker is not aware of that and therefore it is kept inside host executable. Help compiler by removing `__nv_relfatbin` sections from object file inside `libncc_static.a`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35843
Test Plan: Build pytorch with CUDA and run `test_distributed.py`
Differential Revision: D20882224
Pulled By: malfet
fbshipit-source-id: f23dd4aa416518324cb38b9bd6846e73a1c7dd21
Summary:
---
How does the current code subsume all detections in the deleted `nccl.py`?
- The dependency of `USE_NCCL` on the OS and `USE_CUDA` is handled as dependency options in `CMakeLists.txt`.
- The main NCCL detection happens in [FindNCCL.cmake](8377d4b32c/cmake/Modules/FindNCCL.cmake), which is called by [nccl.cmake](8377d4b32c/cmake/External/nccl.cmake). When `USE_SYSTEM_NCCL` is false, the previous Python code defer the detection to `find_package(NCCL)`. The change in `nccl.cmake` retains this.
- `USE_STATIC_NCCL` in the previous Python code simply changes the name of the detected library. This is done in `IF (USE_STATIC_NCCL)`.
- Now we only need to look at how the lines below line 20 in `nccl.cmake` are subsumed. These lines list paths to header and library directories that NCCL headers and libraries may reside in and try to search these directories for the key header and library files in turn. These are done by `find_path` for headers and `find_library` for the library files in `FindNCCL.cmake`.
* The call of [find_path](https://cmake.org/cmake/help/v3.8/command/find_path.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for headers in `<prefix>/include` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. Like the Python code, this commit sets `CMAKE_PREFIX_PATH` to search for `<prefix>` in `NCCL_ROOT_DIR` and home to CUDA. `CMAKE_SYSTEM_PREFIX_PATH` includes the standard directories such as `/usr/local` and `/usr`. `NCCL_INCLUDE_DIR` is also specifically handled.
* Similarly, the call of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for libraries in directories including `<prefix>/lib` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. But it also handles the edge cases intended to be solved in the Python code more properly:
- It only searches for `<prefix>/lib64` (and `<prefix>/lib32`) if it is appropriate on the system.
- It only searches for `<prefix>/lib/<arch>` for the right `<arch>`, unlike the Python code searches for `lib/<arch>` in a generic way (e.g., the Python code searches for `/usr/lib/x86_64-linux-gnu` but in reality systems have `/usr/lib/x86_64-some-customized-name-linux-gnu`, see https://unix.stackexchange.com/a/226180/38242 ).
---
Regarding for relevant issues:
- https://github.com/pytorch/pytorch/issues/12063 and https://github.com/pytorch/pytorch/issues/2877: These are properly handled, as explained in the updated comment.
- https://github.com/pytorch/pytorch/issues/2941 does not changes NCCL detection specifically for Windows (it changed CUDA detection).
- b7e258f81ef61d19b884194cdbcd6c7089636d46 A versioned library detection is added, but the order is reversed: The unversioned library becomes preferred. This is because normally unversioned libraries are linked to versioned libraries and preferred by users, and local installation by users are often unversioned. Like the document of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) suggests:
> When using this to specify names with and without a version suffix, we recommend specifying the unversioned name first so that locally-built packages can be found before those provided by distributions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22930
Differential Revision: D16440275
Pulled By: ezyang
fbshipit-source-id: 11fe80743d4fe89b1ed6f96d5d996496e8ec01aa
Summary:
This has 4 changes
1) propagate USE_SYSTEM_NCCL. Previously it was ignored and cmake always did a FindPackage
2) respect SCCACHE_DISABLE in our caffe2 sccache wrapper for circleci
3) use SCCACHE_DISABLE when building nccl, because it triggers the same bug as when using CCACHE (already tracked in https://github.com/pytorch/pytorch/issues/13362). This was hidden because we weren't respecting USE_SYSTEM_NCCL, and were never building nccl ourselves in CI
4) In one particular CI configuration (caffe2, cuda 8, cudnn 7), force USE_SYSTEM_NCCL=1. Building the bundled nccl triggers a bug in nvlink. I've done some investigation, but this looks like a tricky, preexisting bug, so rather than hold up this diff I'm tracking it separately in https://github.com/pytorch/pytorch/issues/14486
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14195
Differential Revision: D13237502
Pulled By: anderspapitto
fbshipit-source-id: 1100ac1269c7cd39e2e0b3ba12a56a3ce8977c55
Summary:
I don't have a full analysis, but ccache appears to often fail while
nccl. To work around this, run the NCCL build with CCACHE_DISABLE.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13340
Differential Revision: D12855467
Pulled By: anderspapitto
fbshipit-source-id: 63eb12183ab9d03dd22090f084688ae6390fe8bd
Summary:
always build nccl from within the main cmake build, rather than via a separate invocation in build_pytorch_libs.sh. Use the existing caffe2 codepaths
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13150
Differential Revision: D12815674
Pulled By: anderspapitto
fbshipit-source-id: a710b6f242d159b9816911a25ee2c4b8c3f855aa
Summary:
- Removed the old nccl file
- Make open-source NCCL a submodule
- CMake to make NCCL itself
NCCL2 now is in the default build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12359
Reviewed By: orionr, yns88
Differential Revision: D10219665
Pulled By: teng-li
fbshipit-source-id: 134ff47057512ba617b48bf390c1c816fff3f881
Summary:
- Removed the old nccl file
- Make open-source NCCL a submodule
- CMake to make NCCL itself
NCCL2 now is in the default build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12312
Differential Revision: D10190845
Pulled By: teng-li
fbshipit-source-id: 08d42253b774149a66919d194f88b34628c39bae
* Have PyTorch depend on minimal libcaffe2.so instead of libATen.so
* Build ATen tests as a part of Caffe2 build
* Hopefully cufft and nvcc fPIC fixes
* Make ATen install components optional
* Add tests back for ATen and fix TH build
* Fixes for test_install.sh script
* Fixes for cpp_build/build_all.sh
* Fixes for aten/tools/run_tests.sh
* Switch ATen cmake calls to USE_CUDA instead of NO_CUDA
* Attempt at fix for aten/tools/run_tests.sh
* Fix typo in last commit
* Fix valgrind call after pushd
* Be forgiving about USE_CUDA disable like PyTorch
* More fixes on the install side
* Link all libcaffe2 during test run
* Make cuDNN optional for ATen right now
* Potential fix for non-CUDA builds
* Use NCCL_ROOT_DIR environment variable
* Pass -fPIC through nvcc to base compiler/linker
* Remove THCUNN.h requirement for libtorch gen
* Add Mac test for -Wmaybe-uninitialized
* Potential Windows and Mac fixes
* Move MSVC target props to shared function
* Disable cpp_build/libtorch tests on Mac
* Disable sleef for Windows builds
* Move protos under BUILD_CAFFE2
* Remove space from linker flags passed with -Wl
* Remove ATen from Caffe2 dep libs since directly included
* Potential Windows fixes
* Preserve options while sleef builds
* Force BUILD_SHARED_LIBS flag for Caffe2 builds
* Set DYLD_LIBRARY_PATH and LD_LIBRARY_PATH for Mac testing
* Pass TORCH_CUDA_ARCH_LIST directly in cuda.cmake
* Fixes for the last two changes
* Potential fix for Mac build failure
* Switch Caffe2 to build_caffe2 dir to not conflict
* Cleanup FindMKL.cmake
* Another attempt at Mac cpp_build fix
* Clear cpp-build directory for Mac builds
* Disable test in Mac build/test to match cmake
CMake 3.2 is required to properly track dependencies in projects imported as ExternalProject_Add (BUILD_BYPRODUCTS parameter).
Users on Ubuntu 14.04 LTS would need to install and use cmake3 package for configurations. Users of other popular distributions generally have a recent enough CMake package.
* [cmake] Move nccl to modern cmake, and avoid using EXTERNAL_DEPENDENCIES
* [cmake] Move nnpack to modern cmake and avoid using EXTERNAL_DEPENDENCIES.
* [cmake] Move ATen to modern cmake and avoid using EXTERNAL_DEPENDENCIES.
* Move cpufeatures to modern cmake, and avoid using EXTERNAL_DEPENDENCIES
* Finally remove EXTERNAL_DEPENDENCIES.
* Maratyszcza's comments
Summary:
If this variable is set to a ccache symlink then the NCCL build will
also use the cache. The NCCL build is the slowest component of a cached
build without this change
Closes https://github.com/caffe2/caffe2/pull/1416
Reviewed By: Yangqing
Differential Revision: D6214008
Pulled By: pietern
fbshipit-source-id: e0a90e27de9b1c5a1fdc0e5bad5fb61f9fa924c3
Summary:
Use HINTS instead of PATHS for find_library so that you can specify
-DNCCL_ROOT_DIR and it will use this NCCL installation regardless of
what else is installed on your system. Also add a path hint to include
the default base path for NCCL 2 libraries.
Closes https://github.com/caffe2/caffe2/pull/1152
Reviewed By: Yangqing
Differential Revision: D5740053
Pulled By: pietern
fbshipit-source-id: 43f0908a63e8a9b90320dece0bbb558827433b48
Summary:
I closed https://github.com/caffe2/caffe2/pull/736 because one of these variables should be used after all.
Here's how C1 uses this variable: https://github.com/BVLC/caffe/blob/rc5/cmake/Targets.cmake#L116
Without this fix, there is a race condition in the parallel build leading to this error:
```
make[2]: *** No rule to make target `../third_party/NNPACK/lib/libnnpack.a', needed by `caffe2/libCaffe2_CPU.so'.
```
Closes https://github.com/caffe2/caffe2/pull/737
Differential Revision: D5211794
Pulled By: Yangqing
fbshipit-source-id: 9e368f09b01edaf86252727adc6f6cc40d244e29
Summary:
This was found necessary on some CentOS. aaronmarkham
Closes https://github.com/caffe2/caffe2/pull/240
Differential Revision: D4819591
Pulled By: Yangqing
fbshipit-source-id: 40161cd484a2c8d43f26077919ad2762440dde13
Summary:
This also requires a change to cmake/External/nccl.cmake to use the
static NCCL binary instead of the shared object. When the Caffe2/Gloo
build uses the bundled NCCL version it should be packaged up in the
resulting libraries and not cause another runtime dependency on a
library that has to be installed separately.
Closes https://github.com/caffe2/caffe2/pull/218
Differential Revision: D4769926
Pulled By: pietern
fbshipit-source-id: 5c85559992c200d874f4218724823815ffb5adb5
Summary:
Necessary if CXX isn't set when cmake is called. The CXX variable will then be
empty which prevents make from using its own default.
Closes https://github.com/caffe2/caffe2/pull/202
Differential Revision: D4711113
Pulled By: Yangqing
fbshipit-source-id: 895c07044b263ba9b5440453978248506d7ac225
Summary: This fixes build that include caffe2 and change the value of CMAKE_BINARY_DIR to their own binary dir. Allows the generation of protobuf headers/files in particular.
Reviewed By: Yangqing
Differential Revision: D4466126
fbshipit-source-id: eba264094dd2bff07a7f050b95fd2d5525462b09