### Changes

- Detect the NVSHMEM install location via `sysconfig.get_path("purelib")`, which typically resolves to `<conda_env>/lib/python/site-packages`; the NVSHMEM include and lib directories live under `nvidia/nvshmem` within it.
- Added the link directory via `target_link_directories`.
- Removed the direct dependency on mlx5.
- Added a preload rule (following the pattern of other NVIDIA libs).

### Plan of Record

1. End-user experience: link against NVSHMEM dynamically. The NVSHMEM lib is about 100 MB, similar in size to NCCL, so we would rather have users `pip install nvshmem` than have torch carry the bits.
2. Developer experience: at compile time, prefer a wheel dependency over a Git submodule. General rule: use a submodule for a small lib that torch can statically link against. If users `pip install` a lib, our CI build process should do the same, rather than building from a Git submodule (e.g., just for its headers).
3. Keep `USE_NVSHMEM` to gate non-Linux platforms, like Windows and Mac.
4. At configuration time, we should detect whether NVSHMEM is available; if it is not, we don't build `NVSHMEMSymmetricMemory` at all.

For now, we have symbol dependencies on two particular libs from NVSHMEM:
- `libnvshmem_host.so`: contains host-side APIs;
- `libnvshmem_device.a`: contains device-side global variables AND device function implementations.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153010
Approved by: https://github.com/ngimel, https://github.com/fduwjj, https://github.com/Skylion007
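The detection described above can be sketched in Python. This is a minimal illustration of the idea, not the actual CMake logic in the PR; the `nvidia/nvshmem` subdirectory layout comes from the description, while the `include`/`lib` subdirectory names are an assumption based on typical NVIDIA wheel layouts:

```python
import os
import sysconfig

# Resolve the active environment's site-packages directory,
# e.g. <conda_env>/lib/python3.x/site-packages.
purelib = sysconfig.get_path("purelib")

# Per the PR description, NVSHMEM headers and libraries from the
# wheel live under nvidia/nvshmem inside site-packages.
nvshmem_root = os.path.join(purelib, "nvidia", "nvshmem")

# Assumed wheel layout (hypothetical subdirectory names):
include_dir = os.path.join(nvshmem_root, "include")
lib_dir = os.path.join(nvshmem_root, "lib")

# Mirrors Plan of Record item 4: if NVSHMEM is absent at
# configuration time, the NVSHMEM-backed components are skipped.
nvshmem_available = os.path.isdir(include_dir) and os.path.isdir(lib_dir)
print("NVSHMEM found:", nvshmem_available)
```

On a machine without the `nvshmem` wheel installed this simply reports `False`, which is the signal (per item 4 of the Plan of Record) to skip building `NVSHMEMSymmetricMemory` rather than fail the build.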