No functional changes, just:
- Update C++ standard to C++17
- Update `cmake` min version to 3.18
- Update `libuv` dependency to 1.51 (to move its cmake min version to 3.10)
- Replace boost optional implementation with `std::optional` wrapper
- Make it compilable with gcc-14.x plus by including `cstddef` in few headers
- Avoid using deprecated enums for MacOS builds
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159834
Approved by: https://github.com/Skylion007
And prevent new ones from appearing by removing `-Wno-error=extra-semi` (not sure what was thereason behind adding the warning but not erroring on on it when building with -Werror introduced by https://github.com/pytorch/pytorch/pull/140236 )
300+ violations of that rule were fixed by running `sed -i -e "s/});/})/" /` against `torch/nativert`
Other 3p deps that needs updates:
- TensorPipe
- LLVM
- FBGEMM
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158730
Approved by: https://github.com/Skylion007
Fixes the string_view errors and reland the work. The previous changes in torch/csrc/utils/invalid_arguments.cpp were too aggressive and not tested thoroughly. They are discarded.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110518
Approved by: https://github.com/ezyang
Apply clang-tidy check modernize-use-emplace. This is slightly more efficient by using an inplace constructor and is the recommended style in parts of the codebase covered by clang-tidy. This just manually applies the check to rest of the codebase. Pinging @ezyang as this is related to my other PRs he reviewed like #89000
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91077
Approved by: https://github.com/ezyang
Fixes#87359, which identifies use after free for reverse device maps. This is only in the dynamic RPC feature and not effecting stable RPC code path.
Unfortunately the test `TensorPipeRpcTest.test_dynamic_rpc_existing_rank_can_communicate_with_new_rank_cuda` that is failing is also running into separate issue. I've temporarily disabled some of the test code to investigate the error in asychronously.
Testing plan:
- tested all the dynamic RPC tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87627
Approved by: https://github.com/rohan-varma
Summary:
This PR was created to resolve issue brought up in https://fb.workplace.com/groups/319878845696681/permalink/741428653541696/
Changes:
- Adds timeout argument to RpcAgent.join()
- Add optional timeout argument to ThriftRpcAgent barrier()
- During shutdown (ThriftRpcAgent join) calls the barrier, the agent will use the timeout passed to shutdown and pass that timeout into the join().
- Update API.py to also include fix bug (missing timeout for signal)
- Change default shutdown timeout to 0 (no timeout). Existing functionality in _all_gather will remain the same and wait indefinitely for signal if no timeout is set for the function. New functionality has user specify timeout for both the signal and rpc calls.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76194
Test Plan:
Modified barrier test
buck test torch/fb/distributed/thriftRpcBackend/test:ThriftRpcAgentTest -- BarrierTest
Reviewed By: mrshenli
Differential Revision: D35825382
fbshipit-source-id: e91e9ab5d9fca08787cb6b6b8125a4b03d1c7cde
(cherry picked from commit fcf899a387001574bf4e39a213ea741611d76097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73373
This PR allows for newly joined ranks in Dynamic RPC to communicate with ranks that have already joined the group. That is, rank N will be able to run RPC against all ranks <= N.
Previously:
Process 1 (init):
```python
init_rpc("worker0", rank=0)
```
Process2 (command against a rank that already joined, would fail):
```python
init_rpc("worker1", rank=1)
rpc.sync("worker0", torch.add, (torch.tensor(1), torch.tensor(1)))
```
Now:
Above scenario succeeds
Test:
`pytest test/distributed/rpc/test_tensorpipe_agent.py -vsk test_init_rpc_without_world_size`
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D35052544
Pulled By: H-Huang
fbshipit-source-id: dba48b216731c27730e7d46aefd9e7191c792170
(cherry picked from commit f3c42d8482c933fd746d4da8e64fa40cdf92a221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73372
This PR which allows for optional `world_size` argument in init_rpc. This makes changes in rendezvous to allow for `NoneType` for world_size and creates a new code path when initializing TensorPipe agent for init_rpc. The TensorPipe agent is protected by a critical section enforced using the store, so that only one node can create a TPAgent at a time.
This PR does not yet enable RPC commands between ranks.
Previously:
```python
os.environ['MASTER_ADDR'] = 'localhost'
os.environ['MASTER_PORT'] = '29500'
init_rpc("worker0", world_size=1, rank=0)
```
Now (only rank is needed):
```python
os.environ['MASTER_ADDR'] = 'localhost'
os.environ['MASTER_PORT'] = '29500'
init_rpc("worker0", rank=0)
```
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D34621651
Pulled By: H-Huang
fbshipit-source-id: 09dbb511d5a00c219a6ce0a35501ff2e388998b0
(cherry picked from commit 834aedc3256167399c323169ef2f0c9b3cf98dff)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68128
Reland of D31762735 (0cbfd466d2).
This diff was originally reverted due to failure in test_send_export_type_through_rpc_with_custom_pickler.
I updated rpc_pickler_test.py to prevent a race condition where processes were not registering their pickler before handling their rpc_sync calls.
Test Plan:
rpc_pickler_test file:
buck test mode/dev-nosan -c 'cxx.coverage_only=caffe2' //caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test //caffe2/torch/fb/training_toolkit/backend/metrics/collectors/fbdata_aggregator/tests:batch_collector_test -- --run-disabled --collect-coverage '--code-coverage-session=test_session' --force-tpx
rpc_pickler stress test:
buck test mode/dev-nosan -c 'cxx.coverage_only=caffe2' //caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test -- --exact 'caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test - test_send_export_type_through_rpc_with_custom_pickler (caffe2.torch.fb.training_toolkit.backend.metrics.tests.rpc_pickler_test.CythonTypeRpcSpawnTest)' --run-disabled --collect-coverage '--code-coverage-session=test_session' --force-tpx --jobs 18 --stress-runs 10 --record-results
Reviewed By: mrshenli
Differential Revision: D32316077
fbshipit-source-id: e58de2335fbaa3ab46d46fe222c659197633a5e4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67924
This diff reverts the changes made in D31762735 (0cbfd466d2)
Test Plan: Wait for CI
Reviewed By: derekmod-fb
Differential Revision: D32214744
fbshipit-source-id: e0a65b6a31a88216ae1243549fcbc901ef812374
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65946
Add new function in agent_utils to perform a synchronization of active call counts using store. This is intended to replace the barrier and all_reduce used by the process group in RPC shutdown.
`test_ddp_comparison` and `test_ddp_comparison_uneven_inputs` test fail with these changes. It seems like the RPC agents are not accessing the same store, so the total count of processes never reaches the world size to exit the barrier, still ened to investigate why it is like this only for these test cases. Setting clean_shutdown to false ignores this code path which allows the test to pass.
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D31762736
Pulled By: H-Huang
fbshipit-source-id: cb5d0efe196f72726c63393c4293e97ec4f18548
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62960
A bug was filed a few years ago for sending sparse tensor over rpc #30807.
This pr updates rpc/tensorpipe logic for CUDA sparse tensors. During the serialization process, the pickler.cpp implementation breaks down the sparse tensor into two tensors and metadata. torch/csrc/distributed/rpc/tensorpipe_agent.cpp needs to be updated because it does not have logic sparse tensors. It pushes a single device for a sparse tensor. This is wrong because after the sparse tensor has been serialized, there will be two tensors. The second tensor will not have a device. This will cause the second tensor to have the wrong target device. tensorpipe_utils.cpp needs to be updated because deserialization happens after the data is received on the target pipe. This takes the two tensors and metadata sent and rebuilds the sparse tensor. There will be two tpDescriptors but only one tensor after deserialization. The logic is updated to verify the sparse tensor is on the correct device using the first tpDescriptor.
This pr also updates ivalue.cpp and ivalue.h to support more paths for Sparse COO tensors.
I tested these changes by adding sparse tests to rpc_test.py and dist_autograd_test.py.
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D30717285
Pulled By: gcramer23
fbshipit-source-id: daee9a56764550f56b131f9dd8e74e23113d6714
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63028
TensorPipe until now required PyTorch to come up and provide a unique identifier to use as address for the UNIX domain socket used in the SHM transport. However the Linux kernel can automatically assign an available address (like it does with IP ports), and TensorPipe now supports it, so we can remove that useless PyTorch logic.
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D30220352
fbshipit-source-id: 78e8a6ef5916b2a72df26cdc9cd367b9d083e821