Differential Revision: [D49858057](https://our.internmc.facebook.com/intern/diff/D49858057/)
**TL;DR**
This PR implements two different DDP `all_reduce` fusions in the Inductor post_grad fx passes: 1) fusion with a concat op and 2) fusion with `all_reduce_coalesced`. When DDP detects that the Python reducer is being used, it automatically turns the fusion on.
This PR does not invent any algorithm; it simply reflects the bucket size users set on DDP.
**Implementation Details**
*Fusion with concat op*
The idea of this fusion is to use a concat op to concatenate all the gradients into one tensor and perform a single `all_reduce`. After the `wait` op of the `all_reduce`, the buffer is split and reshaped to recover the individual gradients.
Because DDP needs to perform gradient scaling, the benefit of this fusion is that the scaling can be done once over the concatenated buffer.
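The concat flow above can be sketched in a few lines. This is a minimal single-process illustration: the collective call is elided (a comment marks where `dist.all_reduce` would run), and the helper name is made up for this example.

```python
import torch

# Hedged sketch of the concat-based fusion: flatten all gradients into one
# buffer, run one collective over it, scale once, then split/reshape back.
# The collective is emulated (identity) so this runs on a single process.
def fused_allreduce_concat(grads, world_size):
    shapes = [g.shape for g in grads]
    numels = [g.numel() for g in grads]
    flat = torch.cat([g.reshape(-1) for g in grads])  # one contiguous buffer
    # dist.all_reduce(flat) would go here.
    flat.div_(world_size)  # one scaling op over the whole buffer
    return [c.reshape(s) for c, s in zip(torch.split(flat, numels), shapes)]

grads = [torch.ones(2, 3), torch.ones(4)]
out = fused_allreduce_concat(grads, world_size=2)
```

The split sizes and shapes are recorded before the concat so the original gradient layout can be restored after the collective.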
*Fusion with `all_reduce_coalesced`*
The idea of this fusion is to use the `all_reduce_coalesced` op to perform the `all_reduce` directly over multiple buffers. This avoids the copy overhead but may not achieve the best NCCL performance. In addition, because there are multiple buffers, gradient scaling cannot be done with a single division over one buffer and instead relies on `foreach_div`.
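The coalesced variant keeps the buffers separate, so the scaling step uses a foreach op rather than one division over a single buffer. A minimal single-process sketch (the collective is elided; the helper name is made up for this example):

```python
import torch

# Hedged sketch of the coalesced fusion: the gradients stay as separate
# buffers, so scaling uses a foreach op across all of them at once.
def fused_allreduce_coalesced(grads, world_size):
    # dist.all_reduce_coalesced(grads) would go here; no concat copy is made.
    torch._foreach_div_(grads, world_size)  # scale every buffer in one foreach call
    return grads

grads = [torch.ones(2, 3), torch.ones(4)]
fused_allreduce_coalesced(grads, world_size=2)
```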
**Limitations**
The current fusions do not distinguish `all_reduce`s generated by different DDP modules. This is okay if all DDP instances use the same PG and data type. Support for multiple DDP instances with different PGs and data types will come in later PRs.
**TODOs**
- [x] Implement DDP allreduce fusion algorithm for Inductor post_grad pass.
- [ ] Add unit tests to ensure the fusion doesn't break DDP + TP.
- [ ] Group different PG and data type of `all_reduce`s.
- [ ] Mixed precision supports and tests
- [ ] Implement the fusions with Inductor IR.
- [ ] Add auto bucketing based on Inductor profiling.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113209
Approved by: https://github.com/yf225
**Summary**
The reducer of `DistributedDataParallel` is implemented in C++, and it is not easy to trace the allreduces launched by the reducer. This PR modifies `DistributedDataParallel` to launch one allreduce per gradient when `compiled_autograd` is enabled. The changes allow us to use `compiled_autograd` to trace the allreduces, which can later be optimized (fused) by Inductor.
**Key Logic**
1. If `ddp_python_hook` is True, we assume `compiled_autograd` is used. `DistributedDataParallel` registers `compiled_accum_grad_hook` for all parameters.
2. In the first `forward()` call, if `DistributedDataParallel` is not compiled, all `compiled_accum_grad_hook`s are deregistered. If `DistributedDataParallel` is compiled, all `compiled_accum_grad_hook`s will be compiled by `compiled_autograd`.
3. `compiled_accum_grad_hook` launches an allreduce to reduce the gradient of the parameter.
**Bucketing**
The compiled backward is slow because there is no bucketing for the allreduces. We rely on Inductor to bucket the allreduces.
The bucketing is done in a separate PR.
Differential Revision: [D49428482](https://our.internmc.facebook.com/intern/diff/D49428482/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110662
Approved by: https://github.com/wconstab
Work.result() returns a vector of tensors. This signature is problematic: some collectives return just one tensor (e.g., all-reduce), while others return multiple tensors (e.g., all-gather).
It would be clearer/easier for users to directly access the result via the tensor/tensorlist passed to the collective APIs.
Deprecating work.result() would also allow us to remove the `outputs_` field from the Work class, avoiding an "artificial" reference that could keep the tensor's memory alive.
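The recommended pattern looks like this. A single-process gloo group backed by a FileStore keeps the sketch self-contained; real code would get its process group from the launcher.

```python
import tempfile
import torch
import torch.distributed as dist

# Hedged sketch: read the collective's result from the tensor passed in,
# after wait(), instead of calling work.result().
with tempfile.NamedTemporaryFile() as f:
    store = dist.FileStore(f.name, 1)
    dist.init_process_group("gloo", store=store, rank=0, world_size=1)
    t = torch.ones(4)
    work = dist.all_reduce(t, async_op=True)
    work.wait()
    result = t.clone()  # the collective wrote its output into `t`
    dist.destroy_process_group()
```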
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117565
Approved by: https://github.com/wconstab
Previously:
```
[W Utils.hpp:133] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
[W Utils.hpp:133] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
```
With this PR, those warnings disappear. They were introduced in #114077.
This change was generated with this sed script, applied with `sed -i -f /tmp/x **/*.{py,hpp,cpp,cc}` and hand inspected.
```
s/\bNCCL_BLOCKING_WAIT\b/TORCH_NCCL_BLOCKING_WAIT/g
s/\bNCCL_ENABLE_TIMING\b/TORCH_NCCL_ENABLE_TIMING/g
s/\bNCCL_DESYNC_DEBUG\b/TORCH_NCCL_DESYNC_DEBUG/g
s/\bNCCL_ASYNC_ERROR_HANDLING\b/TORCH_NCCL_ASYNC_ERROR_HANDLING/g
s/\bENABLE_NCCL_HEALTH_CHECK\b/TORCH_ENABLE_NCCL_HEALTH_CHECK/g
s/\bNCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK\b/TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK/g
```
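The deprecation behavior shown in the warning above can be sketched in Python. This illustrates only the lookup order; `get_cvar` is a made-up name, not the actual `getCvarInt` implementation.

```python
import os
import warnings

# Hedged sketch: the TORCH_-prefixed name takes priority, and the legacy
# name still works but emits a deprecation warning, matching the message
# printed by getCvarInt.
def get_cvar(new_name, old_name, default):
    if new_name in os.environ:
        return os.environ[new_name]
    if old_name in os.environ:
        warnings.warn(
            f"Environment variable {old_name} is deprecated; "
            f"use {new_name} instead"
        )
        return os.environ[old_name]
    return default

os.environ["TORCH_NCCL_BLOCKING_WAIT"] = "1"
val = get_cvar("TORCH_NCCL_BLOCKING_WAIT", "NCCL_BLOCKING_WAIT", "0")
```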
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114880
Approved by: https://github.com/kwen2501
https://github.com/pytorch/pytorch/pull/113580 introduced the `DDP._update_process_group` API. However, the implementation did not correctly reset all of the necessary state in the reducer. In particular if an error occurred during backward, DDP would end up in an incorrect state.
As a result, in this PR I've enhanced the unit test to test for this case and also appropriately fixed resetting Reducer state.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114194
Approved by: https://github.com/rohan-varma
# Motivation
If we would like to reinitialize DDP with a different PG with `torch.compile`, we need to do the following:
```
del old_ddp
del old_pg
pg = init_pg(...)
ddp = DDP(model, process_group=pg)
compiled_model = torch.compile(ddp)
```
This results in recompilation of the entire model and is very expensive. Since the only thing we need to update is the PG, we should be able to do this without having to compile the model again.
# Proposal
As a result, in this PR I've introduced an `_update_process_group` API which can dynamically update the underlying ProcessGroup used by DDP without needing to reinitialize DDP again.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113580
Approved by: https://github.com/fduwjj
Fixes #112604
Fixes docstrings by following `pydocstyle` output.
- torch/nn/parallel/distributed.py
Before: 84
```
torch/nn/parallel/distributed.py:1 at module level:
D100: Missing docstring in public module
torch/nn/parallel/distributed.py:92 in private function `_cast_buffers`:
D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/parallel/distributed.py:103 in private function `_setup_mixed_precision_params`:
D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/parallel/distributed.py:103 in private function `_setup_mixed_precision_params`:
D401: First line should be in imperative mood (perhaps 'Create', not 'Creates')
torch/nn/parallel/distributed.py:143 in private function `_find_tensors`:
D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/parallel/distributed.py:273 in private method `__init__`:
D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/parallel/distributed.py:273 in private method `__init__`:
D401: First line should be in imperative mood (perhaps 'Set', not 'Sets')
torch/nn/parallel/distributed.py:287 in private method `main_hook`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:287 in private method `main_hook`:
D400: First line should end with a period (not 'd')
torch/nn/parallel/distributed.py:324 in private method `post_hook`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:324 in private method `post_hook`:
D400: First line should end with a period (not 'l')
torch/nn/parallel/distributed.py:324 in private method `post_hook`:
D401: First line should be in imperative mood (perhaps 'Sync', not 'Syncs')
torch/nn/parallel/distributed.py:332 in public class `DistributedDataParallel`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:332 in public class `DistributedDataParallel`:
D400: First line should end with a period (not 'n')
torch/nn/parallel/distributed.py:633 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/parallel/distributed.py:960 in private method `_fire_reducer_autograd_hook`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:960 in private method `_fire_reducer_autograd_hook`:
D401: First line should be in imperative mood (perhaps 'Fire', not 'Fires')
torch/nn/parallel/distributed.py:969 in private method `_root_copy_hook`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:969 in private method `_root_copy_hook`:
D400: First line should end with a period (not 's')
torch/nn/parallel/distributed.py:1012 in private method `_module_wait_for_copy_hook`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:1012 in private method `_module_wait_for_copy_hook`:
D400: First line should end with a period (not 'e')
torch/nn/parallel/distributed.py:1050 in private method `_ddp_init_helper`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:1050 in private method `_ddp_init_helper`:
D400: First line should end with a period (not ':')
torch/nn/parallel/distributed.py:1050 in private method `_ddp_init_helper`:
D401: First line should be in imperative mood (perhaps 'Initialize', not 'Initialization')
torch/nn/parallel/distributed.py:1146 in public method `__getstate__`:
D105: Missing docstring in magic method
torch/nn/parallel/distributed.py:1154 in public method `__setstate__`:
D105: Missing docstring in magic method
torch/nn/parallel/distributed.py:1222 in private method `_assign_modules_buffers`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:1222 in private method `_assign_modules_buffers`:
D400: First line should end with a period (not 'o')
torch/nn/parallel/distributed.py:1222 in private method `_assign_modules_buffers`:
D401: First line should be in imperative mood (perhaps 'Assign', not 'Assigns')
torch/nn/parallel/distributed.py:1277 in private method `_get_parameters`:
D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/parallel/distributed.py:1277 in private method `_get_parameters`:
D400: First line should end with a period (not 's')
torch/nn/parallel/distributed.py:1277 in private method `_get_parameters`:
D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/nn/parallel/distributed.py:1312 in public method `no_sync`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:1312 in public method `no_sync`:
D400: First line should end with a period (not 'P')
torch/nn/parallel/distributed.py:1312 in public method `no_sync`:
D401: First line should be in imperative mood; try rephrasing (found 'A')
torch/nn/parallel/distributed.py:1340 in private method `_get_active_ddp_module`:
D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/parallel/distributed.py:1340 in private method `_get_active_ddp_module`:
D403: First word of the first line should be properly capitalized ('Torchdynamo', not 'TorchDynamo')
torch/nn/parallel/distributed.py:1517 in public method `forward`:
D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1527 in public method `scatter`:
D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1530 in public method `to_kwargs`:
D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1539 in public method `gather`:
D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1542 in public method `train`:
D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1617 in public method `join`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:1617 in public method `join`:
D400: First line should end with a period (not 'f')
torch/nn/parallel/distributed.py:1617 in public method `join`:
D401: First line should be in imperative mood; try rephrasing (found 'A')
torch/nn/parallel/distributed.py:1723 in public method `join_hook`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:1723 in public method `join_hook`:
D400: First line should end with a period (not 'y')
torch/nn/parallel/distributed.py:1723 in public method `join_hook`:
D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/nn/parallel/distributed.py:1752 in public method `join_device`:
D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1756 in public method `join_process_group`:
D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1765 in private method `_register_buffer_comm_hook`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:1765 in private method `_register_buffer_comm_hook`:
D400: First line should end with a period (not 'e')
torch/nn/parallel/distributed.py:1765 in private method `_register_buffer_comm_hook`:
D401: First line should be in imperative mood (perhaps 'Allow', not 'Allows')
torch/nn/parallel/distributed.py:1805 in public method `register_comm_hook`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:1805 in public method `register_comm_hook`:
D400: First line should end with a period (not 'a')
torch/nn/parallel/distributed.py:1805 in public method `register_comm_hook`:
D401: First line should be in imperative mood (perhaps 'Register', not 'Registers')
torch/nn/parallel/distributed.py:1887 in private method `_register_builtin_comm_hook`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:1887 in private method `_register_builtin_comm_hook`:
D400: First line should end with a period (not 'P')
torch/nn/parallel/distributed.py:1887 in private method `_register_builtin_comm_hook`:
D401: First line should be in imperative mood (perhaps 'Register', not 'Registers')
torch/nn/parallel/distributed.py:1914 in private method `_register_fused_optim`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:1914 in private method `_register_fused_optim`:
D400: First line should end with a period (not 'a')
torch/nn/parallel/distributed.py:1914 in private method `_register_fused_optim`:
D401: First line should be in imperative mood (perhaps 'Register', not 'Registers')
torch/nn/parallel/distributed.py:2005 in public method `will_sync_module_buffers`:
D102: Missing docstring in public method
torch/nn/parallel/distributed.py:2060 in private method `_default_broadcast_coalesced`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:2060 in private method `_default_broadcast_coalesced`:
D400: First line should end with a period (not 'e')
torch/nn/parallel/distributed.py:2128 in private method `_get_data_parallel_params`:
D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/parallel/distributed.py:2128 in private method `_get_data_parallel_params`:
D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/nn/parallel/distributed.py:2141 in private method `_set_params_and_buffers_to_ignore_for_model`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:2141 in private method `_set_params_and_buffers_to_ignore_for_model`:
D400: First line should end with a period (not 'r')
torch/nn/parallel/distributed.py:2141 in private method `_set_params_and_buffers_to_ignore_for_model`:
D401: First line should be in imperative mood (perhaps 'Set', not 'Sets')
torch/nn/parallel/distributed.py:2170 in private method `_get_ddp_logging_data`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:2170 in private method `_get_ddp_logging_data`:
D400: First line should end with a period (not 's')
torch/nn/parallel/distributed.py:2170 in private method `_get_ddp_logging_data`:
D401: First line should be in imperative mood; try rephrasing (found 'This')
torch/nn/parallel/distributed.py:2184 in private method `_set_ddp_runtime_logging_sample_rate`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:2184 in private method `_set_ddp_runtime_logging_sample_rate`:
D400: First line should end with a period (not 'g')
torch/nn/parallel/distributed.py:2184 in private method `_set_ddp_runtime_logging_sample_rate`:
D401: First line should be in imperative mood; try rephrasing (found 'This')
torch/nn/parallel/distributed.py:2202 in private method `_set_static_graph`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:2202 in private method `_set_static_graph`:
D400: First line should end with a period (not 'l')
torch/nn/parallel/distributed.py:2202 in private method `_set_static_graph`:
D401: First line should be in imperative mood; try rephrasing (found 'It')
torch/nn/parallel/distributed.py:2227 in private method `_remove_autograd_hooks`:
D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/parallel/distributed.py:2227 in private method `_remove_autograd_hooks`:
D401: First line should be in imperative mood (perhaps 'Remove', not 'Removes')
torch/nn/parallel/distributed.py:2233 in private method `_check_reducer_finalized`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:2233 in private method `_check_reducer_finalized`:
D400: First line should end with a period (not 'd')
torch/nn/parallel/distributed.py:2233 in private method `_check_reducer_finalized`:
D401: First line should be in imperative mood (perhaps 'Check', not 'Checks')
84
```
After: 12
```
torch/nn/parallel/distributed.py:1 at module level:
D100: Missing docstring in public module
torch/nn/parallel/distributed.py:618 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/parallel/distributed.py:1133 in public method `__getstate__`:
D105: Missing docstring in magic method
torch/nn/parallel/distributed.py:1141 in public method `__setstate__`:
D105: Missing docstring in magic method
torch/nn/parallel/distributed.py:1503 in public method `forward`:
D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1513 in public method `scatter`:
D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1516 in public method `to_kwargs`:
D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1525 in public method `gather`:
D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1528 in public method `train`:
D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1734 in public method `join_device`:
D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1738 in public method `join_process_group`:
D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1986 in public method `will_sync_module_buffers`:
D102: Missing docstring in public method
12
```
- torch/nn/utils/_named_member_accessor.py
Before: 23
```
torch/nn/utils/_named_member_accessor.py:12 in public function `set_tensor`:
D103: Missing docstring in public function
torch/nn/utils/_named_member_accessor.py:29 in public function `swap_tensor`:
D103: Missing docstring in public function
torch/nn/utils/_named_member_accessor.py:85 in public function `swap_submodule`:
D103: Missing docstring in public function
torch/nn/utils/_named_member_accessor.py:109 in public class `NamedMemberAccessor`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:109 in public class `NamedMemberAccessor`:
D400: First line should end with a period (not 's')
torch/nn/utils/_named_member_accessor.py:115 in public method `__init__`:
D107: Missing docstring in __init__
torch/nn/utils/_named_member_accessor.py:122 in public method `get_submodule`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:155 in public method `swap_submodule`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:164 in public method `get_tensor`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:185 in public method `set_tensor`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:194 in public method `del_tensor`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:211 in public method `swap_tensor`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:224 in public method `get_tensors`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:233 in public method `set_tensors`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:249 in public method `set_tensors_dict`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:261 in public method `del_tensors`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:276 in public method `swap_tensors`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:296 in public method `swap_tensors_dict`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:325 in public method `check_keys`:
D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/utils/_named_member_accessor.py:340 in public method `named_parameters`:
D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/utils/_named_member_accessor.py:349 in public method `named_buffers`:
D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/utils/_named_member_accessor.py:358 in public method `named_tensors`:
D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/utils/_named_member_accessor.py:368 in public method `named_modules`:
D200: One-line docstring should fit on one line with quotes (found 3)
23
```
After: 4
```
torch/nn/utils/_named_member_accessor.py:12 in public function `set_tensor`:
D103: Missing docstring in public function
torch/nn/utils/_named_member_accessor.py:29 in public function `swap_tensor`:
D103: Missing docstring in public function
torch/nn/utils/_named_member_accessor.py:85 in public function `swap_submodule`:
D103: Missing docstring in public function
torch/nn/utils/_named_member_accessor.py:116 in public method `__init__`:
D107: Missing docstring in __init__
4
```
- torch/nn/utils/_per_sample_grad.py
Before: 3
```
torch/nn/utils/_per_sample_grad.py:12 in public function `call_for_per_sample_grads`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_per_sample_grad.py:12 in public function `call_for_per_sample_grads`:
D400: First line should end with a period (not ')')
torch/nn/utils/_per_sample_grad.py:12 in public function `call_for_per_sample_grads`:
D402: First line should not be the function's "signature"
3
```
After: 0
```
0
```
- torch/nn/utils/init.py
Before: 3
```
torch/nn/utils/init.py:1 at module level:
D100: Missing docstring in public module
torch/nn/utils/init.py:6 in public function `skip_init`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/init.py:6 in public function `skip_init`:
D400: First line should end with a period (not 'g')
3
```
After: 1
```
torch/nn/utils/init.py:1 at module level:
D100: Missing docstring in public module
1
```
- torch/nn/utils/memory_format.py
Before: 4
```
torch/nn/utils/memory_format.py:1 at module level:
D100: Missing docstring in public module
torch/nn/utils/memory_format.py:5 in public function `convert_conv2d_weight_memory_format`:
D202: No blank lines allowed after function docstring (found 1)
torch/nn/utils/memory_format.py:5 in public function `convert_conv2d_weight_memory_format`:
D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/memory_format.py:5 in public function `convert_conv2d_weight_memory_format`:
D400: First line should end with a period (not '`')
4
```
After: 1
```
torch/nn/utils/memory_format.py:1 at module level:
D100: Missing docstring in public module
1
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112657
Approved by: https://github.com/fduwjj
This updates ruff to 0.285, which is faster, better, and fixes a bunch of false negatives with regard to f-strings.
I also enabled RUF017, which looks for accidental quadratic list summation. Luckily, there seem to be no instances of it in our codebase, so I'm enabling it so that it stays that way. :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
Adds support for multiple forward calls before the backward call for
static_graph=True.
There are 2 changes:
1) Track when to populate the static-graph-related maps based on backward
calls rather than forward iterations
2) In DDP Python code, don't rely on num_forward iterations == 1 to enqueue the
delay allreduce. Instead use a flag.
Differential Revision: [D46673736](https://our.internmc.facebook.com/intern/diff/D46673736/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103487
Approved by: https://github.com/awgu
Summary: In cases where DDP backward is not finalized, the error is raised only in the next forward iteration of DDP. However, if there are other collective calls between those two points, training scripts could potentially get stuck.
As a result, there should be a way to check if DDP finalized after calling `.backward()`. To address this, I've added a `_check_reducer_finalized` method to validate that DDP indeed did successfully finish reduction.
Test Plan: Added unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100773
Approved by: https://github.com/rohan-varma
Summary: Disable buffer sync in _sync_module_states(...) when broadcast_buffers is False. This change reduces memory usage when a model has huge buffers and does not need to broadcast them.
Test Plan: .
Differential Revision: D45610709
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100729
Approved by: https://github.com/mrshenli
Summary:
When creating a new DDP instance for the same model while an old DDP instance still exists, the autograd hooks from the old DDP instance might not be cleared. Also, relying on Python GC to clear out old autograd hooks is fragile and may not work 100% of the time.
As a result, in this PR I'm adding a way to explicitly remove these hooks from DDP.
Test Plan:
Unit test added
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96490
Approved by: https://github.com/zhaojuanmao, https://github.com/rohan-varma