mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Files

Howard Huang b3ef0c99f5 [PP] Fix zero bubble composability with DP (#134052)

Moved all the backward functions (`stage_backward_input`, `stage_backward_weight`, `stage_backward`) under a single `backward_maybe_with_nosync` function, which controls the logic for the data parallel wrappers.

FSDP was not working with zero bubble PP because there are twice as many "backward" calls and we update the weight gradients after `autograd.grad` is called. As a result, we need to manually call the FSDP `post_backward_hook()` once the weights have the correct gradients.
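
To illustrate why a split backward needs special handling, here is a minimal sketch that uses only `torch.autograd.grad` on a toy layer. It is not the code from this PR: the names `stage`, `stage_input`, `input_grad`, and `weight_grads` are hypothetical, and no FSDP or DDP wrapper is involved. It only shows that the first pass leaves the parameters without `.grad`, so any logic that expects finished weight gradients has to run after the second pass.

```python
import torch

torch.manual_seed(0)

# Hypothetical one-layer "pipeline stage" and a single microbatch.
stage = torch.nn.Linear(4, 2)
stage_input = torch.randn(3, 4, requires_grad=True)
loss = stage(stage_input).sum()

# Pass 1 (input backward): compute only the gradient w.r.t. the stage input,
# which is what the previous stage is waiting for. Keep the graph for pass 2.
(input_grad,) = torch.autograd.grad(loss, stage_input, retain_graph=True)

# autograd.grad returns gradients instead of accumulating them into .grad,
# so the parameters still have no gradients after this pass.
grads_missing_after_pass_1 = all(p.grad is None for p in stage.parameters())

# Pass 2 (weight backward): compute the parameter gradients later and
# assign them by hand.
params = list(stage.parameters())
weight_grads = torch.autograd.grad(loss, params)
for param, grad in zip(params, weight_grads):
    param.grad = grad

# Only now do the weights hold the gradients that a data parallel wrapper
# would need to reduce across ranks.
grads_ready_after_pass_2 = all(p.grad is not None for p in stage.parameters())
```

In this sketch one microbatch produces two backward passes, and the weight gradients exist only after the second one, which matches the ordering problem described above.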

Fixes the tests:
`python test/distributed/_composable/test_composability/test_pp_composability.py ComposabilityTest.test_manual_with_data_parallel_dp_type_FSDP_ScheduleClass0_use_new_runtime_False`

`python test/distributed/_composable/test_composability/test_pp_composability.py ComposabilityTest.test_manual_with_data_parallel_dp_type_DDP_ScheduleClass0_use_new_runtime_False`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134052
Approved by: https://github.com/kwen2501

2024-09-04 23:46:29 +00:00

__init__.py

[PP] Add ZeroBubble schedule (#133467)

2024-08-22 13:32:15 +00:00

_backward.py

[PP] Fix zero bubble composability with DP (#134052)

2024-09-04 23:46:29 +00:00

_debug.py

Flip default value for mypy disallow_untyped_defs [6/11] (#127843)

2024-06-08 18:49:29 +00:00

_IR.py

[PP] Go back to export instead of _export (#134299)