10 Commits

c6392fcc06 [2/N] Port 3 fsdp distributed test cases to Intel GPU (#160940)
As part of https://github.com/pytorch/pytorch/issues/114850, we are porting distributed tests to Intel GPU. This is the second PR for FSDP distributed test cases; the first is https://github.com/pytorch/pytorch/pull/160158.
We enable Intel GPU with the following methods, keeping the original code style as much as possible (a brief sketch of the pattern follows the list):
- Use `torch.accelerator.current_accelerator()` to determine the accelerator backend
- Enable XPU for some test paths
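
A minimal sketch of the device-agnostic pattern, assuming `torch.accelerator.current_accelerator()` is available (it exists in recent PyTorch releases and returns `None` when no accelerator is present); the actual test-path changes are in the PR itself:

```python
# Sketch only: pick whichever accelerator backend is present (cuda, xpu, ...),
# falling back to CPU.
import torch

acc = torch.accelerator.current_accelerator()        # e.g. device("cuda") or device("xpu")
device_type = acc.type if acc is not None else "cpu"

x = torch.randn(4, 4, device=device_type)
print(f"running on {device_type}: {x.sum().item():.4f}")
```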

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160940
Approved by: https://github.com/guangyey, https://github.com/d4l3k
2025-09-17 10:45:28 +00:00
db3290846e [BE][Easy][10/19] enforce style for empty lines in import segments in test/d*/ (#129761)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by the linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129761
Approved by: https://github.com/fegin
2024-07-17 16:57:39 +00:00
a66f2a1b99 [state_dict] Move _gather_state_dict to dcp module (#112835)
This API is used by more than just FSDP, so this PR moves it to the DCP module.

Differential Revision: [D50962966](https://our.internmc.facebook.com/intern/diff/D50962966/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112835
Approved by: https://github.com/wz337
2023-11-08 19:42:56 +00:00
2fa063e1e0 [device_mesh][BE] remove allgather from DM (#105614)
For reasons similar to https://github.com/pytorch/pytorch/pull/105605.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105614
Approved by: https://github.com/rohan-varma, https://github.com/wz337, https://github.com/fduwjj
2023-07-27 01:33:05 +00:00
d991ce6da3 [FSDP][3/N]_shard_utils update for dtensor state_dict support (#103479)
Same as https://github.com/pytorch/pytorch/pull/102545 (that branch was corrupted, so this is a re-submission).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103479
Approved by: https://github.com/fegin
2023-06-14 06:45:28 +00:00
4f62e7cb10 [FSDP][BE] Remove unused code (#99731)
Remove the unused code. https://github.com/pytorch/pytorch/pull/99675 is a duplicate, and we should land this PR instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99731
Approved by: https://github.com/wz337
2023-04-21 23:11:37 +00:00
60a68477a6 Bump black version to 23.1.0 (#96578)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96578
Approved by: https://github.com/ezyang
2023-03-15 06:27:59 +00:00
244690205f [FSDP] Use _init_from_local_tensor to create ShardedTensor to avoid communication overhead (#82911)
FSDP originally uses `_init_from_local_shards_and_global_metadata()` to create a ShardedTensor for sharded_state_dict(). We have seen some non-trivial overhead when the number of tensors is large. Using `_init_from_local_tensor()` can significantly reduce that overhead. For a model with ~250 tensors in the state_dict trained with 16 GPUs, the original `sharded_state_dict` takes ~1.7 seconds and this PR reduces the overhead to ~0.6 seconds. A brief sketch of the idea follows.
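
A minimal sketch of creating a ShardedTensor directly from the local chunk, assuming the private `ShardedTensor._init_from_local_tensor()` constructor and a dimension that divides evenly across ranks; this is illustrative, not the PR's actual code:

```python
# Sketch only: wrap this rank's chunk of a parameter in a ShardedTensor
# without the per-shard metadata validation of the older construction path.
import torch
import torch.distributed as dist
from torch.distributed._shard.sharded_tensor import ShardedTensor
from torch.distributed._shard.sharding_spec import ChunkShardingSpec

def shard_param(full_param: torch.Tensor) -> ShardedTensor:
    world_size, rank = dist.get_world_size(), dist.get_rank()
    local_chunk = full_param.chunk(world_size, dim=0)[rank].clone()
    # CPU placements for simplicity; real code would target each rank's device.
    spec = ChunkShardingSpec(dim=0, placements=[f"rank:{r}/cpu" for r in range(world_size)])
    return ShardedTensor._init_from_local_tensor(local_chunk, spec, full_param.size())
```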

Differential Revision: [D38452170](https://our.internmc.facebook.com/intern/diff/D38452170/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82911
Approved by: https://github.com/awgu
2022-08-17 16:40:20 +00:00
58c9d521a1 [FSDP] Implement sharded_state_dict and load_sharded_state_dict
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77356

Implement ShardedTensor-compatible sharded_state_dict() and load_sharded_state_dict().

Algorithm overview (a minimal code sketch follows these steps):
sharded_state_dict():
  1. Call summon_full_parameters().
  2. For each unflattened, non-sharded parameter:
     2.1 Call chunk() to get the local shard of the parameter.
     2.2 Create a ShardedTensor.
  3. Replace the tensor in the state_dict with the newly created ShardedTensor.

load_sharded_state_dict():
  1. For each unflattened, sharded parameter (ShardedTensor) in the given state_dict:
     1.1 Pop it out of the state_dict.
     1.2 Perform an allgather to reconstruct the unflattened, non-sharded parameter.
  2. Create a FlatParameter from the unflattened, non-sharded parameters.
  3. Shard the newly created FlatParameter.
  4. Insert the new FlatParameter into the state_dict.
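
A minimal, self-contained sketch of the two flows above, using hypothetical helper names rather than FSDP internals and assuming equal-sized shards on every rank:

```python
# Sketch only: illustrates the shard/allgather flow, not FSDP's actual code.
import torch
import torch.distributed as dist

def local_shard(tensor: torch.Tensor, rank: int, world_size: int) -> torch.Tensor:
    """Step 2.1: take this rank's equal chunk of the unflattened parameter."""
    return tensor.flatten().chunk(world_size)[rank].clone()

def sharded_state_dict(full_state_dict: dict) -> dict:
    """Steps 1-3: replace each full tensor with this rank's shard.
    A real implementation wraps each shard in a ShardedTensor with metadata."""
    rank, world_size = dist.get_rank(), dist.get_world_size()
    return {k: local_shard(v, rank, world_size) for k, v in full_state_dict.items()}

def load_sharded_state_dict(sharded: dict) -> dict:
    """Load steps 1.1-1.2: allgather each shard back into a full flat tensor.
    Creating and re-sharding the FlatParameter (steps 2-4) is omitted here."""
    world_size = dist.get_world_size()
    full = {}
    for key, shard in sharded.items():
        gathered = [torch.empty_like(shard) for _ in range(world_size)]
        dist.all_gather(gathered, shard)   # reconstruct the non-sharded parameter
        full[key] = torch.cat(gathered)
    return full
```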

Differential Revision: [D36284983](https://our.internmc.facebook.com/intern/diff/D36284983/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36284983/)!

Approved by: https://github.com/zhaojuanmao
2022-05-15 22:48:56 +00:00
577c9ff854 [FSDP] Implement reshard_flatten_tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75192

Implement reshard_flatten_tensor() to allow FSDP to reshard the flattened tensor
from equal (chunk) sharding to any other one-dimensional sharding (a brief sketch follows).
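
A minimal sketch of the idea, assuming equal padded chunks as the starting layout and per-rank offset/length lists describing the target 1-D sharding; the helper name is hypothetical, not the actual reshard_flatten_tensor() implementation:

```python
# Sketch only: rebuild the full flat tensor from equal chunks, then cut out
# this rank's slice of the new one-dimensional sharding.
import torch
import torch.distributed as dist

def reshard_flat_tensor(local_chunk: torch.Tensor, full_numel: int,
                        new_offsets: list, new_lengths: list) -> torch.Tensor:
    world_size, rank = dist.get_world_size(), dist.get_rank()
    # Chunks are assumed equal-sized (FSDP pads the flat parameter), so a
    # plain all_gather works; trim the padding after concatenation.
    gathered = [torch.empty_like(local_chunk) for _ in range(world_size)]
    dist.all_gather(gathered, local_chunk)
    full_flat = torch.cat(gathered)[:full_numel]
    start = new_offsets[rank]
    return full_flat[start:start + new_lengths[rank]].clone()
```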

Differential Revision: [D35361572](https://our.internmc.facebook.com/intern/diff/D35361572/)

Approved by: https://github.com/rohan-varma
2022-04-12 16:54:32 +00:00