Wei Feng cfb8aec1a4 [FSDP2] idempotent reset_sharded_param: no-op if _local_tensor is already padded (#163130)
resolves https://github.com/pytorch/torchtitan/issues/1136

torchtitan uses a cached state dict for ft. `reset_sharded_param` should be idempotent when `model.parameters()` are already padded:

```
# pad DTensor._local_tensor
fully_shard(model)
sd = model.state_dict()
# reset_sharded_param should be a no-op in lazy_init
loss = model(inp).sum()
```

This PR makes `reset_sharded_param` idempotent by checking the storage data pointer and returning early when `_local_tensor` is already padded.
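
Below is a minimal standalone sketch of that early-return idea using plain tensors (hypothetical function and variable names, not the actual `FSDPParam.reset_sharded_param` code): if the local tensor's data pointer already matches the padded storage, padding was done on a previous call and the call becomes a no-op.

```
import torch

def reset_sharded_param_sketch(local_tensor: torch.Tensor,
                               padded_storage: torch.Tensor) -> torch.Tensor:
    # Early return: the local tensor already views into the padded storage,
    # so a repeated call must not copy or re-allocate anything.
    if local_tensor.data_ptr() == padded_storage.data_ptr():
        return local_tensor
    # First call: copy the unpadded data into the padded storage and
    # hand back a view of the original size backed by that storage.
    padded_storage[: local_tensor.numel()].copy_(local_tensor)
    return padded_storage[: local_tensor.numel()]

padded = torch.zeros(8)
local = torch.arange(6, dtype=torch.float32)
first = reset_sharded_param_sketch(local, padded)   # pads (copies into storage)
second = reset_sharded_param_sketch(first, padded)  # no-op: data_ptr matches
assert first.data_ptr() == second.data_ptr() == padded.data_ptr()
```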

Unit test:
```
pytest -s test/distributed/_composable/fsdp/test_fully_shard_state_dict.py -k test_cached_state_dict
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163130
Approved by: https://github.com/tianyu-l
2025-09-18 09:20:37 +00:00