When CPU offloading is enabled, if the user loads a GPU state dict, FSDP2 throws a less obvious error at backward:
```
RuntimeError: attempting to assign a gradient with device type 'cpu' to a tensor with device type 'cuda'. Please ensure that the gradient and the tensor are on the same device
```

This PR throws the error more explicitly by specifying which parameters should be moved because of CPU offloading:
```
FSDP parameters should be materialized on cpu when enabling cpu offloading. For example, load cpu state dict or call module.to_empty(device="cpu"). Found following parameters on non-cpu device: ['0.weight']
```

Test:
`pytest -s test/distributed/_composable/fsdp/test_fully_shard_state_dict.py -k test_dp_state_dict_cpu_offload`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135156
Approved by: https://github.com/awgu
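
For context, a minimal sketch of the scenario this check targets (not the PR's test code). It assumes FSDP2's `fully_shard` and `CPUOffloadPolicy` from `torch.distributed._composable.fsdp` (import paths may differ across versions) and that the script runs under `torchrun` with an initialized default process group and CUDA available:

```python
import torch
import torch.nn as nn
from torch.distributed._composable.fsdp import CPUOffloadPolicy, fully_shard

# Build the module on the meta device, then shard with CPU offloading enabled.
with torch.device("meta"):
    model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8))
fully_shard(model, offload_policy=CPUOffloadPolicy())

# Correct: materialize parameters on CPU, as the new error message suggests
# (equivalently, load a CPU state dict).
model.to_empty(device="cpu")

# Problematic: materializing on GPU (or loading a CUDA-resident state dict)
# while CPU offloading is enabled. Before this PR the mismatch only surfaced
# at backward as the cryptic "attempting to assign a gradient with device
# type 'cpu' ..." RuntimeError; with this PR, FSDP2 reports the offending
# parameters (e.g. ['0.weight']) explicitly.
# model.to_empty(device="cuda")
```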