Re-raising of #129959 as that was closed.
Warning message before:
```
/home/admin/.local/share/hatch/env/virtual/toms-project-1/Qv9k_r_5/dev/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py:120: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
```
Warning message after:
```
/path/to/my/code:91: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
```
Helps the user find where the issue stems from in their code. What do you think?
(Looks like "skip_file_prefixes" is not available until Python 3.12 minimum...)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155112
Approved by: https://github.com/Skylion007, https://github.com/cyyever
Fixes#142397
Basic implementation is done. What's left:
- [x] Different dtype/device tensors in the TensorList
- [x] fast path for grouping the foreach kernel
- [x] Tests
Regarding tests, I found some tests in `test/test_torch.py` for GradScaler but I couldn't figure out what is the best way to enable the test for MPS device.
By removing `@onlyNativeDeviceTypes`, one enables the tests for MPS but also enables tests for all other devices which are not included in the native device types. If I put:
`instantiate_device_type_tests(TestTorchDeviceType, globals(), allow_mps=True)`
This enables lots of tests in that class for MPS which were not(?) being tested before? This part needs some clarification
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150255
Approved by: https://github.com/malfet
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>