Show friendly error message when forgetting init in torch.cuda (#72404)

Summary:
# Problem
The error message `RuntimeError: Invalid device argument` is unhelpful when the user has simply forgotten to call `torch.cuda.init()`.
It is raised, for example, by `torch.cuda.reset_accumulated_memory_stats` and by other methods that internally call [assertValidDevice](6297aa114f/c10/cuda/CUDACachingAllocator.cpp (L1561-L1566)).

# Reproduce
```python
$ python
Python 3.8.6 (default, Apr  1 2021, 08:23:31)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch.cuda
>>> torch.cuda.reset_accumulated_memory_stats(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/torch/cuda/memory.py", line 219, in reset_accumulated_memory_stats
    return torch._C._cuda_resetAccumulatedMemoryStats(device)
RuntimeError: Invalid device argument.
>>> torch.cuda.current_device()
0
```
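
For comparison, here is a minimal sketch of the same call with the initialization step included. The helper name `reset_stats_safely` and its skip-if-unavailable behaviour are illustrative assumptions, not part of this PR; only `torch.cuda.init()`, `torch.cuda.is_available()` and `torch.cuda.reset_accumulated_memory_stats()` are existing PyTorch calls.

```python
import importlib.util


def reset_stats_safely(device: int = 0) -> bool:
    """Hypothetical helper: initialize CUDA before resetting memory stats.

    Returns True if the stats were reset, False if PyTorch or CUDA
    is not available on this machine.
    """
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch is not installed

    import torch

    if not torch.cuda.is_available():
        return False  # no usable CUDA device

    # Explicit initialization; this is the step the reproduction above omits.
    torch.cuda.init()
    torch.cuda.reset_accumulated_memory_stats(device)
    return True


if __name__ == "__main__":
    print("stats reset:", reset_stats_safely(0))
```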

# This PR
Shows a more helpful error message, such as `RuntimeError: Invalid device argument 0: did you call init?`. The wording is borrowed from 6297aa114f/c10/cuda/CUDACachingAllocator.cpp (L1392-L1396).
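
The new check can be modelled in plain Python as a rough sketch. It assumes that the allocator's per-device list is empty until initialization, so every device index fails the range check. The function name `assert_valid_device` and the explicit `device_count` parameter are illustrative stand-ins for the C++ `assertValidDevice` and `device_allocator.size()`.

```python
def assert_valid_device(device: int, device_count: int) -> None:
    """Pure-Python model of the range check with the new message.

    device_count stands in for caching_allocator.device_allocator.size(),
    assumed here to be 0 before CUDA has been initialized.
    """
    if not (0 <= device < device_count):
        raise RuntimeError(
            f"Invalid device argument {device}: did you call init?"
        )


if __name__ == "__main__":
    assert_valid_device(0, device_count=1)  # initialized: passes silently
    try:
        assert_valid_device(0, device_count=0)  # uninitialized: raises
    except RuntimeError as err:
        print(err)
```

Including the offending index in the message helps tell a device that is genuinely out of range apart from an allocator that has not been initialized yet.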

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72404

Reviewed By: mruberry

Differential Revision: D34063268

Pulled By: ngimel

fbshipit-source-id: 0775d9c83a4a0eb0eb41bf6efecca94a00692141
(cherry picked from commit 07a1a3d0b41e1898d0c293ca77ca712de91df51e)
Joe
2022-02-08 12:12:10 -08:00
committed by PyTorch MergeBot
parent e578a80184
commit c4af6ba173

@@ -1562,7 +1562,9 @@ static inline void assertValidDevice(int device) {
   const auto device_num = caching_allocator.device_allocator.size();
   TORCH_CHECK(
       0 <= device && device < static_cast<int64_t>(device_num),
-      "Invalid device argument.");
+      "Invalid device argument ",
+      device,
+      ": did you call init?");
 }
 DeviceStats getDeviceStats(int device) {