Show friendly error message when forgetting `init` in `torch.cuda` (#72404)

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Summary:
# Problem
The error message `RuntimeError: Invalid device argument` is not helpful when the user has simply forgotten to call `torch.cuda.init()`.
It is raised, for example, by `torch.cuda.reset_accumulated_memory_stats` and by other methods that internally call [assertValidDevice](6297aa114f/c10/cuda/CUDACachingAllocator.cpp (L1561-L1566)).

# Reproduce
```python
$ python
Python 3.8.6 (default, Apr  1 2021, 08:23:31)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch.cuda
>>> torch.cuda.reset_accumulated_memory_stats(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/torch/cuda/memory.py", line 219, in reset_accumulated_memory_stats
    return torch._C._cuda_resetAccumulatedMemoryStats(device)
RuntimeError: Invalid device argument.
>>> torch.cuda.current_device()
0
```
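
On builds without this change, the failure above can be avoided by initializing CUDA explicitly before touching the memory-stats API. The following is a minimal sketch of that workaround, not part of this PR: `reset_stats_safely` is a hypothetical helper name, and the early returns for a missing `torch` install or a machine without CUDA are assumptions added so the snippet runs anywhere.

```python
import importlib.util


def reset_stats_safely(device: int = 0) -> bool:
    """Initialize CUDA explicitly, then reset accumulated memory stats.

    Hypothetical helper. Returns False when torch is not installed or
    CUDA is unavailable, True once the stats have been reset.
    """
    if importlib.util.find_spec("torch") is None:
        return False  # torch is not installed in this environment

    import torch

    if not torch.cuda.is_available():
        return False  # no usable CUDA device, nothing to reset

    # Explicit initialization; without it the allocator has no per-device
    # state yet and the device check rejects every index.
    torch.cuda.init()
    torch.cuda.reset_accumulated_memory_stats(device)
    return True


if __name__ == "__main__":
    print("stats reset:", reset_stats_safely(0))
```

Calling `torch.cuda.init()` more than once is harmless, so the helper can be used defensively at the top of a profiling script.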

# This PR
Shows a more helpful error message, such as `RuntimeError: Invalid device argument 0: did you call init?`. I borrowed the wording from 6297aa114f/c10/cuda/CUDACachingAllocator.cpp (L1392-L1396).
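
To illustrate why an uninitialized allocator rejects even device 0, here is a toy Python model of the range check. It is only a sketch: `AllocatorModel` and `check` are hypothetical names, and the assumption that `init` is what fills the per-device table is inferred from the `size()` comparison and the error hint, not copied from the C++ source.

```python
class AllocatorModel:
    """Toy stand-in for the caching allocator's per-device table (hypothetical)."""

    def __init__(self):
        # Assumption: the table is empty until init() runs.
        self.device_allocator = []

    def init(self, device_count):
        # Grow the table to one entry per device; never shrink it.
        while len(self.device_allocator) < device_count:
            self.device_allocator.append(object())

    def assert_valid_device(self, device):
        # Mirrors the bounds check: 0 <= device < number of table entries.
        device_num = len(self.device_allocator)
        if not (0 <= device < device_num):
            raise RuntimeError(
                f"Invalid device argument {device}: did you call init?"
            )


def check(allocator, device):
    """Return 'ok' or the error text, so the outcome is easy to inspect."""
    try:
        allocator.assert_valid_device(device)
        return "ok"
    except RuntimeError as err:
        return str(err)


if __name__ == "__main__":
    alloc = AllocatorModel()
    print(check(alloc, 0))  # fails: table is still empty
    alloc.init(1)
    print(check(alloc, 0))  # passes after init
    print(check(alloc, 1))  # fails: index is out of range
```

In this model the same message also appears for an index that is out of range after initialization, so the `did you call init?` suffix is best read as a hint rather than a diagnosis.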

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72404

Reviewed By: mruberry

Differential Revision: D34063268

Pulled By: ngimel

fbshipit-source-id: 0775d9c83a4a0eb0eb41bf6efecca94a00692141
(cherry picked from commit 07a1a3d0b41e1898d0c293ca77ca712de91df51e)

Author: Joe
Date: 2022-02-08 12:12:10 -08:00
Committed by: PyTorch MergeBot

parent e578a80184

commit c4af6ba173

1 changed file with 3 additions and 1 deletion

c10/cuda/CUDACachingAllocator.cpp

@@ -1562,7 +1562,9 @@ static inline void assertValidDevice(int device) {
   const auto device_num = caching_allocator.device_allocator.size();
   TORCH_CHECK(
       0 <= device && device < static_cast<int64_t>(device_num),
-      "Invalid device argument.");
+      "Invalid device argument ",
+      device,
+      ": did you call init?");
 }
 
 DeviceStats getDeviceStats(int device) {
