mirror of
https://github.com/pytorch/pytorch.git
synced 2025-10-21 05:34:18 +08:00
[inductor] Reduce device context manager overhead (#91045)
This adds `torch.cuda._DeviceGuard`, a stripped-down version of `torch.cuda.device` with lower overhead. It accepts only an `int` as the device, so there is no need to call `_get_device_index`, and it is implemented with a new C++ helper, `torch._C._cuda_exchangeDevice`, which lets `_DeviceGuard.__enter__` be a single function call. On my machine, overhead drops from 3.8 us to 0.94 us with this simple benchmark:

```python
def set_device():
    with torch.cuda.device(0):
        pass

%timeit set_device()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91045
Approved by: https://github.com/ngimel, https://github.com/anijain2305
committed by: PyTorch MergeBot
parent: db466ae057
commit: eece6da162
```diff
@@ -229,6 +229,7 @@ std::shared_ptr<SugaredValue> CUDAPythonModuleValue::attr(
       "current_stream",
       "default_stream",
       "current_device",
+      "_exchange_device",
       "set_device",
       "device_index",
       "device_count",
```