# Motivation
This PR moves the implementation of `torch.cuda.memory._set_allocator_settings` to `torch._C._accelerator_setAllocatorSettings`.
Since the original API was intended as a temporary/internal utility, I am not exposing the new function as a public API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156175
Approved by: https://github.com/albanD
ghstack dependencies: #159629, #150312, #156165
# Motivation
This PR moves the implementation of `torch.cuda.memory._set_allocator_settings` to `torch._C._accelerator_setAllocatorSettings`.
Since the original API was intended as a temporary/internal utility, I am not exposing the new function as a public API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156175
Approved by: https://github.com/albanD
ghstack dependencies: #149601, #157908, #150312, #156165
# Motivation
We propose adding support for the Python with statement on `torch.accelerator.device_index` to enable device switching functionality. This enhancement would simplify writing device-agnostic code and provide benefits across all accelerators. Its device-specific counterparts include [`torch.cuda.device`](00199acdb8/torch/cuda/__init__.py (L482)) and [`torch.cuda._DeviceGuard`](00199acdb8/torch/cuda/__init__.py (L469)).
**Design Philosophy**
It accepts either an `Int` or `None` as input. When `None` is passed, no device switch is performed. Supporting `None` is important for compatibility, as it's possible to encounter `None` values from `torch.device.index`.
Therefore, with this PR, we can do like this
```python
src = 0
dst = 1
# Set src to current device
torch.accelerator.set_device_index(src)
with torch.accelerator.device_index(dst):
# Inside with statement, we set dst to current device
assert torch.accelerator.get_device_index() == dst
# Here the current device should be src
assert torch.accelerator.get_device_index() == src
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148864
Approved by: https://github.com/albanD
# Motivation
Adapt `torch.accelerator.device_count` for multi-process usage. For example, `torch.cuda.device_count` avoids poisoning fork, then `torch.accelerator.device_count` should meet the same requirement.
Now that `torch.get_device_module(device).device_count` supports this, `torch.accelerator.device_count` should align with this behavior as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149924
Approved by: https://github.com/albanD
ghstack dependencies: #147507
# Motivation
Support `torch.accelerator.synchronize()` on mps. The root cause is that MPS doesn't support lazy initialization. So we must check if the current accelerator supports device lazy initialization rather than early return.
# Additional Context
Add a mps UT to test code change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143171
Approved by: https://github.com/albanD
# Motivation
According to [[RFC]A device-agnostic Python runtime API design for stream-based accelerators](https://github.com/pytorch/pytorch/issues/128403), this PR intends to introduce a device-agnostic runtime API design.
I personally prefer the **Simple Version** APIs that no longer accept the device type as an input argument. It means we will leverage `getAccelerator` to fetch the current accelerator. And it is flexible to expand these APIs to handle multiple types of accelerator scenarios. The design does **NOT** break the previous design philosophies.
I also believe that namespace torch.accelerator is better. It lets users know that the APIs they are calling are running on an accelerator rather than CPU. This is important. Meanwhile, we can follow a simple API design principle:
1. Device-agnostic APIs should be placed under the torch.accelerator namespace and not accept a device_type optional parameter.
2. Device-specific APIs should be placed under device-specific submodules.
3. APIS required by both CPU and accelerators should be placed under the torch namespace and accept a device_type optional parameter.
Also, I list the pros and cons of **Simple Version** here:
Pros:
- `torch.accelerator.foo` will have the same input argument as `torch.xxx.foo`, bringing a better user experience;
- more concise, facilitate the developer to write a device-agnostic code.
Cons:
- no obvious drawbacks.
# Additional Context
I list the new APIs here:
```python
torch.accelerator.is_available() -> bool:
torch.accelerator.current_accelerator() -> torch.device:
torch.accelerator.device_count() -> int:
torch.accelerator.current_device_idx() -> int:
torch.accelerator.set_device_idx(device: Union[torch.device, str, int, None]) -> None:
torch.accelerator.current_stream(device: Union[torch.device, str, int, None]) -> torch.Stream:
torch.accelerator.set_stream(stream: torch.Stream) -> None:
torch.accelerator.synchronize(device: Union[torch.device, str, int, None]) -> None:
```
According to the discussion with Alban, we decide to change the API name `set_device` to `set_device_idx` and `current_device` to `current_device_idx` for more explicit. And will submit other PR to support device and stream context manager.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132204
Approved by: https://github.com/EikanWang, https://github.com/abhilash1910, https://github.com/gujinghui, https://github.com/albanD