Commit Graph

31 Commits

Author SHA1 Message Date
6de110b862 Support with statement on torch.Stream (#140138)
# Motivation
We propose to support Python with statement on `torch.Stream`. This is a benefit for all accelerators when writing device-agnostic code. The device-specific stream will also be supported because they are generally derived from `torch.Stream`.

With this PR, we can do like this
```python
s1= torch.Stream()
# Set s1 to the current stream
torch.accelerator.set_stream(s1)
with torch.Stream() as s2:
    # Inside with statement, we set s2 to the current stream
    assert torch.accelerator.current_stream() == s2
# Here the current stream should be s1
assert torch.accelerator.current_stream() == s1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140138
Approved by: https://github.com/albanD
2025-01-10 02:05:19 +00:00
cyy
5008d15ae9 [2/N] Remove usage of C array (#139589)
Follows  #139567
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139589
Approved by: https://github.com/ezyang
2024-11-05 01:58:12 +00:00
456c87c8a2 [8/N] Fix extra warnings brought by clang-tidy-17 (#139151)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139151
Approved by: https://github.com/ezyang

Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
2024-10-30 14:20:08 +00:00
cyy
3907f36808 Turn some variables and functions into static (#136847)
Re-check some files and mark variables and functions into static and fix other warnings.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136847
Approved by: https://github.com/ezyang
2024-10-29 17:01:56 +00:00
cyy
0c0d8c8ff0 [1/N] Fix extra warnings brought by clang-tidy-17 (#137407)
Before we can use clang-tidy-17
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137407
Approved by: https://github.com/Skylion007, https://github.com/aaronenyeshi
2024-10-07 17:53:59 +00:00
8962610247 [BE][clang-format] make macro PyObject_HEAD_INIT(type) and PyVarObject_HEAD_INIT(type, size) have its own line (#136949)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136949
Approved by: https://github.com/albanD, https://github.com/eqy
ghstack dependencies: #136945
2024-10-02 18:39:22 +00:00
8963623494 Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept (#126376)
This PR re-implements pin memory aiming to get rid of the optional `device` argument and makes all related APIs to be device-agnostic. We add two new abstract APIs in [AcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/detail/AcceleratorHooksInterface.h#L12) and redefine pin memory as: "Pin memory is always pinned for the current accelerator device". In detail, it uses [getAcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/Context.h#L61) in pin_memory/is_pinned to get an appropriate device and invoke the corresponding overridden interfaces, instead of using BackendSelect and then dispatching to CUDA or other specific backends' implement methods.

Note: For new backends who want to implement and use pin memory, just inherit AcceleratorHooksInterface and overwrite the `isPinnedPtr` and `getPinnedMemoryAllocator` methods.

Additional context: To avoid BC-breaking, this PR just preserves the `device` arg of related APIs and would throw a deprecation warning if `device` arg is passed. Another PR will be submitted to update all PT callers (`Tensor.is_pinned()`, `Tensor.pin_memory()`...) not to pass this arg based on this PR. In future, `device` arg will be actually removed.

Relates #124908
Relates #14560
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126376
Approved by: https://github.com/albanD
2024-07-23 01:44:15 +00:00
726b9268d2 Revert "Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept (#126376)"
This reverts commit c986aeea2d7d9403be702119e3dd4dcb18134fc2.

Reverted https://github.com/pytorch/pytorch/pull/126376 on behalf of https://github.com/atalman due to Failing internal builds ([comment](https://github.com/pytorch/pytorch/pull/126376#issuecomment-2237496633))
2024-07-18 20:25:20 +00:00
c986aeea2d Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept (#126376)
This PR re-implements pin memory aiming to get rid of the optional `device` argument and makes all related APIs to be device-agnostic. We add two new abstract APIs in [AcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/detail/AcceleratorHooksInterface.h#L12) and redefine pin memory as: "Pin memory is always pinned for the current accelerator device". In detail, it uses [getAcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/Context.h#L61) in pin_memory/is_pinned to get an appropriate device and invoke the corresponding overridden interfaces, instead of using BackendSelect and then dispatching to CUDA or other specific backends' implement methods.

Note: For new backends who want to implement and use pin memory, just inherit AcceleratorHooksInterface and overwrite the `isPinnedPtr` and `getPinnedMemoryAllocator` methods.

Additional context: To avoid BC-breaking, this PR just preserves the `device` arg of related APIs and would throw a deprecation warning if `device` arg is passed. Another PR will be submitted to update all PT callers (`Tensor.is_pinned()`, `Tensor.pin_memory()`...) not to pass this arg based on this PR. In future, `device` arg will be actually removed.

Relates #124908
Relates #14560
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126376
Approved by: https://github.com/albanD
2024-07-18 11:54:14 +00:00
ed327876f5 [codemod] c10:optional -> std::optional (#126135)
Generated by running the following from PyTorch root:
```
find . -regex ".*\.\(cpp\|h\|cu\|hpp\|cc\|cxx\)$" | grep -v "build/" | xargs -n 50 -P 4 perl -pi -e 's/c10::optional/std::optional/'
```

`c10::optional` is just an alias for `std::optional`. This removes usages of that alias in preparation for eliminating it entirely.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126135
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi
2024-05-14 19:35:51 +00:00
408aa0182c Build device generic torch.Stream and torch.Event based on c10::Stream/Event (#123611)
This diff intends to build device generic torch.Stream and torch.Event for newly added accelerators in PyTorch.
------------
**torch.Stream APIs**
```
# Defined in torch/csrc/Stream.cpp
class Stream(_StreamBase):
    stream_id: _int  # Stream id
    device_index: _int
    device_type: _int

    device: _device  # The device of the stream

    @overload
    def __new__(self, device: Optional[DeviceLikeType] = None, priority: _int = 0) -> Stream: ...
    @overload
    def __new__(self, stream_id: _int, device_index: _int, device_type: _int, priority: _int = 0) -> Stream: ...
    def wait_event(self, event: Event) -> None: ...
    def wait_stream(self, other: Stream) -> None: ...
    def record_event(self, event: Optional[Event] = None) -> Event: ...
    def query(self) -> None: ...
    def synchronize(self) -> None: ...
    def __hash__(self) -> _int: ...
    def __repr__(self) -> str: ...
    def __eq__(self, other: object) -> _bool: ...
```
------------------
**torch.Event APIs**:
- IPC related APIs are not implemented, since many device backends don't support it, but we leave interfaces there for future adaption of torch.cuda.Stream.
- currently only the enable_timing is supported, since it is the most common one used in other device backends. We have to refactor the event flag system in PyTorch to support more fancy flag.
- elapsedTime API is added to c10::Event

```
# Defined in torch/csrc/Event.cpp
class Event(_EventBase):

    device: _device  # The device of the Event
    event_id: _int # The raw event created by device backend

    def __new__(self,
        device: Optional[DeviceLikeType] = None,
        enable_timing: _bool = False,
        blocking: _bool = False,
        interprocess: _bool = False) -> Event: ...
    @classmethod
    def from_ipc_handle(self, device: DeviceLikeType, ipc_handle: bytes) -> Event: ...
    def record(self, stream: Optional[Stream] = None) -> None: ...
    def wait(self, stream: Optional[Stream] = None) -> None: ...
    def query(self) -> _bool: ...
    def elapsed_time(self, other: Event) -> _float: ...
    def synchronize(self) -> None: ...
    def ipc_handle(self) -> bytes: ...
    def __repr__(self) -> str: ...
```

-----------

c10::Event provides new APIs
- calculate **elapsedTime**.
- Get raw event id
- Synchronize event.

```
  double elapsedTime(const Event& event) const {
    return impl_.elapsedTime(event.impl_);
  }

  void* eventId() const {
    return impl_.eventId();
  }

  void synchronize() const {
    return impl_.synchronize();
  }
```
----------
TODO: need to find a good way to test them in PyTorch with API mocks.

Differential Revision: [D56443357](https://our.internmc.facebook.com/intern/diff/D56443357)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123611
Approved by: https://github.com/albanD, https://github.com/jeffdaily
2024-04-24 20:51:17 +00:00
0feab7d6c3 Revert "Build device generic torch.Stream and torch.Event based on c10::Stream/Event (#123611)"
This reverts commit cb17721899d4d6a55d66d4f7188e36c20a078231.

Reverted https://github.com/pytorch/pytorch/pull/123611 on behalf of https://github.com/jeffdaily due to This broke ROCm. see test_overrides.py ([comment](https://github.com/pytorch/pytorch/pull/123611#issuecomment-2067363780))
2024-04-19 22:44:26 +00:00
cb17721899 Build device generic torch.Stream and torch.Event based on c10::Stream/Event (#123611)
This diff intends to build device generic torch.Stream and torch.Event for newly added accelerators in PyTorch.
------------
**torch.Stream APIs**
```
# Defined in torch/csrc/Stream.cpp
class Stream(_StreamBase):
    stream_id: _int  # Stream id
    device_index: _int
    device_type: _int

    device: _device  # The device of the stream

    @overload
    def __new__(self, device: Optional[DeviceLikeType] = None, priority: _int = 0) -> Stream: ...
    @overload
    def __new__(self, stream_id: _int, device_index: _int, device_type: _int, priority: _int = 0) -> Stream: ...
    def query(self) -> _bool: ...
    def synchronize(self) -> None: ...
    def wait_event(self, event: Event) -> None: ...
    def wait_stream(self, other: Stream) -> None: ...
    def record_event(self, event: Optional[Event] = None) -> Event: ...
    def query(self) -> None: ...
    def synchronize(self) -> None: ...
    def __hash__(self) -> _int: ...
    def __repr__(self) -> str: ...
    def __eq__(self, other: object) -> _bool: ...
```
------------------
**torch.Event APIs**:
- IPC related APIs are not implemented, since many device backends don't support it, but we leave interfaces there for future adaption of torch.cuda.Stream.
- currently only the enable_timing is supported, since it is the most common one used in other device backends. We have to refactor the event flag system in PyTorch to support more fancy flag.
- elapsedTime API is added to c10::Event

```
# Defined in torch/csrc/Event.cpp
class Event(_EventBase):

    device: _device  # The device of the Event
    event_id: _int # The raw event created by device backend

    def __new__(self,
        device: Optional[DeviceLikeType] = None,
        enable_timing: _bool = False,
        blocking: _bool = False,
        interprocess: _bool = False) -> Event: ...
    @classmethod
    def from_ipc_handle(self, device: DeviceLikeType, ipc_handle: bytes) -> Event: ...
    def record(self, stream: Optional[Stream] = None) -> None: ...
    def wait(self, stream: Optional[Stream] = None) -> None: ...
    def query(self) -> _bool: ...
    def elapsed_time(self, other: Event) -> _float: ...
    def synchronize(self) -> None: ...
    def ipc_handle(self) -> bytes: ...
    def __repr__(self) -> str: ...
```

-----------

c10::Event provides new APIs
- calculate **elapsedTime**.
- Get raw event id
- Synchronize event.

```
  double elapsedTime(const Event& event) const {
    return impl_.elapsedTime(event.impl_);
  }

  void* eventId() const {
    return impl_.eventId();
  }

  void synchronize() const {
    return impl_.synchronize();
  }
```
----------
TODO: need to find a good way to test them in PyTorch with API mocks.

Differential Revision: [D55351839](https://our.internmc.facebook.com/intern/diff/D55351839/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123611
Approved by: https://github.com/albanD
2024-04-18 17:35:09 +00:00
cyy
d0ad848aa5 Enable misc clang-tidy checks (#110283)
This PR enables the misc-XX checks in clang-tidy. Meanwhile, I excluded some of them that require a lot of code changes and have no immediate benefits. Some additional fixes and suppression were also given.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110283
Approved by: https://github.com/albanD
2023-09-30 10:39:52 +00:00
cyy
01fc6466d1 [Reland] [1/N] fix clang-tidy warnings in torch/csrc (#108114)
Reland of PR #107648 with auto replaced with Py_ssize_t in eval_frame.c. This PR applies fixes to some found issues by clang-tidy in torch/csrc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108114
Approved by: https://github.com/Skylion007
2023-08-30 17:11:16 +00:00
8cbf77585d Revert "[1/N] fix clang-tidy warnings in torch/csrc (#107648)"
This reverts commit 49eeca00d1e76dd0158758f2c29da6b1d06bf54a.

Reverted https://github.com/pytorch/pytorch/pull/107648 on behalf of https://github.com/osalpekar due to This causes breakages due to underspecified type ([comment](https://github.com/pytorch/pytorch/pull/107648#issuecomment-1696372588))
2023-08-28 20:35:12 +00:00
cyy
49eeca00d1 [1/N] fix clang-tidy warnings in torch/csrc (#107648)
Apply fixes to some found issues by clang-tidy in torch/csrc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107648
Approved by: https://github.com/Skylion007
2023-08-25 00:30:09 +00:00
24bf15fe8d Support record_stream in dispatch mode (#99529)
Summary:
Issuing a `t.record_stream(s)` call while a `TorchDispatchMode` is active was throwing because PyTorch was unable to convert a c10::Stream back to a Python object. It's now fixed.

Fixes https://github.com/pytorch/pytorch/issues/94403

Test Plan: Added a unit test

Differential Revision: D45117566

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99529
Approved by: https://github.com/albanD
2023-04-21 07:17:19 +00:00
cyy
6786a24fd2 fix some tiny code issues (#95757)
This PR tries to fix:
1. a misspelled NDEBUG preprocessing condition.
2. get ride of all writable-strings warnings.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95757
Approved by: https://github.com/soulitzer
2023-03-01 23:27:32 +00:00
cyy
27efdc5eed fix writable-strings warnings (#93246)
clang reports "ISO C++11 does not allow conversion from string
literal to 'char *'"

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93246
Approved by: https://github.com/malfet
2023-02-04 02:11:15 +00:00
cyy
bfe5e1258b avoid unnecessary static_cast (#93898)
avoid unnecessary static_cast
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93898
Approved by: https://github.com/Skylion007
2023-02-03 03:44:43 +00:00
e096d2db5a [BC-Breaking] Separate stream_id, device_index, and device_type in pack and unpack for Streams (#81596)
#75854

A naive attempt at working around the limitations of using a single 64-bit integer to pack `stream_id`, `device_index`, and `device_type`.

Stills needs sanity checks, testing, and minimization of BC-breaking changes.

Currently a Holder for the `StreamData3` struct is used for `IValue` compatibility. While doing this seems to work for `ivalue.h` and `ivalue_inl.h`, this doesn't seem to be naively working for the JIT CUDA stream wrapper? (Something about ambiguous calls if an `intrusive_ptr` to `c10::ivalue::StreamData3Holder` is used as the return type for `pack()`. It turns out that the methods required to access the fields for rematerializing a CUDA Stream are basically already present anyway, so `pack` is simply removed in the wrapper for now and the methods to access the required fields are called directly.

CC @ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81596
Approved by: https://github.com/ezyang
2023-01-12 14:16:49 +00:00
df69660832 Revert "Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552)"" (#82599)
This reverts commit 532b8a9e00d7eea2636e67621bfcfa34d9c85bcb.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82599
Approved by: https://github.com/albanD
2022-08-02 19:37:02 +00:00
532b8a9e00 Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552)"
This reverts commit 9465c0e0b50f3c37bc150ef0016238ba33eca6f4.

Reverted https://github.com/pytorch/pytorch/pull/82552 on behalf of https://github.com/zengk95 due to This seems to be breaking windows binary wheels
2022-08-01 20:25:35 +00:00
9465c0e0b5 Add a lint rule for torch/csrc/util/pybind.h include (#82552)
We define specializations for pybind11 defined templates
(in particular, PYBIND11_DECLARE_HOLDER_TYPE) and consequently
it is important that these specializations *always* be #include'd
when making use of pybind11 templates whose behavior depends on
these specializations, otherwise we can cause an ODR violation.

The easiest way to ensure that all the specializations are always
loaded is to designate a header (in this case, torch/csrc/util/pybind.h)
that ensures the specializations are defined, and then add a lint
to ensure this header is included whenever pybind11 headers are
included.

The existing grep linter didn't have enough knobs to do this
conveniently, so I added some features.  I'm open to suggestions
for how to structure the features better.  The main changes:

- Added an --allowlist-pattern flag, which turns off the grep lint
  if some other line exists.  This is used to stop the grep
  lint from complaining about pybind11 includes if the util
  include already exists.

- Added --match-first-only flag, which lets grep only match against
  the first matching line.  This is because, even if there are multiple
  includes that are problematic, I only need to fix one of them.
  We don't /really/ need this, but when I was running lintrunner -a
  to fixup the preexisting codebase it was annoying without this,
  as the lintrunner overall driver fails if there are multiple edits
  on the same file.

I excluded any files that didn't otherwise have a dependency on
torch/ATen, this was mostly caffe2 and the valgrind wrapper compat
bindings.

Note the grep replacement is kind of crappy, but clang-tidy lint
cleaned it up in most cases.

See also https://github.com/pybind/pybind11/issues/4099

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82552
Approved by: https://github.com/albanD
2022-08-01 17:16:58 +00:00
4b7de26556 Fix C API to be compatible with latest 3.11 beta (#81242)
Based off https://github.com/pytorch/pytorch/pull/80511 with extra changes:
- Update pybind to the latest release as it contains some needed fixes
- Extend the compat header to do reduce changes in code
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81242
Approved by: https://github.com/malfet, https://github.com/mattip
2022-07-27 08:37:10 +00:00
30fb2c4aba [lint] autoformat test/cpp and torch/csrc
Let's have some fun.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78828

Approved by: https://github.com/ezyang
2022-06-11 21:11:16 +00:00
a9b0a921d5 Disable avoid-non-const-global-variables lint check (#62008)
Summary:
As GoogleTest `TEST` macro is non-compliant with it as well as `DEFINE_DISPATCH`

All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`;  do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008

Reviewed By: driazati, r-barnes

Differential Revision: D29838584

Pulled By: malfet

fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
2021-07-22 18:04:40 -07:00
349f2f767c Modernize to default constructor and nullptr in torch (#61735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61735

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29716659

fbshipit-source-id: ec2a0a0b7e55d2e50b1d35f0b651bd40675ae7e8
2021-07-16 10:51:13 -07:00
4cb534f92e Make PyTorch code-base clang-tidy compliant (#56892)
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892

Reviewed By: H-Huang

Differential Revision: D27991944

Pulled By: malfet

fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
2021-04-28 14:10:25 -07:00
5741de883a Define the record_stream method in native_functions.yaml (#44301)
Summary:
The record_stream method was hard coded for CUDA device. Define the record_stream in the native_functions.yaml to enable the dynamic dispatch to different end device.

Fixes https://github.com/pytorch/pytorch/issues/36556

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44301

Reviewed By: glaringlee

Differential Revision: D23763954

Pulled By: ezyang

fbshipit-source-id: e6d24f5e7892b56101fa858a6cad2abc5cdc4293
2020-10-13 09:15:22 -07:00