Files
pytorch/c10/core/Event.h
egienvalue 408aa0182c Build device generic torch.Stream and torch.Event based on c10::Stream/Event (#123611)
This diff intends to build device generic torch.Stream and torch.Event for newly added accelerators in PyTorch.
------------
**torch.Stream APIs**
```
# Defined in torch/csrc/Stream.cpp
class Stream(_StreamBase):
    stream_id: _int  # Stream id
    device_index: _int
    device_type: _int

    device: _device  # The device of the stream

    @overload
    def __new__(self, device: Optional[DeviceLikeType] = None, priority: _int = 0) -> Stream: ...
    @overload
    def __new__(self, stream_id: _int, device_index: _int, device_type: _int, priority: _int = 0) -> Stream: ...
    def wait_event(self, event: Event) -> None: ...
    def wait_stream(self, other: Stream) -> None: ...
    def record_event(self, event: Optional[Event] = None) -> Event: ...
    def query(self) -> None: ...
    def synchronize(self) -> None: ...
    def __hash__(self) -> _int: ...
    def __repr__(self) -> str: ...
    def __eq__(self, other: object) -> _bool: ...
```
------------------
**torch.Event APIs**:
- IPC related APIs are not implemented, since many device backends don't support it, but we leave interfaces there for future adaption of torch.cuda.Stream.
- currently only the enable_timing is supported, since it is the most common one used in other device backends. We have to refactor the event flag system in PyTorch to support more fancy flag.
- elapsedTime API is added to c10::Event

```
# Defined in torch/csrc/Event.cpp
class Event(_EventBase):

    device: _device  # The device of the Event
    event_id: _int # The raw event created by device backend

    def __new__(self,
        device: Optional[DeviceLikeType] = None,
        enable_timing: _bool = False,
        blocking: _bool = False,
        interprocess: _bool = False) -> Event: ...
    @classmethod
    def from_ipc_handle(self, device: DeviceLikeType, ipc_handle: bytes) -> Event: ...
    def record(self, stream: Optional[Stream] = None) -> None: ...
    def wait(self, stream: Optional[Stream] = None) -> None: ...
    def query(self) -> _bool: ...
    def elapsed_time(self, other: Event) -> _float: ...
    def synchronize(self) -> None: ...
    def ipc_handle(self) -> bytes: ...
    def __repr__(self) -> str: ...
```

-----------

c10::Event provides new APIs
- calculate **elapsedTime**.
- Get raw event id
- Synchronize event.

```
  double elapsedTime(const Event& event) const {
    return impl_.elapsedTime(event.impl_);
  }

  void* eventId() const {
    return impl_.eventId();
  }

  void synchronize() const {
    return impl_.synchronize();
  }
```
----------
TODO: need to find a good way to test them in PyTorch with API mocks.

Differential Revision: [D56443357](https://our.internmc.facebook.com/intern/diff/D56443357)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123611
Approved by: https://github.com/albanD, https://github.com/jeffdaily
2024-04-24 20:51:17 +00:00

138 lines
4.4 KiB
C++

#pragma once
#include <c10/core/Device.h>
#include <c10/core/DeviceType.h>
#include <c10/core/Stream.h>
#include <c10/core/impl/DeviceGuardImplInterface.h>
#include <c10/core/impl/InlineEvent.h>
#include <c10/core/impl/VirtualGuardImpl.h>
namespace c10 {
/**
* A backend-generic movable, not copyable, not thread-safe event.
*
* The design of this event follows that of CUDA and HIP events. These events
* are recorded and waited on by streams and can be rerecorded to,
* each rerecording essentially creating a new version of the event.
* For example, if (in CPU time), stream X is asked to record E,
* stream Y waits on E, and stream X is asked to record E again, then Y will
* wait for X to finish the first call to record and not the second, because
* it's waiting on the first version of event E, not the second.
* Querying an event only returns the status of its most recent version.
*
* Backend-generic events are implemented by this class and
* impl::InlineEvent. In addition to these events there are also
* some backend-specific events, like ATen's CUDAEvent. Each of these
* classes has its own use.
*
* impl::InlineEvent<...> or a backend-specific event should be
* preferred when the backend is known at compile time and known to
* be compiled. Backend-specific events may have additional functionality.
*
* This Event should be used if a particular backend may not be available,
* or the backend required is not known at compile time.
*
* These generic events are built on top of DeviceGuardImpls, analogous
* to DeviceGuard and InlineDeviceGuard. The name "DeviceGuardImpls,"
* is no longer entirely accurate, as these classes implement the
* backend-specific logic for a generic backend interface.
*
* See DeviceGuardImplInterface.h for a list of all supported flags.
*/
struct Event final {
// Constructors
Event() = delete;
Event(
const DeviceType _device_type,
const EventFlag _flag = EventFlag::PYTORCH_DEFAULT)
: impl_{_device_type, _flag} {}
// Copy constructor and copy assignment operator (deleted)
Event(const Event&) = delete;
Event& operator=(const Event&) = delete;
// Move constructor and move assignment operator
Event(Event&&) noexcept = default;
Event& operator=(Event&&) noexcept = default;
// Destructor
~Event() = default;
// Getters
Device device() const noexcept {
return Device(device_type(), device_index());
}
DeviceType device_type() const noexcept {
return impl_.device_type();
}
DeviceIndex device_index() const noexcept {
return impl_.device_index();
}
EventFlag flag() const noexcept {
return impl_.flag();
}
bool was_marked_for_recording() const noexcept {
return impl_.was_marked_for_recording();
}
/**
* Calls record() if and only if record() has never been called for this
* event. Note: because Event is not thread-safe recordOnce() may call
* record() multiple times if called from multiple threads.
*/
void recordOnce(const Stream& stream) {
impl_.recordOnce(stream);
}
/**
* Increments the event's version and enqueues a job with this version
* in the stream's work queue. When the stream process that job
* it notifies all streams waiting on / blocked by that version of the
* event to continue and marks that version as recorded.
* */
void record(const Stream& stream) {
impl_.record(stream);
}
/**
* Does nothing if the event has not been scheduled to be recorded.
* If the event was previously enqueued to be recorded, a command
* to wait for the version of the event that exists at the time of this call
* is inserted in the stream's work queue.
* When the stream reaches this command it will stop processing
* additional commands until that version of the event is marked as recorded.
*/
void block(const Stream& stream) const {
impl_.block(stream);
}
/**
* Returns true if (and only if)
* (1) the event has never been scheduled to be recorded
* (2) the current version is marked as recorded.
* Returns false otherwise.
*/
bool query() const {
return impl_.query();
}
double elapsedTime(const Event& event) const {
return impl_.elapsedTime(event.impl_);
}
void* eventId() const {
return impl_.eventId();
}
void synchronize() const {
return impl_.synchronize();
}
private:
impl::InlineEvent<impl::VirtualGuardImpl> impl_;
};
} // namespace c10