mirror of
https://github.com/pytorch/pytorch.git
synced 2025-10-20 21:14:14 +08:00
This diff builds device-generic torch.Stream and torch.Event for newly added accelerators in PyTorch.

------------

**torch.Stream APIs**

```
# Defined in torch/csrc/Stream.cpp
class Stream(_StreamBase):
    stream_id: _int  # Stream id
    device_index: _int
    device_type: _int

    device: _device  # The device of the stream

    @overload
    def __new__(self, device: Optional[DeviceLikeType] = None, priority: _int = 0) -> Stream: ...
    @overload
    def __new__(self, stream_id: _int, device_index: _int, device_type: _int, priority: _int = 0) -> Stream: ...
    def wait_event(self, event: Event) -> None: ...
    def wait_stream(self, other: Stream) -> None: ...
    def record_event(self, event: Optional[Event] = None) -> Event: ...
    def query(self) -> None: ...
    def synchronize(self) -> None: ...
    def __hash__(self) -> _int: ...
    def __repr__(self) -> str: ...
    def __eq__(self, other: object) -> _bool: ...
```

------------------

**torch.Event APIs**:

- IPC-related APIs are not implemented, since many device backends don't support them, but we leave the interfaces there for future adaptation of torch.cuda.Stream.
- Currently only the enable_timing flag is supported, since it is the one most commonly used in other device backends. The event flag system in PyTorch has to be refactored before more flags can be supported.
- The elapsedTime API is added to c10::Event.

```
# Defined in torch/csrc/Event.cpp
class Event(_EventBase):

    device: _device  # The device of the Event
    event_id: _int  # The raw event created by device backend

    def __new__(self,
                device: Optional[DeviceLikeType] = None,
                enable_timing: _bool = False,
                blocking: _bool = False,
                interprocess: _bool = False) -> Event: ...
    @classmethod
    def from_ipc_handle(self, device: DeviceLikeType, ipc_handle: bytes) -> Event: ...
    def record(self, stream: Optional[Stream] = None) -> None: ...
    def wait(self, stream: Optional[Stream] = None) -> None: ...
    def query(self) -> _bool: ...
    def elapsed_time(self, other: Event) -> _float: ...
    def synchronize(self) -> None: ...
    def ipc_handle(self) -> bytes: ...
    def __repr__(self) -> str: ...
```

-----------

c10::Event provides new APIs:

- Calculate **elapsedTime**.
- Get the raw event id.
- Synchronize the event.

```
  double elapsedTime(const Event& event) const {
    return impl_.elapsedTime(event.impl_);
  }

  void* eventId() const {
    return impl_.eventId();
  }

  void synchronize() const {
    return impl_.synchronize();
  }
```

----------

TODO: need to find a good way to test them in PyTorch with API mocks.

Differential Revision: [D56443357](https://our.internmc.facebook.com/intern/diff/D56443357)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123611

Approved by: https://github.com/albanD, https://github.com/jeffdaily
138 lines
4.4 KiB
C++
#pragma once

#include <c10/core/Device.h>
#include <c10/core/DeviceType.h>
#include <c10/core/Stream.h>
#include <c10/core/impl/DeviceGuardImplInterface.h>
#include <c10/core/impl/InlineEvent.h>
#include <c10/core/impl/VirtualGuardImpl.h>

namespace c10 {

/**
 * A backend-generic movable, not copyable, not thread-safe event.
 *
 * The design of this event follows that of CUDA and HIP events. These events
 * are recorded and waited on by streams and can be rerecorded to,
 * each rerecording essentially creating a new version of the event.
 * For example, if (in CPU time), stream X is asked to record E,
 * stream Y waits on E, and stream X is asked to record E again, then Y will
 * wait for X to finish the first call to record and not the second, because
 * it's waiting on the first version of event E, not the second.
 * Querying an event only returns the status of its most recent version.
 *
 * Backend-generic events are implemented by this class and
 * impl::InlineEvent. In addition to these events there are also
 * some backend-specific events, like ATen's CUDAEvent. Each of these
 * classes has its own use.
 *
 * impl::InlineEvent<...> or a backend-specific event should be
 * preferred when the backend is known at compile time and known to
 * be compiled. Backend-specific events may have additional functionality.
 *
 * This Event should be used if a particular backend may not be available,
 * or the backend required is not known at compile time.
 *
 * These generic events are built on top of DeviceGuardImpls, analogous
 * to DeviceGuard and InlineDeviceGuard. The name "DeviceGuardImpls,"
 * is no longer entirely accurate, as these classes implement the
 * backend-specific logic for a generic backend interface.
 *
 * See DeviceGuardImplInterface.h for a list of all supported flags.
 */

struct Event final {
  // Constructors
  Event() = delete;
  Event(
      const DeviceType _device_type,
      const EventFlag _flag = EventFlag::PYTORCH_DEFAULT)
      : impl_{_device_type, _flag} {}

  // Copy constructor and copy assignment operator (deleted)
  Event(const Event&) = delete;
  Event& operator=(const Event&) = delete;

  // Move constructor and move assignment operator
  Event(Event&&) noexcept = default;
  Event& operator=(Event&&) noexcept = default;

  // Destructor
  ~Event() = default;

  // Getters
  Device device() const noexcept {
    return Device(device_type(), device_index());
  }
  DeviceType device_type() const noexcept {
    return impl_.device_type();
  }
  DeviceIndex device_index() const noexcept {
    return impl_.device_index();
  }
  EventFlag flag() const noexcept {
    return impl_.flag();
  }
  bool was_marked_for_recording() const noexcept {
    return impl_.was_marked_for_recording();
  }

  /**
   * Calls record() if and only if record() has never been called for this
   * event. Note: because Event is not thread-safe recordOnce() may call
   * record() multiple times if called from multiple threads.
   */
  void recordOnce(const Stream& stream) {
    impl_.recordOnce(stream);
  }

  /**
   * Increments the event's version and enqueues a job with this version
   * in the stream's work queue. When the stream processes that job
   * it notifies all streams waiting on / blocked by that version of the
   * event to continue and marks that version as recorded.
   */
  void record(const Stream& stream) {
    impl_.record(stream);
  }

  /**
   * Does nothing if the event has not been scheduled to be recorded.
   * If the event was previously enqueued to be recorded, a command
   * to wait for the version of the event that exists at the time of this call
   * is inserted in the stream's work queue.
   * When the stream reaches this command it will stop processing
   * additional commands until that version of the event is marked as recorded.
   */
  void block(const Stream& stream) const {
    impl_.block(stream);
  }

  /**
   * Returns true if (and only if)
   *  (1) the event has never been scheduled to be recorded, or
   *  (2) the current version is marked as recorded.
   * Returns false otherwise.
   */
  bool query() const {
    return impl_.query();
  }

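  // Calculates the elapsed time between this event and `event`
  // (delegates to the backend implementation).
  double elapsedTime(const Event& event) const {
    return impl_.elapsedTime(event.impl_);
  }

  // Returns the raw event id created by the device backend.
  void* eventId() const {
    return impl_.eventId();
  }

  // Synchronizes the event (delegates to the backend implementation).
  void synchronize() const {
    impl_.synchronize();
  }
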
 private:
  impl::InlineEvent<impl::VirtualGuardImpl> impl_;
};

} // namespace c10
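
The versioning rule described in the class comment (stream X records E, stream Y waits on E, X records E again) can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `ToyEvent`, `ToyStream`, `DemoResult`, and `version_demo` are hypothetical names invented for illustration, they are not part of c10, and the single-threaded job queue stands in for a real device stream.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>

// Hypothetical toy model of event versioning; none of these names exist in c10.
struct ToyEvent {
  std::uint64_t version = 0;          // incremented by every record() call
  std::uint64_t recorded_version = 0; // newest version a stream has processed

  // Mirrors the documented query(): true if the event was never scheduled,
  // or if its current version is marked as recorded.
  bool query() const {
    return version == 0 || recorded_version == version;
  }
};

struct ToyStream {
  // Each job returns true once it has completed, false while it is blocked.
  std::deque<std::function<bool()>> queue;

  // Bump the event's version and enqueue a job that marks that version
  // as recorded when the stream reaches it.
  void record(ToyEvent& e) {
    const std::uint64_t v = ++e.version;
    queue.push_back([&e, v] {
      if (e.recorded_version < v) {
        e.recorded_version = v;
      }
      return true;
    });
  }

  // Enqueue a wait on the version that exists right now; a no-op if the
  // event was never scheduled to be recorded.
  void block(const ToyEvent& e) {
    if (e.version == 0) {
      return;
    }
    const std::uint64_t v = e.version;
    queue.push_back([&e, v] { return e.recorded_version >= v; });
  }

  // Process the job at the head of the queue. Returns false if the queue
  // is empty or the head job is still blocked.
  bool step() {
    if (queue.empty() || !queue.front()()) {
      return false;
    }
    queue.pop_front();
    return true;
  }
};

struct DemoResult {
  bool y_blocked_before_first_record;
  bool y_released_after_first_record;
  bool query_before_second_record;
  bool query_after_second_record;
};

// Replays the X/Y scenario from the class comment.
DemoResult version_demo() {
  ToyEvent e;
  ToyStream x, y;
  x.record(e); // schedules version 1
  y.block(e);  // Y waits on version 1
  x.record(e); // schedules version 2

  DemoResult r{};
  r.y_blocked_before_first_record = !y.step();
  x.step(); // X finishes the first record
  r.y_released_after_first_record = y.step();
  r.query_before_second_record = e.query();
  x.step(); // X finishes the second record
  r.query_after_second_record = e.query();
  return r;
}
```

In this model Y is released as soon as X processes the first record, even though the second one is still pending, while `query()` keeps returning false until the most recent version has been recorded.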