Commit Graph

11 Commits

Author SHA1 Message Date
30fb2c4aba [lint] autoformat test/cpp and torch/csrc
Let's have some fun.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78828

Approved by: https://github.com/ezyang
2022-06-11 21:11:16 +00:00
ac2d2e3a3d Fix some typos.
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75561
Approved by: https://github.com/albanD
2022-04-11 21:55:59 +00:00
6208c2800e torch/monitor: merge Interval and FixedCount stats (#72009)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72009

This simplifies the Stats interface by merging IntervalStat and FixedCountStat into a single Stat with a specific window duration and an optional maximum number of samples per window. This preserves the original intention of having comparably sized windows (for statistical purposes) while also keeping the output bandwidth consistent.
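
A rough Python sketch of what a merged stat of this shape could look like. It is not the torch.monitor implementation: `WindowedStat`, `on_window`, and the explicit `now` argument are hypothetical names, and dropping samples once the cap is reached is an assumption that this message does not spell out.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Optional


class WindowedStat:
    """Hypothetical stat with a time window and an optional per-window sample cap."""

    def __init__(
        self,
        window_size: timedelta,
        on_window: Callable[[dict], None],
        max_samples: Optional[int] = None,
    ) -> None:
        self.window_size = window_size
        self.max_samples = max_samples
        self.on_window = on_window
        self._window_start: Optional[datetime] = None
        self._samples: List[float] = []

    def add(self, value: float, now: datetime) -> None:
        if self._window_start is None:
            self._window_start = now
        # Close the window once its duration has elapsed, then start a new one.
        if now - self._window_start >= self.window_size:
            self._flush()
            self._window_start = now
        # Assumption: samples beyond the cap are dropped for the current window.
        if self.max_samples is None or len(self._samples) < self.max_samples:
            self._samples.append(value)

    def _flush(self) -> None:
        # Emit one aggregate per non-empty window, then reset the buffer.
        if self._samples:
            total = sum(self._samples)
            self.on_window({
                "count": len(self._samples),
                "sum": total,
                "mean": total / len(self._samples),
            })
        self._samples = []
```

With a cap in place, every emitted window aggregates at most `max_samples` values, so windows stay comparable, and at most one aggregate is emitted per window duration.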

Test Plan:
```
buck test //caffe2/test:monitor //caffe2/test/cpp/monitor:monitor
```

Reviewed By: kiukchung

Differential Revision: D33822956

fbshipit-source-id: a74782492421be613a1a8b14341b6fb2e8eeb8b4
(cherry picked from commit 293b94e0b4646521ffe047e5222c4bba7e688464)
2022-01-30 23:21:59 +00:00
26d54b4076 monitor: add docstrings to pybind interface (#71481)
Summary:
This adds argument names and docstrings so the docs are a lot more understandable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71481

Test Plan:
docs/tests CI should suffice

![Screenshot 2022-01-19 at 16-35-10 torch monitor — PyTorch master documentation](https://user-images.githubusercontent.com/909104/150240882-e69cfa17-e2be-4569-8ced-71979a89b369.png)

Reviewed By: edward-io

Differential Revision: D33661255

Pulled By: d4l3k

fbshipit-source-id: 686835dfe331b92a51f4409ec37f8ee6211e49d3
(cherry picked from commit 0a6accda1bec839bbc9387d80caa51194e81d828)
2022-01-21 23:04:33 +00:00
bfe1abd3b5 torch/monitor: add pybind (#69567)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69567

This exposes torch.monitor events and stats via pybind11 to the underlying C++ implementation.

* The registration interface is slightly different: in Python it takes a lambda function, whereas in C++ it's a full class.
* This makes a small number of changes to the counter interfaces. Since there's no way to create an initializer list at runtime, they now also take a vector.
* Only double-based stats are provided in Python, since it's intended more for high-level stats where float imprecision shouldn't be an issue. This can be changed down the line if the need arises.

```
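from datetime import datetime

from torch.monitor import Event, log_event, register_event_handler

events = []

def handler(event):
    # Collect every event that is logged through torch.monitor.
    events.append(event)

# Keep the returned handle; it identifies this registration.
handle = register_event_handler(handler)

# Uses the renamed fields noted below: `name` (was `type`) and `data` (was `metadata`).
log_event(Event(name="torch.monitor.TestEvent", timestamp=datetime.now(), data={"foo": 1.0}))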
```

D32969391 is now included in this diff.
This cleans up the naming for events: `type` is now `name`, `message` is gone, and `metadata` is renamed to `data`.

Test Plan: buck test //caffe2/test:monitor //caffe2/test/cpp/monitor:monitor

Reviewed By: kiukchung

Differential Revision: D32924141

fbshipit-source-id: 563304c2e3261a4754e40cca39fc64c5a04b43e8
2022-01-12 13:35:11 -08:00
b4c4a015d6 Revert D33163841: Revert D33102715: Back out "Revert D32606547: torch/monitor: add C++ events and handlers"
Test Plan: revert-hammer

Differential Revision:
D33163841

Original commit changeset: e262b6d8c80a

Original Phabricator Diff: D33102715 (eb374de3f5)

fbshipit-source-id: 644216036a238a458f0a2198460b36d24fb035f8
2021-12-16 11:12:18 -08:00
c80b5b8c8f Revert D33102715: Back out "Revert D32606547: torch/monitor: add C++ events and handlers"
Test Plan: revert-hammer

Differential Revision:
D33102715 (eb374de3f5)

Original commit changeset: 3816ff01c578

Original Phabricator Diff: D33102715 (eb374de3f5)

fbshipit-source-id: e262b6d8c80a05f3a67e024fedfbadefdbfe6e29
2021-12-16 09:39:57 -08:00
eb374de3f5 Back out "Revert D32606547: torch/monitor: add C++ events and handlers" (#69923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69923

Original commit changeset: fbaf2cc06ad4

Original Phabricator Diff: D32606547 (e61fc1c03b)

This is the same as the original diff, except that it uses a normal std::mutex instead of std::shared_timed_mutex, which is not available on OSX 10.11. The performance difference should be negligible, and it is easy to change down the line if it does become a bottleneck.

Old failing build: https://github.com/pytorch/pytorch/runs/4495465412?check_suite_focus=true

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68783

Test Plan:
buck test //caffe2/test/cpp/monitor:monitor

will add ciflow tags to ensure mac builds are fine

Reviewed By: aivanou

Differential Revision: D33102715

fbshipit-source-id: 3816ff01c578d8e844d303d881a63cf5c3817bdb
2021-12-15 22:51:43 -08:00
f565167fbd Revert D32606547: torch/monitor: add C++ events and handlers
Test Plan: revert-hammer

Differential Revision:
D32606547 (e61fc1c03b)

Original commit changeset: a00d0364092d

Original Phabricator Diff: D32606547 (e61fc1c03b)

fbshipit-source-id: fbaf2cc06ad4bec606e8a9c6f591d65c04e6fa56
2021-12-11 22:51:03 -08:00
e61fc1c03b torch/monitor: add C++ events and handlers (#68783)
Summary:
This adds a C++ event handler corresponding to the Python one mentioned in the RFC.

This changes the counters so that they are all push-driven instead of polled. The two window types are "fixed count", which is based on the number of logged events, and "interval", which is based on time windows. There's currently no active ticker for interval windows, so they need a regular stream of events to ensure output events are produced. A follow-up diff can add support for things like HHWheel / simple ticker.
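
To illustrate the missing-ticker caveat, here is a minimal Python sketch of an interval counter that can only close a window when another value is logged. `IntervalCounter`, `emit`, and the explicit `now_s` timestamp are hypothetical names for illustration, not the C++ interface added in this diff.

```python
from typing import Callable, List, Optional


class IntervalCounter:
    """Hypothetical push-driven counter that emits one sum per time window."""

    def __init__(self, interval_s: float, emit: Callable[[float], None]) -> None:
        self._interval_s = interval_s
        self._emit = emit
        self._window_end: Optional[float] = None
        self._values: List[float] = []

    def add(self, value: float, now_s: float) -> None:
        if self._window_end is None:
            self._window_end = now_s + self._interval_s
        if now_s >= self._window_end:
            # No background ticker: the finished window is only noticed
            # here, when the next value happens to be logged.
            self._emit(sum(self._values))
            self._values = []
            self._window_end = now_s + self._interval_s
        self._values.append(value)


emitted: List[float] = []
counter = IntervalCounter(10.0, emitted.append)
counter.add(1.0, now_s=0.0)
counter.add(2.0, now_s=5.0)
# The 10 second window is not emitted until this later add() arrives.
counter.add(3.0, now_s=12.0)
```

If the stream of values stops after `now_s=5.0`, nothing is ever emitted for that window, which is why a regular stream of events (or a ticker) is needed.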

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68783

Test Plan: buck test //caffe2/test/cpp/monitor:monitor

Reviewed By: kiukchung

Differential Revision: D32606547

fbshipit-source-id: a00d0364092d7d8a98e0b18e503c0ca8ede2bead
2021-12-11 16:44:46 -08:00
758d7dea9c torch.monitor - Initial C++ Stats (#68074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68074

This is the first step of many PRs towards implementing the `torch.monitor` RFC https://github.com/pytorch/rfcs/pull/30

This defines the aggregation types, the `Stat` class and provides some simple collection of the stats.

This doesn't match the RFC exactly as it incorporates some of the comments on the RFC as well as a few changes for performance.

Changes:
* Added `window_size` to the stats. If specified, the stat is always computed over `window_size` values. If there aren't enough values within the current window, it reports the previous stats.
* This doesn't include the push metrics yet (will be coming).
  After more discussion it looks like the best way to handle this is to support a hybrid where the metric can set how frequently it'll be logged. For fixed window_size metrics it'll be logged each time it hits the window size. This will allow performant counters as well as lower frequency push counters (window_size=1).
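
The `window_size` behavior described above can be sketched in Python as follows. This is only an illustration under assumptions: `CountWindowStat`, its `logged` list, and the choice of `sum` and `max` as aggregations are hypothetical and not taken from the C++ code.

```python
from typing import Dict, List, Optional


class CountWindowStat:
    """Hypothetical stat aggregated over fixed-size windows of values."""

    def __init__(self, window_size: int) -> None:
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        self._window_size = window_size
        self._current: List[float] = []
        self._last: Optional[Dict[str, float]] = None
        # Stands in for "logging" the stat each time a window fills up.
        self.logged: List[Dict[str, float]] = []

    def add(self, value: float) -> None:
        self._current.append(value)
        if len(self._current) == self._window_size:
            self._last = {"sum": sum(self._current), "max": max(self._current)}
            self.logged.append(self._last)
            self._current = []

    def get(self) -> Optional[Dict[str, float]]:
        # A partially filled window is never exposed; callers see the
        # aggregates of the last complete window (or None before the first).
        return self._last
```

With `window_size=1` every `add` completes a window, so each value is logged immediately, which corresponds to the low-frequency push counter case.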

Performance considerations:
* Updating a stat acquires a lock on that Stat object. This should be performant unless a very large number of threads write to the same stat. A single thread will typically use a futex, so it should be quite fast.
* Adding, removing, or fetching all stats takes a global lock on the stat list. This shouldn't be an issue since these operations happen infrequently.
* Fetching stats locks one stat at a time instead of holding a global lock. This means the exported values are linearizable but not serializable across multiple stats, which I don't expect to be an issue.
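
The locking scheme in this list can be sketched in Python as below. It is an analogy rather than the C++ implementation: `LockedStat`, `StatRegistry`, and `snapshot` are hypothetical names, and `threading.Lock` stands in for the C++ mutexes.

```python
import threading
from typing import Dict, List


class LockedStat:
    """Hypothetical stat guarded by its own lock."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._lock = threading.Lock()
        self._sum = 0.0
        self._count = 0

    def add(self, value: float) -> None:
        # Writers only contend with other users of this same stat.
        with self._lock:
            self._sum += value
            self._count += 1

    def get(self) -> Dict[str, float]:
        with self._lock:
            return {"sum": self._sum, "count": self._count}


class StatRegistry:
    """Hypothetical global list of stats guarded by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._stats: Dict[str, LockedStat] = {}

    def register(self, stat: LockedStat) -> None:
        with self._lock:
            self._stats[stat.name] = stat

    def unregister(self, name: str) -> None:
        with self._lock:
            self._stats.pop(name, None)

    def snapshot(self) -> Dict[str, Dict[str, float]]:
        # The global lock is held only while copying the list of stats.
        with self._lock:
            stats: List[LockedStat] = list(self._stats.values())
        # Each stat is then read under its own lock, one at a time, so each
        # value is consistent on its own but values of different stats may
        # come from slightly different moments.
        return {stat.name: stat.get() for stat in stats}
```

Reading stats one by one keeps `add` calls on unrelated stats from ever blocking on a snapshot, at the cost of a snapshot that is not atomic across stats.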

Next steps:
1. Add StatCollector interface for push style metrics
1. Add pybind interfaces to expose to Python
1. Add default metric providers
1. Integrate into Kineto trace view

Test Plan:
buck test //caffe2/test/cpp/monitor:monitor

CI

Reviewed By: kiukchung

Differential Revision: D32266032

fbshipit-source-id: dab8747b4712f5dba5644387817a3a0fda18b66a
2021-11-18 21:46:23 -08:00