pytorch

frozenleaves/pytorch

Fork 0

mirror of https://github.com/pytorch/pytorch.git synced 2025-11-18 01:15:12 +08:00

Commit Graph

Author	SHA1	Message	Date
Xu Zhao	8d5cceeb6a	[torchbench][optimus] Add backend optimus (#167357 ) Summary: `--optimus [all \| vertical_opt \| horizontal_opt]` will kick off inductor compile with different fusion strategies. Test Plan: TorchBench Runner: ``` $ buck2 run mode/opt //pytorch/benchmark:run -- customized_optimus_illustrative -t train -d cuda GPU Time per batch: 56.254 milliseconds CPU Wall Time per batch: 56.326 milliseconds CPU Wall Time: 56.326 milliseconds Time to first batch: 420.0777 ms GPU 0 Peak Memory: 0.0695 GB CPU Peak Memory: 359.6362 GB ``` PT2 Benchmark Runner (comparing with eager): ``` buck2 run mode/opt //pytorch/benchmark:pt2 -- --only customized_optimus_illustrative --performance --training --inductor running benchmark: 100%\|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████\| 30/30 [00:02<00:00, 14.37it/s] 4.509x ``` eager latency: ~56 ms inductor latency: ~11 ms Optimus backend: ``` $ buck2 run mode/opt //pytorch/benchmark:pt2 -- --only customized_optimus_illustrative --performance --training --optimus all 11.02923508733511 ms, 13.884015614166856 ms, 0.794x ``` ``` $ buck2 run mode/opt //pytorch/benchmark:pt2 -- --only customized_optimus_illustrative --performance --training --optimus vertical_opt 12.47156853787601 ms, 10.699485195800662 ms, 1.166x ``` ``` $ buck2 run mode/opt //pytorch/benchmark:pt2 -- --only customized_optimus_illustrative --performance --training --optimus horizontal_opt 11.078484123572707 ms, 10.797873372212052 ms, 1.026x ``` optimus latency ~10 ms Differential Revision: D86524903 Pull Request resolved: https://github.com/pytorch/pytorch/pull/167357 Approved by: https://github.com/mengluy0125	2025-11-11 00:35:30 +00:00

Author

SHA1

Message

Date

Xu Zhao

8d5cceeb6a

[torchbench][optimus] Add backend optimus (#167357 )

Summary: `--optimus [all | vertical_opt | horizontal_opt]` will kick off inductor compile with different fusion strategies.

Test Plan:
TorchBench Runner:

```
$ buck2 run mode/opt //pytorch/benchmark:run -- customized_optimus_illustrative -t train -d cuda
GPU Time per batch:   56.254 milliseconds
CPU Wall Time per batch:  56.326 milliseconds
CPU Wall Time:        56.326 milliseconds
Time to first batch:          420.0777 ms
GPU 0 Peak Memory:              0.0695 GB
CPU Peak Memory:              359.6362 GB
```

PT2 Benchmark Runner (comparing with eager):

```
buck2 run mode/opt //pytorch/benchmark:pt2 -- --only customized_optimus_illustrative  --performance --training --inductor

running benchmark: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:02<00:00, 14.37it/s]
4.509x
```

eager latency: ~56 ms
inductor latency: ~11 ms

Optimus backend:

```
$ buck2 run mode/opt //pytorch/benchmark:pt2 -- --only customized_optimus_illustrative --performance --training --optimus all
11.02923508733511 ms, 13.884015614166856 ms, 0.794x
```

```
$ buck2 run mode/opt //pytorch/benchmark:pt2 -- --only customized_optimus_illustrative --performance --training --optimus vertical_opt
12.47156853787601 ms, 10.699485195800662 ms, 1.166x
```

```
$ buck2 run mode/opt //pytorch/benchmark:pt2 -- --only customized_optimus_illustrative --performance --training --optimus horizontal_opt
11.078484123572707 ms, 10.797873372212052 ms, 1.026x
```

optimus latency ~10 ms

Differential Revision: D86524903

Pull Request resolved: https://github.com/pytorch/pytorch/pull/167357
Approved by: https://github.com/mengluy0125

2025-11-11 00:35:30 +00:00

1 Commits