9 Commits

Author SHA1 Message Date
dd3a77bc96 Apply UFMT to all files in benchmarks/ (#105928)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105928
Approved by: https://github.com/albanD
2023-07-26 01:18:48 +00:00
8fce9a09cd [BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch (starting with the safest changes).
This PR does only two things: it removes explicit inheritance from `object` and removes unused `__future__` imports.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-07 21:10:56 +00:00
47bbc01e0b [nnc] Added micro-benchmark to show perf improvement with cat subgraph optimization (#59581)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59581

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28955317

Pulled By: navahgar

fbshipit-source-id: 53bb3dbfafbd3b146063f305523c2e6ec96cf6b8
2021-06-18 14:32:09 -07:00
8af648354f [nnc] Benchmarks for concat (#52592)
Summary:
This PR adds a C++ benchmark for "concat" with three different versions: 1) aten::cat, 2) an NNC implementation using if-then-else, and 3) an NNC implementation using multiple loops. It also adds a Python benchmark for "concat", which can now be invoked with and without CPU fusion.

Here are the results of these benchmarks on an `Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz` machine with `OMP_NUM_THREADS=1`:

```
--------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                   Time           CPU Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------------
Concat2D2Input/ATen/1/160/1/14/1                                         1211 ns       1211 ns     567896 GB/s=1.14953G/s
Concat2D2Input/ATen/1/580/1/174/1                                        1296 ns       1296 ns     537060 GB/s=4.65362G/s
Concat2D2Input/ATen/20/160/20/14/1                                       1823 ns       1823 ns     382052 GB/s=15.2677G/s
Concat2D2Input/ATen/20/580/20/174/1                                      3347 ns       3347 ns     210036 GB/s=36.0432G/s
Concat2D2Input/ATen/8/512/8/512/1                                        2093 ns       2093 ns     324760 GB/s=31.3061G/s
Concat2D2Input/NNC/1/160/1/14/1                                           694 ns        694 ns    1002902 GB/s=2.00692G/s
Concat2D2Input/NNC/1/580/1/174/1                                          852 ns        852 ns     803002 GB/s=7.08127G/s
Concat2D2Input/NNC/20/160/20/14/1                                        1639 ns       1639 ns     419683 GB/s=16.9828G/s
Concat2D2Input/NNC/20/580/20/174/1                                       5956 ns       5956 ns     117833 GB/s=20.2548G/s
Concat2D2Input/NNC/8/512/8/512/1                                         3136 ns       3136 ns     224122 GB/s=20.8958G/s
Concat2D2Input/NNCLoop/1/160/1/14/1                                       581 ns        581 ns    1209873 GB/s=2.39737G/s
Concat2D2Input/NNCLoop/1/580/1/174/1                                      614 ns        614 ns    1132332 GB/s=9.82955G/s
Concat2D2Input/NNCLoop/20/160/20/14/1                                    1091 ns       1091 ns     622952 GB/s=25.5247G/s
Concat2D2Input/NNCLoop/20/580/20/174/1                                   2399 ns       2399 ns     288376 GB/s=50.289G/s
Concat2D2Input/NNCLoop/8/512/8/512/1                                     1500 ns       1500 ns     478360 GB/s=43.6968G/s
Concat2D3Input/ATen/8/512/8/512/8/512/1                                  2584 ns       2584 ns     266394 GB/s=38.0397G/s
Concat2D3Input/NNC/8/512/8/512/8/512/1                                   5056 ns       5056 ns     139768 GB/s=19.4416G/s
Concat2D3Input/NNCLoop/8/512/8/512/8/512/1                               1917 ns       1917 ns     369626 GB/s=51.2758G/s
Concat2D7Input/ATen/8/128/8/256/8/384/8/512/8/512/8/512/8/512/1          3888 ns       3888 ns     178124 GB/s=46.3571G/s
Concat2D7Input/NNC/8/128/8/256/8/384/8/512/8/512/8/512/8/512/1          24639 ns      24638 ns      28336 GB/s=7.31481G/s
Concat2D7Input/NNCLoop/8/128/8/256/8/384/8/512/8/512/8/512/8/512/1       3093 ns       3093 ns     226326 GB/s=58.265G/s
```
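The `GB/s` counter in the table can be reproduced from the shapes: each `Concat2D2` row reads two float inputs of shape `M x N1` and `M x N2` and writes an `M x (N1+N2)` output. A minimal sketch of that calculation (the function name and the 4-byte float assumption are ours, not taken from the benchmark source):

```python
def concat2d2_gbps(m, n1, n2, time_ns, itemsize=4):
    """Achieved memory bandwidth (G/s) for a two-input, dim-1 concat.

    Bytes moved = both inputs read + the concatenated output written.
    """
    bytes_moved = (m * n1 + m * n2 + m * (n1 + n2)) * itemsize
    return bytes_moved / time_ns  # bytes per ns == GB per s

# First ATen row: shapes 1x160 + 1x14, 1211 ns
print(round(concat2d2_gbps(1, 160, 14, 1211), 4))  # -> 1.1495
```

For example, the first ATen row moves (160 + 14 + 174) * 4 = 1392 bytes in 1211 ns, which matches the reported ~1.15 G/s counter.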

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52592

Reviewed By: bertmaher

Differential Revision: D26596701

Pulled By: navahgar

fbshipit-source-id: 650fa88febf4423ea49f5a1d3d734edc2294d257
2021-02-24 06:09:32 -08:00
12d85b536e Fixing Softmax bench. (#51898)
Summary:
Fixes and enables the microbenchmark for Softmax.
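For context, a softmax microbenchmark boils down to timing something like the following numerically stable reference (a stdlib-only sketch of the operation being benchmarked, not the benchmark's actual code):

```python
import math
import timeit

def softmax(xs):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

row = [float(i % 7) for i in range(1024)]
elapsed = timeit.timeit(lambda: softmax(row), number=100)
print(f"softmax(1024) x100: {elapsed * 1e6:.1f} us total")
```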

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51898

Reviewed By: gmagogsfm

Differential Revision: D26333189

Pulled By: navahgar

fbshipit-source-id: be0934e413c4f6728593f896e53a0b31f1657e52
2021-02-09 15:03:49 -08:00
26a91a9f04 [WIP][JIT] Add benchmarking support of NV Fuser with FP16 dtype support (#44101)
Summary:
Modified files in `benchmarks/tensorexpr` to add support for NVIDIA's fuser in the JIT compiler.

Besides adding an option to select the NVIDIA fuser, this change:

* Adds FP16 datatype support
* Fixes SOL/Algo calculations to use the data type's size instead of a fixed 4 bytes
* Adds IR printing and kernel printing knobs
* Adds a knob `input_iter` to create ranges of inputs, currently only for reductions
* Adds further reduction support for inner- and outer-dimension reductions that are compatible with the `input_iter` knob
* Adds `simple_element`, `reduce2d_inner`, and `reduce2d_outer` benchmarks to isolate performance of elementwise and reduction operations in the most minimal fashion
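The SOL/Algo fix above can be illustrated with a small sketch (our own simplification; the names and byte-size table are assumptions, not the benchmark's actual code): the memory traffic of a bandwidth-bound op should be derived from the element size of the dtype rather than hard-coded to 4 bytes, otherwise FP16 speed-of-light estimates come out 2x too high.

```python
# Bytes per element for the dtypes the benchmarks cover (illustrative).
DTYPE_BYTES = {"float64": 8, "float32": 4, "float16": 2, "bfloat16": 2}

def traffic_bytes(numel_in, numel_out, dtype="float32"):
    """Total memory traffic for a simple read-input/write-output op."""
    itemsize = DTYPE_BYTES[dtype]
    return (numel_in + numel_out) * itemsize

# A 2D inner reduction over a 1024x1024 input producing 1024 outputs:
# FP16 moves half the bytes of FP32, so its speed-of-light time halves too.
fp32 = traffic_bytes(1024 * 1024, 1024, "float32")
fp16 = traffic_bytes(1024 * 1024, 1024, "float16")
print(fp32, fp16)  # -> 4198400 2099200
```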

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44101

Reviewed By: ngimel

Differential Revision: D23713658

Pulled By: bertmaher

fbshipit-source-id: d6b83cfab559aefe107c23b3c0f2df9923b3adc1
2020-09-15 15:10:49 -07:00
8998a1b3d3 Add tensorexpr benchmarks. (#35064)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35064

Test Plan: Imported from OSS

Differential Revision: D20543695

Pulled By: ZolotukhinM

fbshipit-source-id: 1cf294ab19465cb93557c2b195252c739b40a0f7
2020-03-20 12:01:31 -07:00
976d6aaa51 Revert D20251830: [TensorExpr] Add tensorexpr benchmarks.
Test Plan: revert-hammer

Differential Revision:
D20251830

Original commit changeset: bafd66ce32f6

fbshipit-source-id: d8aea4b26441d8aba90c11d7350d3424df494052
2020-03-16 13:20:16 -07:00
e93e7b2795 [TensorExpr] Add tensorexpr benchmarks. (#34230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34230

This PR adds some benchmarks that we used to assess tensor expression performance.

Differential Revision: D20251830

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: bafd66ce32f63077e3733112d854f5c750d5b1af
2020-03-16 11:49:39 -07:00