dd3a77bc96
Apply UFMT to all files in benchmarks/ (#105928)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105928
Approved by: https://github.com/albanD
2023-07-26 01:18:48 +00:00
8fce9a09cd
[BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch (starting with the safest changes).
This PR only does two things: removes the need to inherit from object and removes unused future imports.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-07 21:10:56 +00:00
47bbc01e0b
[nnc] Added micro-benchmark to show perf improvement with cat subgraph optimization (#59581)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59581
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D28955317
Pulled By: navahgar
fbshipit-source-id: 53bb3dbfafbd3b146063f305523c2e6ec96cf6b8
2021-06-18 14:32:09 -07:00
8af648354f
[nnc] Benchmarks for concat (#52592)
Summary:
This PR adds a C++ benchmark for "concat" with three versions: 1) aten::cat, 2) an NNC implementation with if-then-else, and 3) an NNC implementation using multiple loops. It also adds a Python benchmark for "concat" that can be invoked with and without CPU fusion.
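The difference between the two NNC variants can be sketched in plain Python. This is only an illustration of the two loop shapes, not NNC code: the function names `concat_if_then_else` and `concat_multi_loop` are hypothetical, and nested lists stand in for 2-D tensors concatenated along dimension 1.

```python
def concat_if_then_else(a, b):
    """Single loop nest over the output; a branch picks the source input."""
    rows, wa, wb = len(a), len(a[0]), len(b[0])
    out = [[0.0] * (wa + wb) for _ in range(rows)]
    for i in range(rows):
        for j in range(wa + wb):
            # The per-element condition is what the if-then-else form pays for.
            out[i][j] = a[i][j] if j < wa else b[i][j - wa]
    return out


def concat_multi_loop(a, b):
    """One branch-free loop nest per input, each writing its own output slice."""
    rows, wa, wb = len(a), len(a[0]), len(b[0])
    out = [[0.0] * (wa + wb) for _ in range(rows)]
    for i in range(rows):
        for j in range(wa):
            out[i][j] = a[i][j]
    for i in range(rows):
        for j in range(wb):
            out[i][wa + j] = b[i][j]
    return out
```

Both functions produce the same output; they differ only in whether the inner loop body contains a condition.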
Here are the results of these benchmarks on an `Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz` machine with `OMP_NUM_THREADS=1`:
```
----------------------------------------------------------------------------------------------------------------------
Benchmark                                                                 Time         CPU  Iterations UserCounters...
----------------------------------------------------------------------------------------------------------------------
Concat2D2Input/ATen/1/160/1/14/1                                       1211 ns     1211 ns      567896 GB/s=1.14953G/s
Concat2D2Input/ATen/1/580/1/174/1                                      1296 ns     1296 ns      537060 GB/s=4.65362G/s
Concat2D2Input/ATen/20/160/20/14/1                                     1823 ns     1823 ns      382052 GB/s=15.2677G/s
Concat2D2Input/ATen/20/580/20/174/1                                    3347 ns     3347 ns      210036 GB/s=36.0432G/s
Concat2D2Input/ATen/8/512/8/512/1                                      2093 ns     2093 ns      324760 GB/s=31.3061G/s
Concat2D2Input/NNC/1/160/1/14/1                                         694 ns      694 ns     1002902 GB/s=2.00692G/s
Concat2D2Input/NNC/1/580/1/174/1                                        852 ns      852 ns      803002 GB/s=7.08127G/s
Concat2D2Input/NNC/20/160/20/14/1                                      1639 ns     1639 ns      419683 GB/s=16.9828G/s
Concat2D2Input/NNC/20/580/20/174/1                                     5956 ns     5956 ns      117833 GB/s=20.2548G/s
Concat2D2Input/NNC/8/512/8/512/1                                       3136 ns     3136 ns      224122 GB/s=20.8958G/s
Concat2D2Input/NNCLoop/1/160/1/14/1                                     581 ns      581 ns     1209873 GB/s=2.39737G/s
Concat2D2Input/NNCLoop/1/580/1/174/1                                    614 ns      614 ns     1132332 GB/s=9.82955G/s
Concat2D2Input/NNCLoop/20/160/20/14/1                                  1091 ns     1091 ns      622952 GB/s=25.5247G/s
Concat2D2Input/NNCLoop/20/580/20/174/1                                 2399 ns     2399 ns      288376 GB/s=50.289G/s
Concat2D2Input/NNCLoop/8/512/8/512/1                                   1500 ns     1500 ns      478360 GB/s=43.6968G/s
Concat2D3Input/ATen/8/512/8/512/8/512/1                                2584 ns     2584 ns      266394 GB/s=38.0397G/s
Concat2D3Input/NNC/8/512/8/512/8/512/1                                 5056 ns     5056 ns      139768 GB/s=19.4416G/s
Concat2D3Input/NNCLoop/8/512/8/512/8/512/1                             1917 ns     1917 ns      369626 GB/s=51.2758G/s
Concat2D7Input/ATen/8/128/8/256/8/384/8/512/8/512/8/512/8/512/1        3888 ns     3888 ns      178124 GB/s=46.3571G/s
Concat2D7Input/NNC/8/128/8/256/8/384/8/512/8/512/8/512/8/512/1        24639 ns    24638 ns       28336 GB/s=7.31481G/s
Concat2D7Input/NNCLoop/8/128/8/256/8/384/8/512/8/512/8/512/8/512/1     3093 ns     3093 ns      226326 GB/s=58.265G/s
```
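The `GB/s` counter in these results is consistent with counting every element once on read and once on write, at 4 bytes per element, and dividing by the measured time. The sketch below reproduces that arithmetic. The function name `concat_gb_per_s`, the reading of each benchmark name as a list of input shapes, and the 4-byte element size are assumptions inferred from the reported numbers, not taken from the benchmark source.

```python
def concat_gb_per_s(shapes, time_ns, bytes_per_elem=4):
    """Estimate the bandwidth of a concat from its input shapes and run time.

    Assumes each input element is read once and written once to the output.
    """
    elems = sum(rows * cols for rows, cols in shapes)
    bytes_moved = 2 * elems * bytes_per_elem  # reads plus writes
    # Bytes per nanosecond is numerically equal to GB/s (with 1 GB = 1e9 bytes).
    return bytes_moved / time_ns
```

For example, `concat_gb_per_s([(1, 160), (1, 14)], 1211)` gives roughly 1.149, close to the reported 1.14953; the small gap is expected because the printed time is rounded to whole nanoseconds.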
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52592
Reviewed By: bertmaher
Differential Revision: D26596701
Pulled By: navahgar
fbshipit-source-id: 650fa88febf4423ea49f5a1d3d734edc2294d257
2021-02-24 06:09:32 -08:00
12d85b536e
Fixing Softmax bench. (#51898)
Summary:
Fixes and enables the microbenchmark for Softmax.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51898
Reviewed By: gmagogsfm
Differential Revision: D26333189
Pulled By: navahgar
fbshipit-source-id: be0934e413c4f6728593f896e53a0b31f1657e52
2021-02-09 15:03:49 -08:00
26a91a9f04
[WIP][JIT] Add benchmarking support of NV Fuser with FP16 dtype support (#44101)
Summary:
Modified files in `benchmarks/tensorexpr` to add support for NVIDIA's Fuser for the JIT compiler.
Besides adding an option to select the NVIDIA fuser, this change:
* Adds FP16 data type support
* Fixes SOL/Algo calculations to use the size of the actual data type instead of a fixed 4 bytes
* Adds IR printing and kernel printing knobs
* Adds a knob `input_iter` to create ranges of inputs, currently only for reductions
* Adds further reduction support for inner and outer dimension reductions that are compatible with the `input_iter` knob
* Adds `simple_element`, `reduce2d_inner`, and `reduce2d_outer` to isolate performance on elementwise and reduction operations in the most minimal fashion
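The data-type fix in the list above amounts to scaling the memory-traffic estimate by the element size rather than by a constant 4. A minimal sketch of that idea follows; the names `DTYPE_BYTES` and `memory_workload_bytes` are hypothetical and are not the identifiers used in `benchmarks/tensorexpr`.

```python
# Element sizes in bytes for a few floating-point types (illustrative subset).
DTYPE_BYTES = {"float16": 2, "float32": 4, "float64": 8}


def memory_workload_bytes(elems_read, elems_written, dtype="float32"):
    """Bytes moved by a kernel, scaled by the element size of `dtype`."""
    return (elems_read + elems_written) * DTYPE_BYTES[dtype]
```

With a fixed 4-byte assumption, an FP16 run would report twice the traffic it actually generates, so any bandwidth figure derived from it would be inflated by the same factor.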
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44101
Reviewed By: ngimel
Differential Revision: D23713658
Pulled By: bertmaher
fbshipit-source-id: d6b83cfab559aefe107c23b3c0f2df9923b3adc1
2020-09-15 15:10:49 -07:00
8998a1b3d3
Add tensorexpr benchmarks. (#35064)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35064
Test Plan: Imported from OSS
Differential Revision: D20543695
Pulled By: ZolotukhinM
fbshipit-source-id: 1cf294ab19465cb93557c2b195252c739b40a0f7
2020-03-20 12:01:31 -07:00
976d6aaa51
Revert D20251830: [TensorExpr] Add tensorexpr benchmarks.
Test Plan: revert-hammer
Differential Revision: D20251830
Original commit changeset: bafd66ce32f6
fbshipit-source-id: d8aea4b26441d8aba90c11d7350d3424df494052
2020-03-16 13:20:16 -07:00
e93e7b2795
[TensorExpr] Add tensorexpr benchmarks. (#34230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34230
This PR adds some benchmarks that we used to assess tensor expressions performance.
Differential Revision: D20251830
Test Plan: Imported from OSS
Pulled By: ZolotukhinM
fbshipit-source-id: bafd66ce32f63077e3733112d854f5c750d5b1af
2020-03-16 11:49:39 -07:00