43 Commits

Author SHA1 Message Date
42015db6a9 [BE] fix typos in benchmarks/ (#156077)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156077
Approved by: https://github.com/Skylion007, https://github.com/malfet
ghstack dependencies: #156069
2025-06-17 13:12:18 +00:00
297805fd8f Typo fixes for "overridden" in comments and function names (#155944)
This word appears often in class descriptions and is not consistently spelled. Update comments and some function names to use the correct spelling consistently. Facilitates searching the codebase.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155944
Approved by: https://github.com/Skylion007
2025-06-14 03:37:38 +00:00
e2f9759bd0 Fix broken URLs (#152237)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152237
Approved by: https://github.com/huydhn, https://github.com/malfet
2025-04-27 09:56:42 +00:00
b77406a9ec [BE][CI] bump ruff to 0.8.4 (#143753)
Changes:

1. Bump `ruff` from 0.7.4 to 0.8.4
2. Change `%`-formatted strings to f-strings
3. Change arguments with the `__`-prefix to positional-only arguments with the `/` separator in function signatures (both sketched below).
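
An illustrative before/after for changes 2 and 3 (a sketch of the pattern, not code taken from the PR):

```python
# Change 2: a %-formatted string rewritten as an f-string.
name, count = "relu", 3
msg_old = "op %s ran %d times" % (name, count)
msg_new = f"op {name} ran {count} times"
assert msg_old == msg_new

# Change 3: a dunder-prefixed parameter (an old convention for
# "positional-only") rewritten with the `/` separator (PEP 570).
def clamp_old(__value, low=0.0, high=1.0):
    return min(max(__value, low), high)

def clamp_new(value, /, low=0.0, high=1.0):
    return min(max(value, low), high)

assert clamp_old(1.7) == clamp_new(1.7) == 1.0
```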

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143753
Approved by: https://github.com/Skylion007
2024-12-24 12:24:10 +00:00
498a7808ff Fix unused Python variables outside torch/ and test/ (#136359)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136359
Approved by: https://github.com/albanD
2024-12-11 17:10:23 +00:00
c0ed38e644 [BE][Easy][3/19] enforce style for empty lines in import segments in benchmarks/ (#129754)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by the linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129754
Approved by: https://github.com/ezyang
2024-07-17 14:34:42 +00:00
26f4f10ac8 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes were generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
2024-05-27 14:49:57 +00:00
55c0ab2887 Revert "[5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)"
This reverts commit 7763c83af67eebfdd5185dbe6ce15ece2b992a0f.

Reverted https://github.com/pytorch/pytorch/pull/127126 on behalf of https://github.com/XuehaiPan due to Broken CI ([comment](https://github.com/pytorch/pytorch/pull/127126#issuecomment-2133044286))
2024-05-27 09:22:08 +00:00
7763c83af6 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes were generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
ghstack dependencies: #127122, #127123, #127124, #127125
2024-05-27 04:22:18 +00:00
c5fafe9f48 [BE]: TRY002 - Ban raising vanilla exceptions (#124570)
Adds a ruff lint rule to ban raising raw exceptions. Most of these should at the very least be runtime errors, value errors, type errors, or some other specific error type. There are hundreds of instances of these bad exception types already in the codebase, so I have noqa'd most of them. Hopefully this error code will get committers to rethink what exception type they should raise when they submit a PR.

I also encourage people to gradually go and fix all the existing noqas that have been added, so they can be removed over time and our exception typing can be improved.
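
For illustration, the kind of rewrite this rule pushes toward (a sketch, not code from the PR):

```python
def set_ratio(ratio: float) -> None:
    if not 0.0 <= ratio <= 1.0:
        # Before (now flagged by TRY002): raise Exception("out of range")
        # After: a specific exception type that callers can catch precisely.
        raise ValueError(f"ratio must be in [0, 1], got {ratio}")

def legacy() -> None:
    # Existing violations were suppressed in place rather than fixed:
    raise Exception("not migrated yet")  # noqa: TRY002
```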

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124570
Approved by: https://github.com/ezyang
2024-04-21 22:26:40 +00:00
1d6c5972c1 [BE]: Optimize min/max/sum comprehensions C419 (#123960)
Automatic fixes that replace certain list comprehensions with generator expressions where appropriate, so that they are consumed immediately. This is preview functionality in ruff for rule C419, and it was applied automatically.
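
A minimal illustration of the C419 rewrite (not code from the PR):

```python
xs = [1.5, -2.0, 3.25]

# Before: the list comprehension builds a throwaway list first.
total = sum([abs(x) for x in xs])

# After: a generator expression is consumed by sum() directly.
total = sum(abs(x) for x in xs)

# The same rewrite applies to min() and max().
largest = max(x * x for x in xs)
```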

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123960
Approved by: https://github.com/malfet
2024-04-12 23:54:15 +00:00
bd10fea79a [BE]: Enable F821 and fix bugs (#116579)
Fixes #112371

I tried to fix as many of the bugs as I could; for a few, I could not figure out the proper fix, so I left them with noqas.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116579
Approved by: https://github.com/ezyang
2024-01-01 08:40:46 +00:00
6de28e92d2 [BE]: Apply FURB118 (prev): replaces unnecessary lambdas with operator. (#116027)
This replaces a bunch of unnecessary lambdas with the `operator` package. This is semantically equivalent, but the `operator` package is faster and arguably more readable. When the FURB rules are taken out of preview, I will enable this as a ruff check.
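
An illustrative pair of rewrites (a sketch of the pattern, not code from the PR):

```python
import operator

pairs = [(2, "b"), (1, "a"), (3, "c")]

# Before: ad-hoc lambdas.
by_first_old = sorted(pairs, key=lambda p: p[0])
add_old = lambda a, b: a + b  # noqa: E731

# After: equivalent callables from the operator module.
by_first_new = sorted(pairs, key=operator.itemgetter(0))
add_new = operator.add

assert by_first_old == by_first_new
assert add_old(2, 3) == add_new(2, 3) == 5
```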

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116027
Approved by: https://github.com/malfet
2023-12-20 19:35:08 +00:00
b7b2178204 [BE]: Remove useless lambdas (#113602)
Applies PLW0108, which removes useless lambdas in Python. The rule is in preview, so it is not ready to be enabled by default just yet. These are the autofixes from the rule.
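
The pattern PLW0108 targets, sketched for illustration:

```python
import math

values = [1.0, 4.0, 9.0]

# Before: the lambda does nothing but forward its argument.
roots = list(map(lambda v: math.sqrt(v), values))

# After the autofix: pass the callable itself.
roots = list(map(math.sqrt, values))
assert roots == [1.0, 2.0, 3.0]
```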

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113602
Approved by: https://github.com/albanD
2023-11-14 20:06:48 +00:00
dd3a77bc96 Apply UFMT to all files in benchmarks/ (#105928)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105928
Approved by: https://github.com/albanD
2023-07-26 01:18:48 +00:00
5ef023b05a [BE] Enable ruff's UP rules and autoformat benchmarks/ (#105429)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105429
Approved by: https://github.com/malfet
2023-07-19 04:46:37 +00:00
5bbec680d7 Fix usages of contextmanager without finally (#96170)
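
The bug class being fixed, sketched for illustration (not code from the PR): with a bare `yield` in a `@contextmanager` generator, cleanup code after the yield is skipped whenever the with-body raises, so it must be wrapped in `try`/`finally`.

```python
from contextlib import contextmanager

@contextmanager
def broken_flag(state):
    state["enabled"] = True
    yield  # if the with-body raises, the reset below never runs
    state["enabled"] = False

@contextmanager
def fixed_flag(state):
    state["enabled"] = True
    try:
        yield
    finally:
        state["enabled"] = False  # reset even when the body raises
```
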
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96170
Approved by: https://github.com/ngimel, https://github.com/malfet
2023-03-08 20:59:27 +00:00
8d45f555d7 [BE] [1/3] Rewrite super() calls in caffe2 and benchmarks (#94587)
Rewrite Python built-in class `super()` calls (sketched below). Only non-semantic changes should be applied.

- #94587
- #94588
- #94592
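
The core rewrite, sketched for illustration (not code from the PR): the legacy two-argument form becomes the zero-argument form, which is semantically equivalent inside a method body.

```python
class Base:
    def __init__(self):
        self.initialized = True

class OldStyle(Base):
    def __init__(self):
        super(OldStyle, self).__init__()  # legacy two-argument form

class NewStyle(Base):
    def __init__(self):
        super().__init__()  # equivalent zero-argument form

assert OldStyle().initialized and NewStyle().initialized
```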

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Some cases where the rewrite would change the semantics are kept unchanged, e.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94587
Approved by: https://github.com/ezyang
2023-02-11 18:19:48 +00:00
a229b4526f [BE] Prefer dash over underscore in command-line options (#94505)
Prefer dashes over underscores in command-line options. Add `--command-arg-name` to the argument parsers. The old arguments with underscores, `--command_arg_name`, are kept for backward compatibility (a usage sketch follows the code block below).

Both dashes and underscores are used in the PyTorch codebase. Some argument parsers have only dashes or only underscores in their arguments. For example, the `torchrun` utility for distributed training accepts only underscore arguments (e.g., `--master_port`). Dashes are more common in other command-line tools, and they appear to be the default choice in the Python standard library:

`argparse.BooleanOptionalAction`: 4a9dff0e5a/Lib/argparse.py (L893-L895)

```python
class BooleanOptionalAction(Action):
    def __init__(...):
            if option_string.startswith('--'):
                option_string = '--no-' + option_string[2:]
                _option_strings.append(option_string)
```

It adds `--no-argname`, not `--no_argname`. Also, typing `_` requires pressing the shift (or caps-lock) key, while `-` does not.
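
How the backward-compatible renaming works, sketched with a hypothetical argument name (not one of the PR's actual parsers): argparse accepts multiple option strings per argument, so both spellings map to one destination.

```python
import argparse

parser = argparse.ArgumentParser()
# The dashed spelling comes first; the underscore spelling stays as an alias,
# so existing scripts keep working. Both write to args.batch_size.
parser.add_argument("--batch-size", "--batch_size", type=int, default=32)

args = parser.parse_args(["--batch_size", "64"])
assert args.batch_size == 64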

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94505
Approved by: https://github.com/ezyang, https://github.com/seemethere
2023-02-09 20:16:49 +00:00
8fce9a09cd [BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch (starting with the safest changes).
This PR does only two things: it removes explicit inheritance from `object` and removes unused `__future__` imports (both sketched below).
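
The two mechanical changes, sketched for illustration:

```python
# Before (Python 2 compatibility idioms):
#
#   from __future__ import print_function
#
#   class Engine(object):
#       pass

# After: both are the default on Python 3, so pyupgrade strips them.
class Engine:
    pass
```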

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-07 21:10:56 +00:00
ac2d2e3a3d Fix some typos.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75561
Approved by: https://github.com/albanD
2022-04-11 21:55:59 +00:00
6694fdaccd Clean up profiling mode and profiling executor strategy (#73875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73875

Previously we had a few settings:
- getExecutor - which toggled between the Profiling Executor and the Legacy executor
- getGraphOptimize - if true, overrides PE/Legacy to run with the simple executor (no optimizations)
and then...
- getProfilingMode - which would set PE to 0 specializations.

The last mode is redundant with getGraphOptimize; we should just remove it and use getGraphOptimize in these cases. It would lead to potentially invalid combinations of logic - what does it mean if getProfilingMode is true but getExecutor is set to false? This would lead to a bug in specialize_autograd_zero in that case; see: https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/passes/specialize_autogradzero.cpp#L93.

The tests here are failing but get fixed by the PR above this one, so I'll squash for landing.

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D34938130

Pulled By: eellison

fbshipit-source-id: 1a9c0ae7f6d1cfddc2ed3499a5af611053ae5e1b
(cherry picked from commit cf69ce3d155ba7d334022c42fb2cee54bb088c23)
2022-03-29 18:38:51 +00:00
47bbc01e0b [nnc] Added micro-benchmark to show perf improvement with cat subgraph optimization (#59581)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59581

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28955317

Pulled By: navahgar

fbshipit-source-id: 53bb3dbfafbd3b146063f305523c2e6ec96cf6b8
2021-06-18 14:32:09 -07:00
79a258f448 s/foward/forward/g (#58497)
Summary:
Annoying typo.

Prompted by these profiling results: https://github.com/pytorch/pytorch/issues/56419#issuecomment-825787828

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58497

Reviewed By: malfet

Differential Revision: D28521081

Pulled By: Chillee

fbshipit-source-id: ab91a2e167dd7d3387fd56106a6cff81f7a32f10
2021-05-19 11:42:42 -07:00
d3cde6c23c [NNC] Implementation for aten::cat without conditionals. (#53128)
Summary:
This PR adds an implementation for `aten::cat` in NNC without any conditionals. This version is not enabled by default.

Here is the performance of some microbenchmarks with and without conditionals. There is up to a 50% performance improvement without conditionals for some of the shapes.

aten::cat implementation in NNC **with** conditionals
```
$ python -m benchmarks.tensorexpr --device cpu --mode fwd --jit_mode trace --cpu_fusion concat
pt: concat2d2input_fwd_cpu_1_160_1_14_1: 5.44 us, SOL 0.26 GB/s, algorithmic 0.51 GB/s
pt: concat2d2input_fwd_cpu_1_580_1_174_1: 5.75 us, SOL 1.05 GB/s, algorithmic 2.10 GB/s
pt: concat2d2input_fwd_cpu_20_160_20_14_1: 6.87 us, SOL 4.05 GB/s, algorithmic 8.11 GB/s
pt: concat2d2input_fwd_cpu_20_580_20_174_1: 14.52 us, SOL 8.31 GB/s, algorithmic 16.62 GB/s
pt: concat2d2input_fwd_cpu_8_512_8_512_1: 9.58 us, SOL 6.84 GB/s, algorithmic 13.68 GB/s
```
aten::cat implementation in NNC **without** conditionals
```
$ python -m benchmarks.tensorexpr --device cpu --mode fwd --jit_mode trace --cpu_fusion --cat_wo_conditionals concat
pt: concat2d2input_fwd_cpu_1_160_1_14_1: 4.67 us, SOL 0.30 GB/s, algorithmic 0.60 GB/s
pt: concat2d2input_fwd_cpu_1_580_1_174_1: 5.65 us, SOL 1.07 GB/s, algorithmic 2.14 GB/s
pt: concat2d2input_fwd_cpu_20_160_20_14_1: 6.10 us, SOL 4.56 GB/s, algorithmic 9.12 GB/s
pt: concat2d2input_fwd_cpu_20_580_20_174_1: 7.44 us, SOL 16.22 GB/s, algorithmic 32.44 GB/s
pt: concat2d2input_fwd_cpu_8_512_8_512_1: 6.46 us, SOL 10.14 GB/s, algorithmic 20.29 GB/s
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53128

Reviewed By: bertmaher

Differential Revision: D26758613

Pulled By: navahgar

fbshipit-source-id: 00f56b7da630b42bc6e7ddd4444bae0cf3a5780a
2021-03-07 22:57:02 -08:00
8c798e0622 Forbid trailing whitespace (#53406)
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857

These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
  - `GLOSSARY.md`
  - `aten/src/ATen/core/op_registration/README.md`
  - `scripts/README.md`
  - `torch/csrc/jit/codegen/fuser/README.md`

The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```

I looked over the auto-generated changes and didn't see anything that looked problematic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406

Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377

This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348

Reviewed By: walterddr, seemethere

Differential Revision: D26856620

Pulled By: samestep

fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
2021-03-05 17:22:55 -08:00
8af648354f [nnc] Benchmarks for concat (#52592)
Summary:
This PR adds a C++ benchmark for "concat" with 3 different versions: 1) aten::cat, 2) an NNC implementation with if-then-else, and 3) an NNC implementation using multiple loops. It also adds a Python benchmark for "concat", which can now be invoked with and without CPU fusion.

Here are the results of these benchmarks on an `Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz` machine with `OMP_NUM_THREADS=1`

```
--------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                   Time           CPU Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------------
Concat2D2Input/ATen/1/160/1/14/1                                         1211 ns       1211 ns     567896 GB/s=1.14953G/s
Concat2D2Input/ATen/1/580/1/174/1                                        1296 ns       1296 ns     537060 GB/s=4.65362G/s
Concat2D2Input/ATen/20/160/20/14/1                                       1823 ns       1823 ns     382052 GB/s=15.2677G/s
Concat2D2Input/ATen/20/580/20/174/1                                      3347 ns       3347 ns     210036 GB/s=36.0432G/s
Concat2D2Input/ATen/8/512/8/512/1                                        2093 ns       2093 ns     324760 GB/s=31.3061G/s
Concat2D2Input/NNC/1/160/1/14/1                                           694 ns        694 ns    1002902 GB/s=2.00692G/s
Concat2D2Input/NNC/1/580/1/174/1                                          852 ns        852 ns     803002 GB/s=7.08127G/s
Concat2D2Input/NNC/20/160/20/14/1                                        1639 ns       1639 ns     419683 GB/s=16.9828G/s
Concat2D2Input/NNC/20/580/20/174/1                                       5956 ns       5956 ns     117833 GB/s=20.2548G/s
Concat2D2Input/NNC/8/512/8/512/1                                         3136 ns       3136 ns     224122 GB/s=20.8958G/s
Concat2D2Input/NNCLoop/1/160/1/14/1                                       581 ns        581 ns    1209873 GB/s=2.39737G/s
Concat2D2Input/NNCLoop/1/580/1/174/1                                      614 ns        614 ns    1132332 GB/s=9.82955G/s
Concat2D2Input/NNCLoop/20/160/20/14/1                                    1091 ns       1091 ns     622952 GB/s=25.5247G/s
Concat2D2Input/NNCLoop/20/580/20/174/1                                   2399 ns       2399 ns     288376 GB/s=50.289G/s
Concat2D2Input/NNCLoop/8/512/8/512/1                                     1500 ns       1500 ns     478360 GB/s=43.6968G/s
Concat2D3Input/ATen/8/512/8/512/8/512/1                                  2584 ns       2584 ns     266394 GB/s=38.0397G/s
Concat2D3Input/NNC/8/512/8/512/8/512/1                                   5056 ns       5056 ns     139768 GB/s=19.4416G/s
Concat2D3Input/NNCLoop/8/512/8/512/8/512/1                               1917 ns       1917 ns     369626 GB/s=51.2758G/s
Concat2D7Input/ATen/8/128/8/256/8/384/8/512/8/512/8/512/8/512/1          3888 ns       3888 ns     178124 GB/s=46.3571G/s
Concat2D7Input/NNC/8/128/8/256/8/384/8/512/8/512/8/512/8/512/1          24639 ns      24638 ns      28336 GB/s=7.31481G/s
Concat2D7Input/NNCLoop/8/128/8/256/8/384/8/512/8/512/8/512/8/512/1       3093 ns       3093 ns     226326 GB/s=58.265G/s
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52592

Reviewed By: bertmaher

Differential Revision: D26596701

Pulled By: navahgar

fbshipit-source-id: 650fa88febf4423ea49f5a1d3d734edc2294d257
2021-02-24 06:09:32 -08:00
b6ed05130e Adding a flag to enable CPU fusion in benchmarks (#48612)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48612

Test Plan: python -m benchmarks.tensorexpr --device cpu --mode fwd --jit_mode trace --cpu_fusion element

Reviewed By: heitorschueroff

Differential Revision: D26548643

Pulled By: navahgar

fbshipit-source-id: adb537818d77c9b6b0fe434ae6d963a5f348ad24
2021-02-19 12:11:06 -08:00
12d85b536e Fixing Softmax bench. (#51898)
Summary:
Fixes and enables the microbenchmark for Softmax.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51898

Reviewed By: gmagogsfm

Differential Revision: D26333189

Pulled By: navahgar

fbshipit-source-id: be0934e413c4f6728593f896e53a0b31f1657e52
2021-02-09 15:03:49 -08:00
9920ae665b Make te a hidden package for now (#51690)
Summary:
As discussed with suo, having it in `torch._C.XX` means that it automatically gets added to `torch.XX`, which is unfortunate. Making it `torch._C._XX` means that it won't be added to `torch.`.

Let me know if that approach to hiding it is not good, and we can update it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51690

Reviewed By: gchanan

Differential Revision: D26243207

Pulled By: albanD

fbshipit-source-id: 3eb91a96635e90a6b98df799e3a732833dd280d5
2021-02-04 07:58:38 -08:00
4cca08368b Adds per-op microbenchmarks for NNC (#50845)
Summary:
Runs through the vast majority of primitive ops that exist in NNC and benchmarks them against PyTorch ops on CPU. Dumps out a plot like this.

![nnc](https://user-images.githubusercontent.com/6355099/105247994-a854d380-5b43-11eb-9ac9-1ee779e5ab54.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50845

Reviewed By: ngimel

Differential Revision: D25989080

Pulled By: Chillee

fbshipit-source-id: 6d6a39eb06b3de9a999993224d5e718537c0c8c4
2021-01-21 13:21:01 -08:00
88b36230f5 Add full reduction benchmark. (#50057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50057

As part of the effort to calibrate TE reduction performance, this adds a full reduction benchmark.
Also adds a "skip_input_transformation" option.
Fixes the other reduction benchmarks to accept the specific benchmark names that are listed.

Test plans:
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce_full
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce_full_fwd_cpu_16777216_s1
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce_full_fwd_cpu_16777216_s0
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce2d_inner
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce2d_inner_fwd_cpu_640_524288
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce2d_outer
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce2d_outer_fwd_cpu_640_524288

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D25774138

Pulled By: zheng-xq

fbshipit-source-id: fd4598e5c29991be476e42235a059e8021d4f083
2021-01-21 09:56:46 -08:00
56a3831bc6 [NVFuser]Benchmark minor update (#46778)
Summary:
This is a tiny PR for two minor fixes:

1. Added `torch._C._jit_set_texpr_fuser_enabled(False)` to enable shape inference on NV fuser runs.
2. Renamed dynamic benchmark modules to avoid multiple matches, e.g. the pattern `simple_element` also matched `dynamic_simple_element`. I guess it'd be much easier if the pattern matching were based on `startswith`; I'd be happy to update that if agreed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46778

Reviewed By: zhangguanheng66

Differential Revision: D24516911

Pulled By: bertmaher

fbshipit-source-id: 839f9a3e058f9d7aca17b2e6eb8b558e0e48e8f4
2020-10-26 12:22:36 -07:00
43fe45ab0f [JIT] Add dynamic shape benchmark for NV Fuser (#46107)
Summary:
This PR modifies `benchmarks/tensorexpr`. It follows up [#44101](https://github.com/pytorch/pytorch/pull/44101) and further supports characterizing fusers with dynamic shape benchmarks. The dynamic shape condition models the use case where the input tensor shape changes on each call to the graph.

Changes include:

Added an auxiliary class `DynamicShape` that provides a simple API for enabling dynamic shapes in existing test cases; an example can be found in `DynamicSimpleElementBench`.

Created new `bench_cls` entries: `DynamicSimpleElementBench`, `DynamicReduce2DInnerBench`, `DynamicReduce2DOuterBench`, and `DynamicLSTM`. They are all dynamic-shaped versions of existing benchmarks and serve as examples of enabling dynamic shapes with `DynamicShape` (see the sketch below).
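
A hypothetical sketch of the mixin pattern described above (class layout and method names are illustrative, not the actual `benchmarks/tensorexpr` API): the mixin re-randomizes input sizes so every call to the graph sees different shapes.

```python
import random

class DynamicShape:
    """Hypothetical mixin: vary input shapes before every call."""

    def rand_shape(self, shape):
        # Shrink each dimension by a random factor so no two calls to the
        # graph see the same sizes, defeating shape-specialized fusion.
        return tuple(max(1, int(s * random.uniform(0.5, 1.0))) for s in shape)

class SimpleElementBench:
    def __init__(self, shape):
        self.shape = shape

class DynamicSimpleElementBench(DynamicShape, SimpleElementBench):
    def next_input_shape(self):
        return self.rand_shape(self.shape)

bench = DynamicSimpleElementBench((1024, 1024))
print(bench.next_input_shape())  # e.g. (733, 980)
```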

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46107

Reviewed By: glaringlee

Differential Revision: D24229400

Pulled By: bertmaher

fbshipit-source-id: 889fece5ea87d0f6f6374d31dbe11b1cd1380683
2020-10-09 22:09:21 -07:00
26a91a9f04 [WIP][JIT] Add benchmarking support of NV Fuser with FP16 dtype support (#44101)
Summary:
Modified files in `benchmarks/tensorexpr` to add support for NVIDIA's Fuser for the jit compiler.

This support has some modifications besides adding an option to support the NVIDIA fuser:

* Adds FP16 Datatype support
* Fixes SOL/Algo calculations to generally use the data type instead of being fixed to 4 bytes
* Adds IR printing and kernel printing knobs
* Adds a knob `input_iter` to create ranges of inputs, currently only for reductions
* Adds further reduction support for Inner and Outer dimension reductions that are compatible with the `input_iter` knob.
* Added `simple_element`, `reduce2d_inner`, and `reduce2d_outer` to isolate performance on elementwise and reduction operations in the most minimal fashion.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44101

Reviewed By: ngimel

Differential Revision: D23713658

Pulled By: bertmaher

fbshipit-source-id: d6b83cfab559aefe107c23b3c0f2df9923b3adc1
2020-09-15 15:10:49 -07:00
33d51a9b32 Respect canFuseOn{CPU,GPU} in TE fuser (#43967)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43967

Test Plan: Imported from OSS

Reviewed By: asuhan

Differential Revision: D23469048

Pulled By: bertmaher

fbshipit-source-id: 1005a7ae08974059ff9d467492caa3a388070eeb
2020-09-02 18:00:25 -07:00
b8ae563ce6 Add a microbenchmark for LSTM elementwise portion (#42901)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42901

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23079714

Pulled By: bertmaher

fbshipit-source-id: 28f8c3b5019ee898e82e64a0a674da1b4736d252
2020-08-12 17:11:47 -07:00
33d209b5f4 Fix TE microbenchmark harness to use appropriate fuser/executor (#42900)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42900

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23079715

Pulled By: bertmaher

fbshipit-source-id: 6aa2b08a550835b7737e355960a16a7ca83878ea
2020-08-12 17:11:44 -07:00
9fe3b1857d [TensorExpr] Fix imports in tensorexpr benchmarks. (#35830)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35830

Test Plan: Imported from OSS

Differential Revision: D20799464

Pulled By: ZolotukhinM

fbshipit-source-id: 1b5981ad15042f601a9b6eb01a799cdf71200666
2020-04-01 14:23:33 -07:00
a3e10d2a17 Expose enablement of TensorExpr fuser as env variable (#35341)
Summary:
This commit allows one to use an environment variable to enable the fuser in torch/csrc/jit/tensorexpr/

```
PYTORCH_TENSOREXPR=1 python benchmark.py
```

This commit also changes the registration to happen by default, removing the need for the Python-exposed `_jit_register_tensorexpr_fuser`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35341

Reviewed By: ZolotukhinM

Differential Revision: D20676348

Pulled By: bwasti

fbshipit-source-id: 4c997cdc310e7567c03905ebff72b3e8a4c2f464
2020-03-26 14:31:57 -07:00
8998a1b3d3 Add tensorexpr benchmarks. (#35064)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35064

Test Plan: Imported from OSS

Differential Revision: D20543695

Pulled By: ZolotukhinM

fbshipit-source-id: 1cf294ab19465cb93557c2b195252c739b40a0f7
2020-03-20 12:01:31 -07:00
976d6aaa51 Revert D20251830: [TensorExpr] Add tensorexpr benchmarks.
Test Plan: revert-hammer

Differential Revision:
D20251830

Original commit changeset: bafd66ce32f6

fbshipit-source-id: d8aea4b26441d8aba90c11d7350d3424df494052
2020-03-16 13:20:16 -07:00
e93e7b2795 [TensorExpr] Add tensorexpr benchmarks. (#34230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34230

This PR adds some benchmarks that we used to assess tensor expressions performance.

Differential Revision: D20251830

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: bafd66ce32f63077e3733112d854f5c750d5b1af
2020-03-16 11:49:39 -07:00