43 Commits

Author SHA1 Message Date
42015db6a9 [BE] fix typos in benchmarks/ (#156077)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156077
Approved by: https://github.com/Skylion007, https://github.com/malfet
ghstack dependencies: #156069
2025-06-17 13:12:18 +00:00
297805fd8f Typo fixes for "overridden" in comments and function names (#155944)
This word appears often in class descriptions and is not consistently spelled. Update comments and some function names to use the correct spelling consistently. Facilitates searching the codebase.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155944
Approved by: https://github.com/Skylion007
2025-06-14 03:37:38 +00:00
e2f9759bd0 Fix broken URLs (#152237)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152237
Approved by: https://github.com/huydhn, https://github.com/malfet
2025-04-27 09:56:42 +00:00
b77406a9ec [BE][CI] bump ruff to 0.8.4 (#143753)
Changes:

1. Bump `ruff` from 0.7.4 to 0.8.4
2. Change `%`-formatted strings to f-strings
3. Change arguments with the `__`-prefix to positional-only arguments with the `/` separator in function signatures (both sketched below).
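
An illustrative before/after for changes 2 and 3 (a sketch of the pattern, not code taken from the PR):

```python
# Change 2: a %-formatted string rewritten as an f-string.
name, count = "relu", 3
msg_old = "op %s ran %d times" % (name, count)
msg_new = f"op {name} ran {count} times"
assert msg_old == msg_new

# Change 3: a dunder-prefixed parameter (an old convention for
# "positional-only") rewritten with the `/` separator (PEP 570).
def clamp_old(__value, low=0.0, high=1.0):
    return min(max(__value, low), high)

def clamp_new(value, /, low=0.0, high=1.0):
    return min(max(value, low), high)

assert clamp_old(1.7) == clamp_new(1.7) == 1.0
```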

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143753
Approved by: https://github.com/Skylion007
2024-12-24 12:24:10 +00:00
498a7808ff Fix unused Python variables outside torch/ and test/ (#136359)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136359
Approved by: https://github.com/albanD
2024-12-11 17:10:23 +00:00
c0ed38e644 [BE][Easy][3/19] enforce style for empty lines in import segments in benchmarks/ (#129754)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by the linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129754
Approved by: https://github.com/ezyang
2024-07-17 14:34:42 +00:00
26f4f10ac8 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes were generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
2024-05-27 14:49:57 +00:00
55c0ab2887 Revert "[5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)"
This reverts commit 7763c83af67eebfdd5185dbe6ce15ece2b992a0f.

Reverted https://github.com/pytorch/pytorch/pull/127126 on behalf of https://github.com/XuehaiPan due to Broken CI ([comment](https://github.com/pytorch/pytorch/pull/127126#issuecomment-2133044286))
2024-05-27 09:22:08 +00:00
7763c83af6 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes were generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
ghstack dependencies: #127122, #127123, #127124, #127125
2024-05-27 04:22:18 +00:00
c5fafe9f48 [BE]: TRY002 - Ban raising vanilla exceptions (#124570)
Adds a ruff lint rule to ban raising raw exceptions. Most of these should at the very least be runtime errors, value errors, type errors, or some other specific error type. There are hundreds of instances of these bad exception types already in the codebase, so I have noqa'd most of them. Hopefully this error code will get committers to rethink what exception type they should raise when they submit a PR.

I also encourage people to gradually go and fix all the existing noqas that have been added, so they can be removed over time and our exception typing can be improved.
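
For illustration, the kind of rewrite this rule pushes toward (a sketch, not code from the PR):

```python
def set_ratio(ratio: float) -> None:
    if not 0.0 <= ratio <= 1.0:
        # Before (now flagged by TRY002): raise Exception("out of range")
        # After: a specific exception type that callers can catch precisely.
        raise ValueError(f"ratio must be in [0, 1], got {ratio}")

def legacy() -> None:
    # Existing violations were suppressed in place rather than fixed:
    raise Exception("not migrated yet")  # noqa: TRY002
```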

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124570
Approved by: https://github.com/ezyang
2024-04-21 22:26:40 +00:00
1d6c5972c1 [BE]: Optimize min/max/sum comprehensions C419 (#123960)
Automatic fixes that replace certain list comprehensions with generator expressions where appropriate, so that they are consumed immediately. This is preview functionality in ruff for rule C419, and it was applied automatically.
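
A minimal illustration of the C419 rewrite (not code from the PR):

```python
xs = [1.5, -2.0, 3.25]

# Before: the list comprehension builds a throwaway list first.
total = sum([abs(x) for x in xs])

# After: a generator expression is consumed by sum() directly.
total = sum(abs(x) for x in xs)

# The same rewrite applies to min() and max().
largest = max(x * x for x in xs)
```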

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123960
Approved by: https://github.com/malfet
2024-04-12 23:54:15 +00:00
bd10fea79a [BE]: Enable F821 and fix bugs (#116579)
Fixes #112371

I tried to fix as many of the bugs as I could; for a few, I could not figure out the proper fix, so I left them with noqas.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116579
Approved by: https://github.com/ezyang
2024-01-01 08:40:46 +00:00
6de28e92d2 [BE]: Apply FURB118 (prev): replaces unnecessary lambdas with operator. (#116027)
This replaces a bunch of unnecessary lambdas with the `operator` package. This is semantically equivalent, but the `operator` package is faster and arguably more readable. When the FURB rules are taken out of preview, I will enable this as a ruff check.
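
An illustrative pair of rewrites (a sketch of the pattern, not code from the PR):

```python
import operator

pairs = [(2, "b"), (1, "a"), (3, "c")]

# Before: ad-hoc lambdas.
by_first_old = sorted(pairs, key=lambda p: p[0])
add_old = lambda a, b: a + b  # noqa: E731

# After: equivalent callables from the operator module.
by_first_new = sorted(pairs, key=operator.itemgetter(0))
add_new = operator.add

assert by_first_old == by_first_new
assert add_old(2, 3) == add_new(2, 3) == 5
```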

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116027
Approved by: https://github.com/malfet
2023-12-20 19:35:08 +00:00
b7b2178204 [BE]: Remove useless lambdas (#113602)
Applies PLW0108, which removes useless lambdas in Python. The rule is in preview, so it is not ready to be enabled by default just yet. These are the autofixes from the rule.
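
The pattern PLW0108 targets, sketched for illustration:

```python
import math

values = [1.0, 4.0, 9.0]

# Before: the lambda does nothing but forward its argument.
roots = list(map(lambda v: math.sqrt(v), values))

# After the autofix: pass the callable itself.
roots = list(map(math.sqrt, values))
assert roots == [1.0, 2.0, 3.0]
```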

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113602
Approved by: https://github.com/albanD
2023-11-14 20:06:48 +00:00
dd3a77bc96 Apply UFMT to all files in benchmarks/ (#105928)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105928
Approved by: https://github.com/albanD
2023-07-26 01:18:48 +00:00
5ef023b05a [BE] Enable ruff's UP rules and autoformat benchmarks/ (#105429)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105429
Approved by: https://github.com/malfet
2023-07-19 04:46:37 +00:00
5bbec680d7 Fix usages of contextmanager without finally (#96170)
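
The bug class being fixed, sketched for illustration (not code from the PR): with a bare `yield` in a `@contextmanager` generator, cleanup code after the yield is skipped whenever the with-body raises, so it must be wrapped in `try`/`finally`.

```python
from contextlib import contextmanager

@contextmanager
def broken_flag(state):
    state["enabled"] = True
    yield  # if the with-body raises, the reset below never runs
    state["enabled"] = False

@contextmanager
def fixed_flag(state):
    state["enabled"] = True
    try:
        yield
    finally:
        state["enabled"] = False  # reset even when the body raises
```
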
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96170
Approved by: https://github.com/ngimel, https://github.com/malfet
2023-03-08 20:59:27 +00:00
8d45f555d7 [BE] [1/3] Rewrite super() calls in caffe2 and benchmarks (#94587)
Rewrite Python built-in class `super()` calls (sketched below). Only non-semantic changes should be applied.

- #94587
- #94588
- #94592
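
The core rewrite, sketched for illustration (not code from the PR): the legacy two-argument form becomes the zero-argument form, which is semantically equivalent inside a method body.

```python
class Base:
    def __init__(self):
        self.initialized = True

class OldStyle(Base):
    def __init__(self):
        super(OldStyle, self).__init__()  # legacy two-argument form

class NewStyle(Base):
    def __init__(self):
        super().__init__()  # equivalent zero-argument form

assert OldStyle().initialized and NewStyle().initialized
```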

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Some cases where the rewrite would change the semantics are kept unchanged, e.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94587
Approved by: https://github.com/ezyang
2023-02-11 18:19:48 +00:00
a229b4526f [BE] Prefer dash over underscore in command-line options (#94505)
Prefer dashes over underscores in command-line options. Add `--command-arg-name` to the argument parsers. The old arguments with underscores, `--command_arg_name`, are kept for backward compatibility (a usage sketch follows the code block below).

Both dashes and underscores are used in the PyTorch codebase. Some argument parsers have only dashes or only underscores in their arguments. For example, the `torchrun` utility for distributed training accepts only underscore arguments (e.g., `--master_port`). Dashes are more common in other command-line tools, and they appear to be the default choice in the Python standard library:

`argparse.BooleanOptionalAction`: 4a9dff0e5a/Lib/argparse.py (L893-L895)

```python
class BooleanOptionalAction(Action):
    def __init__(...):
            if option_string.startswith('--'):
                option_string = '--no-' + option_string[2:]
                _option_strings.append(option_string)
```

It adds `--no-argname`, not `--no_argname`. Also, typing `_` requires pressing the shift (or caps-lock) key, while `-` does not.
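
How the backward-compatible renaming works, sketched with a hypothetical argument name (not one of the PR's actual parsers): argparse accepts multiple option strings per argument, so both spellings map to one destination.

```python
import argparse

parser = argparse.ArgumentParser()
# The dashed spelling comes first; the underscore spelling stays as an alias,
# so existing scripts keep working. Both write to args.batch_size.
parser.add_argument("--batch-size", "--batch_size", type=int, default=32)

args = parser.parse_args(["--batch_size", "64"])
assert args.batch_size == 64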

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94505
Approved by: https://github.com/ezyang, https://github.com/seemethere
2023-02-09 20:16:49 +00:00
8fce9a09cd [BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch (starting with the safest changes).
This PR does only two things: it removes explicit inheritance from `object` and removes unused `__future__` imports (both sketched below).
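
The two mechanical changes, sketched for illustration:

```python
# Before (Python 2 compatibility idioms):
#
#   from __future__ import print_function
#
#   class Engine(object):
#       pass

# After: both are the default on Python 3, so pyupgrade strips them.
class Engine:
    pass
```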

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-07 21:10:56 +00:00
ac2d2e3a3d Fix some typos.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75561
Approved by: https://github.com/albanD
2022-04-11 21:55:59 +00:00
6694fdaccd Clean up profiling mode and profiling executor strategy (#73875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73875

Previously we had a few settings:
- getExecutor - which toggled between the Profiling Executor and the Legacy executor
- getGraphOptimize - if true, overrides PE/Legacy to run with the simple executor (no optimizations)
and then...
- getProfilingMode - which would set PE to 0 specializations.

The last mode is redundant with getGraphOptimize; we should just remove it and use getGraphOptimize in these cases. It would lead to potentially invalid combinations of logic - what does it mean if getProfilingMode is true but getExecutor is set to false? This would lead to a bug in specialize_autograd_zero in that case; see: https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/passes/specialize_autogradzero.cpp#L93.

The tests here are failing but get fixed by the PR above this one, so I'll squash for landing.

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D34938130

Pulled By: eellison

fbshipit-source-id: 1a9c0ae7f6d1cfddc2ed3499a5af611053ae5e1b
(cherry picked from commit cf69ce3d155ba7d334022c42fb2cee54bb088c23)
2022-03-29 18:38:51 +00:00
47bbc01e0b [nnc] Added micro-benchmark to show perf improvement with cat subgraph optimization (#59581)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59581

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28955317

Pulled By: navahgar

fbshipit-source-id: 53bb3dbfafbd3b146063f305523c2e6ec96cf6b8
2021-06-18 14:32:09 -07:00
79a258f448 s/foward/forward/g (#58497)
Summary:
Annoying typo.

Prompted by these profiling results: https://github.com/pytorch/pytorch/issues/56419#issuecomment-825787828

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58497

Reviewed By: malfet

Differential Revision: D28521081

Pulled By: Chillee

fbshipit-source-id: ab91a2e167dd7d3387fd56106a6cff81f7a32f10
2021-05-19 11:42:42 -07:00
d3cde6c23c [NNC] Implementation for aten::cat without conditionals. (#53128)
Summary:
This PR adds an implementation for `aten::cat` in NNC without any conditionals. This version is not enabled by default.

Here is the performance of some microbenchmarks with and without conditionals. There is up to a 50% performance improvement without conditionals for some of the shapes.

aten::cat implementation in NNC **with** conditionals
```
$ python -m benchmarks.tensorexpr --device cpu --mode fwd --jit_mode trace --cpu_fusion concat
pt: concat2d2input_fwd_cpu_1_160_1_14_1: 5.44 us, SOL 0.26 GB/s, algorithmic 0.51 GB/s
pt: concat2d2input_fwd_cpu_1_580_1_174_1: 5.75 us, SOL 1.05 GB/s, algorithmic 2.10 GB/s
pt: concat2d2input_fwd_cpu_20_160_20_14_1: 6.87 us, SOL 4.05 GB/s, algorithmic 8.11 GB/s
pt: concat2d2input_fwd_cpu_20_580_20_174_1: 14.52 us, SOL 8.31 GB/s, algorithmic 16.62 GB/s
pt: concat2d2input_fwd_cpu_8_512_8_512_1: 9.58 us, SOL 6.84 GB/s, algorithmic 13.68 GB/s
```
aten::cat implementation in NNC **without** conditionals
```
$ python -m benchmarks.tensorexpr --device cpu --mode fwd --jit_mode trace --cpu_fusion --cat_wo_conditionals concat
pt: concat2d2input_fwd_cpu_1_160_1_14_1: 4.67 us, SOL 0.30 GB/s, algorithmic 0.60 GB/s
pt: concat2d2input_fwd_cpu_1_580_1_174_1: 5.65 us, SOL 1.07 GB/s, algorithmic 2.14 GB/s
pt: concat2d2input_fwd_cpu_20_160_20_14_1: 6.10 us, SOL 4.56 GB/s, algorithmic 9.12 GB/s
pt: concat2d2input_fwd_cpu_20_580_20_174_1: 7.44 us, SOL 16.22 GB/s, algorithmic 32.44 GB/s
pt: concat2d2input_fwd_cpu_8_512_8_512_1: 6.46 us, SOL 10.14 GB/s, algorithmic 20.29 GB/s
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53128

Reviewed By: bertmaher

Differential Revision: D26758613

Pulled By: navahgar

fbshipit-source-id: 00f56b7da630b42bc6e7ddd4444bae0cf3a5780a
2021-03-07 22:57:02 -08:00
8c798e0622 Forbid trailing whitespace (#53406)
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857

These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
  - `GLOSSARY.md`
  - `aten/src/ATen/core/op_registration/README.md`
  - `scripts/README.md`
  - `torch/csrc/jit/codegen/fuser/README.md`

The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```

I looked over the auto-generated changes and didn't see anything that looked problematic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406

Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377

This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348

Reviewed By: walterddr, seemethere

Differential Revision: D26856620

Pulled By: samestep

fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
2021-03-05 17:22:55 -08:00
8af648354f [nnc] Benchmarks for concat (#52592)
Summary:
This PR adds a C++ benchmark for "concat" with 3 different versions: 1) aten::cat, 2) an NNC implementation with if-then-else, and 3) an NNC implementation using multiple loops. It also adds a Python benchmark for "concat", which can now be invoked with and without CPU fusion.

Here are the results of these benchmarks on an `Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz` machine with `OMP_NUM_THREADS=1`

```
--------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                   Time           CPU Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------------
Concat2D2Input/ATen/1/160/1/14/1                                         1211 ns       1211 ns     567896 GB/s=1.14953G/s
Concat2D2Input/ATen/1/580/1/174/1                                        1296 ns       1296 ns     537060 GB/s=4.65362G/s
Concat2D2Input/ATen/20/160/20/14/1                                       1823 ns       1823 ns     382052 GB/s=15.2677G/s
Concat2D2Input/ATen/20/580/20/174/1                                      3347 ns       3347 ns     210036 GB/s=36.0432G/s
Concat2D2Input/ATen/8/512/8/512/1                                        2093 ns       2093 ns     324760 GB/s=31.3061G/s
Concat2D2Input/NNC/1/160/1/14/1                                           694 ns        694 ns    1002902 GB/s=2.00692G/s
Concat2D2Input/NNC/1/580/1/174/1                                          852 ns        852 ns     803002 GB/s=7.08127G/s
Concat2D2Input/NNC/20/160/20/14/1                                        1639 ns       1639 ns     419683 GB/s=16.9828G/s
Concat2D2Input/NNC/20/580/20/174/1                                       5956 ns       5956 ns     117833 GB/s=20.2548G/s
Concat2D2Input/NNC/8/512/8/512/1                                         3136 ns       3136 ns     224122 GB/s=20.8958G/s
Concat2D2Input/NNCLoop/1/160/1/14/1                                       581 ns        581 ns    1209873 GB/s=2.39737G/s
Concat2D2Input/NNCLoop/1/580/1/174/1                                      614 ns        614 ns    1132332 GB/s=9.82955G/s
Concat2D2Input/NNCLoop/20/160/20/14/1                                    1091 ns       1091 ns     622952 GB/s=25.5247G/s
Concat2D2Input/NNCLoop/20/580/20/174/1                                   2399 ns       2399 ns     288376 GB/s=50.289G/s
Concat2D2Input/NNCLoop/8/512/8/512/1                                     1500 ns       1500 ns     478360 GB/s=43.6968G/s
Concat2D3Input/ATen/8/512/8/512/8/512/1                                  2584 ns       2584 ns     266394 GB/s=38.0397G/s
Concat2D3Input/NNC/8/512/8/512/8/512/1                                   5056 ns       5056 ns     139768 GB/s=19.4416G/s
Concat2D3Input/NNCLoop/8/512/8/512/8/512/1                               1917 ns       1917 ns     369626 GB/s=51.2758G/s
Concat2D7Input/ATen/8/128/8/256/8/384/8/512/8/512/8/512/8/512/1          3888 ns       3888 ns     178124 GB/s=46.3571G/s
Concat2D7Input/NNC/8/128/8/256/8/384/8/512/8/512/8/512/8/512/1          24639 ns      24638 ns      28336 GB/s=7.31481G/s
Concat2D7Input/NNCLoop/8/128/8/256/8/384/8/512/8/512/8/512/8/512/1       3093 ns       3093 ns     226326 GB/s=58.265G/s
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52592

Reviewed By: bertmaher

Differential Revision: D26596701

Pulled By: navahgar

fbshipit-source-id: 650fa88febf4423ea49f5a1d3d734edc2294d257
2021-02-24 06:09:32 -08:00
b6ed05130e Adding a flag to enable CPU fusion in benchmarks (#48612)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48612

Test Plan: python -m benchmarks.tensorexpr --device cpu --mode fwd --jit_mode trace --cpu_fusion element

Reviewed By: heitorschueroff

Differential Revision: D26548643

Pulled By: navahgar

fbshipit-source-id: adb537818d77c9b6b0fe434ae6d963a5f348ad24
2021-02-19 12:11:06 -08:00
12d85b536e Fixing Softmax bench. (#51898)
Summary:
Fixes and enables the microbenchmark for Softmax.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51898

Reviewed By: gmagogsfm

Differential Revision: D26333189

Pulled By: navahgar

fbshipit-source-id: be0934e413c4f6728593f896e53a0b31f1657e52
2021-02-09 15:03:49 -08:00
9920ae665b Make te a hidden package for now (#51690)
Summary:
As discussed with suo, having it in `torch._C.XX` means that it automatically gets added to `torch.XX`, which is unfortunate. Making it `torch._C._XX` means that it won't be added to `torch.`.

Let me know if that approach to hiding it is not good, and we can update it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51690

Reviewed By: gchanan

Differential Revision: D26243207

Pulled By: albanD

fbshipit-source-id: 3eb91a96635e90a6b98df799e3a732833dd280d5
2021-02-04 07:58:38 -08:00
4cca08368b Adds per-op microbenchmarks for NNC (#50845)
Summary:
Runs through the vast majority of primitive ops that exist in NNC and benchmarks them against PyTorch ops on CPU. Dumps out a plot like this.

![nnc](https://user-images.githubusercontent.com/6355099/105247994-a854d380-5b43-11eb-9ac9-1ee779e5ab54.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50845

Reviewed By: ngimel

Differential Revision: D25989080

Pulled By: Chillee

fbshipit-source-id: 6d6a39eb06b3de9a999993224d5e718537c0c8c4
2021-01-21 13:21:01 -08:00
88b36230f5 Add full reduction benchmark. (#50057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50057

As part of the effort to calibrate TE reduction performance, this adds a full reduction benchmark.
Also adds a "skip_input_transformation" option.
Fixes the other reduction benchmarks to accept the specific benchmark names that are listed.

Test plans:
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce_full
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce_full_fwd_cpu_16777216_s1
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce_full_fwd_cpu_16777216_s0
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce2d_inner
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce2d_inner_fwd_cpu_640_524288
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce2d_outer
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce2d_outer_fwd_cpu_640_524288

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D25774138

Pulled By: zheng-xq

fbshipit-source-id: fd4598e5c29991be476e42235a059e8021d4f083
2021-01-21 09:56:46 -08:00
56a3831bc6 [NVFuser]Benchmark minor update (#46778)
Summary:
This is a tiny PR for two minor fixes:

1. Added `torch._C._jit_set_texpr_fuser_enabled(False)` to enable shape inference on NV fuser runs.
2. Renamed dynamic benchmark modules to avoid multiple matches, e.g. the pattern `simple_element` also matched `dynamic_simple_element`. I guess it'd be much easier if the pattern matching were based on `startswith`; I'd be happy to update that if agreed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46778

Reviewed By: zhangguanheng66

Differential Revision: D24516911

Pulled By: bertmaher

fbshipit-source-id: 839f9a3e058f9d7aca17b2e6eb8b558e0e48e8f4
2020-10-26 12:22:36 -07:00
43fe45ab0f [JIT] Add dynamic shape benchmark for NV Fuser (#46107)
Summary:
This PR modifies `benchmarks/tensorexpr`. It follows up [#44101](https://github.com/pytorch/pytorch/pull/44101) and further supports characterizing fusers with dynamic shape benchmarks. The dynamic shape condition models the use case where the input tensor shape changes on each call to the graph.

Changes include:

Added an auxiliary class `DynamicShape` that provides a simple API for enabling dynamic shapes in existing test cases; an example can be found in `DynamicSimpleElementBench`.

Created new `bench_cls` entries: `DynamicSimpleElementBench`, `DynamicReduce2DInnerBench`, `DynamicReduce2DOuterBench`, and `DynamicLSTM`. They are all dynamic-shaped versions of existing benchmarks and serve as examples of enabling dynamic shapes with `DynamicShape` (see the sketch below).
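
A hypothetical sketch of the mixin pattern described above (class layout and method names are illustrative, not the actual `benchmarks/tensorexpr` API): the mixin re-randomizes input sizes so every call to the graph sees different shapes.

```python
import random

class DynamicShape:
    """Hypothetical mixin: vary input shapes before every call."""

    def rand_shape(self, shape):
        # Shrink each dimension by a random factor so no two calls to the
        # graph see the same sizes, defeating shape-specialized fusion.
        return tuple(max(1, int(s * random.uniform(0.5, 1.0))) for s in shape)

class SimpleElementBench:
    def __init__(self, shape):
        self.shape = shape

class DynamicSimpleElementBench(DynamicShape, SimpleElementBench):
    def next_input_shape(self):
        return self.rand_shape(self.shape)

bench = DynamicSimpleElementBench((1024, 1024))
print(bench.next_input_shape())  # e.g. (733, 980)
```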

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46107

Reviewed By: glaringlee

Differential Revision: D24229400

Pulled By: bertmaher

fbshipit-source-id: 889fece5ea87d0f6f6374d31dbe11b1cd1380683
2020-10-09 22:09:21 -07:00
26a91a9f04 [WIP][JIT] Add benchmarking support of NV Fuser with FP16 dtype support (#44101)
Summary:
Modified files in `benchmarks/tensorexpr` to add support for NVIDIA's Fuser for the jit compiler.

This support has some modifications besides adding an option to support the NVIDIA fuser:

* Adds FP16 Datatype support
* Fixes SOL/Algo calculations to generally use the data type instead of being fixed to 4 bytes
* Adds IR printing and kernel printing knobs
* Adds a knob `input_iter` to create ranges of inputs, currently only for reductions
* Adds further reduction support for Inner and Outer dimension reductions that are compatible with the `input_iter` knob.
* Added `simple_element`, `reduce2d_inner`, and `reduce2d_outer` to isolate performance on elementwise and reduction operations in the most minimal fashion.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44101

Reviewed By: ngimel

Differential Revision: D23713658

Pulled By: bertmaher

fbshipit-source-id: d6b83cfab559aefe107c23b3c0f2df9923b3adc1
2020-09-15 15:10:49 -07:00
33d51a9b32 Respect canFuseOn{CPU,GPU} in TE fuser (#43967)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43967

Test Plan: Imported from OSS

Reviewed By: asuhan

Differential Revision: D23469048

Pulled By: bertmaher

fbshipit-source-id: 1005a7ae08974059ff9d467492caa3a388070eeb
2020-09-02 18:00:25 -07:00
b8ae563ce6 Add a microbenchmark for LSTM elementwise portion (#42901)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42901

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23079714

Pulled By: bertmaher

fbshipit-source-id: 28f8c3b5019ee898e82e64a0a674da1b4736d252
2020-08-12 17:11:47 -07:00
33d209b5f4 Fix TE microbenchmark harness to use appropriate fuser/executor (#42900)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42900

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23079715

Pulled By: bertmaher

fbshipit-source-id: 6aa2b08a550835b7737e355960a16a7ca83878ea
2020-08-12 17:11:44 -07:00
9fe3b1857d [TensorExpr] Fix imports in tensorexpr benchmarks. (#35830)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35830

Test Plan: Imported from OSS

Differential Revision: D20799464

Pulled By: ZolotukhinM

fbshipit-source-id: 1b5981ad15042f601a9b6eb01a799cdf71200666
2020-04-01 14:23:33 -07:00
a3e10d2a17 Expose enablement of TensorExpr fuser as env variable (#35341)
Summary:
This commit allows one to use an environment variable to enable the fuser in torch/csrc/jit/tensorexpr/

```
PYTORCH_TENSOREXPR=1 python benchmark.py
```

This commit also changes the registration to happen by default, removing the need for the Python-exposed `_jit_register_tensorexpr_fuser`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35341

Reviewed By: ZolotukhinM

Differential Revision: D20676348

Pulled By: bwasti

fbshipit-source-id: 4c997cdc310e7567c03905ebff72b3e8a4c2f464
2020-03-26 14:31:57 -07:00
8998a1b3d3 Add tensorexpr benchmarks. (#35064)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35064

Test Plan: Imported from OSS

Differential Revision: D20543695

Pulled By: ZolotukhinM

fbshipit-source-id: 1cf294ab19465cb93557c2b195252c739b40a0f7
2020-03-20 12:01:31 -07:00
976d6aaa51 Revert D20251830: [TensorExpr] Add tensorexpr benchmarks.
Test Plan: revert-hammer

Differential Revision:
D20251830

Original commit changeset: bafd66ce32f6

fbshipit-source-id: d8aea4b26441d8aba90c11d7350d3424df494052
2020-03-16 13:20:16 -07:00
e93e7b2795 [TensorExpr] Add tensorexpr benchmarks. (#34230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34230

This PR adds some benchmarks that we used to assess tensor expressions performance.

Differential Revision: D20251830

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: bafd66ce32f63077e3733112d854f5c750d5b1af
2020-03-16 11:49:39 -07:00