This PR introduces a new "operator microbenchmark" CI workflow and GitHub Actions, updates test scripts and job matrices to support new parameters, and broadens the operator benchmark tests to cover more data types, larger shapes, and gradient tests. The benchmark configurations now focus on different CUDA hardware and multiple dtypes (bf16, fp16, fp32), in both compile and eager mode.
**Benchmark Configuration and Coverage:**
* Expanded operator benchmark configurations in `addmm_test.py`, `bmm_test.py`, `matmul_test.py`, and `mm_test.py` to benchmark multiple dtypes on CUDA devices, in eager and compile mode, for both the forward and backward pass. The configs tagged "long" in the above files are run in CI (see the config sketch after this list).
* CI benchmarking runs on multiple hardware platforms: H100 and A100.
* The CI job also uploads the microbenchmarking outputs to a [HUD](https://hud.pytorch.org/benchmark/llms?repoName=pytorch%2Fpytorch&benchmarkName=PyTorch+operator+microbenchmark) dashboard.
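For illustration, an expanded "long" config in `mm_test.py` might look like the following sketch, assuming the standard `op_bench.cross_product_configs` helper; the exact shapes and dtype lists used in the PR may differ:
```python
import operator_benchmark as op_bench
import torch

# Hypothetical "long" config: CUDA-only, multiple dtypes, large shapes.
# The actual values used in the PR may differ.
mm_long_configs = op_bench.cross_product_configs(
    M=[256, 1024, 4096],
    N=[256, 1024, 4096],
    K=[256, 1024, 4096],
    device=["cuda"],
    dtype=[torch.float32, torch.float16, torch.bfloat16],
    tags=["long"],
)
```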
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162530
Approved by: https://github.com/huydhn
Co-authored-by: Huy Do <huydhn@gmail.com>
This pull request enhances the PyTorch operator benchmarking suite by introducing support for benchmarking with `torch.compile` mode, in addition to the existing Eager and JIT modes. It also adds peak memory measurement for the forward/backward pass, improves the JSON output format so it can be consumed by the dashboard for reporting, and introduces a few more CLI options. The new CLI flags are:
- Added `--use-compile` CLI argument and corresponding logic to run benchmarks using `torch.compile`, including mutual exclusivity with `--use-jit`
- Added `--benchmark-name` argument for customizing the benchmark name in output
- Updated default value for `--output-json-for-dashboard` to `benchmark-results.json` for a more predictable output file name
Sample command to run a single operator:
`python -m pt.mm_test --use-compile`
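The new flags can also be combined in a single invocation; for example (the benchmark name below is just an illustration):
```
python -m pt.mm_test --use-compile --benchmark-name mm_compile --output-json-for-dashboard benchmark-results.json
```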
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161394
Approved by: https://github.com/jbschlosser
Summary:
Remove the operator_benchmark caffe2 build following the removal of caffe2: 2fd75667b4
We are also deleting the TARGETS files from broken benchmarks that we do not intend to maintain.
Test Plan: Sandcastle CI
Differential Revision: D60086216
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131460
Approved by: https://github.com/vmpuri
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
ghstack dependencies: #127122, #127123, #127124, #127125
Summary:
Currently the cpp_extension build in benchmarks is misleading because it has the same name as torch.utils.cpp_extension
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58708
Test Plan:
Run from `./benchmarks/operator_benchmark/pt_extension` folder:
```
python setup.py install
python cpp_extension_test.py
```
Note: CI doesn't matter here because the benchmarks/ folder is currently not compiled or tested in CI
Reviewed By: robieta
Differential Revision: D28585582
Pulled By: walterddr
fbshipit-source-id: fc071040cf3cb52ee6c9252b2c5a0c3043393f57
Summary:
As this diff shows, currently there are a couple hundred instances of raw `noqa` in the codebase, which just ignore all errors on a given line. That isn't great, so this PR changes all existing instances of that antipattern to qualify the `noqa` with respect to a specific error code, and adds a lint to prevent more of this from happening in the future.
Interestingly, some of the examples the `noqa` lint catches are genuine attempts to qualify the `noqa` with a specific error code, such as these two:
```
test/jit/test_misc.py:27: print(f"{hello + ' ' + test}, I'm a {test}") # noqa E999
test/jit/test_misc.py:28: print(f"format blank") # noqa F541
```
However, those are still wrong because they are [missing a colon](https://flake8.pycqa.org/en/3.9.1/user/violations.html#in-line-ignoring-errors), which actually causes the error code to be completely ignored:
- If you change the error codes to anything else, the warnings are still suppressed.
- If you add the necessary colons, it is revealed that `E261` was also being suppressed, unintentionally:
```
test/jit/test_misc.py:27:57: E261 at least two spaces before inline comment
test/jit/test_misc.py:28:35: E261 at least two spaces before inline comment
```
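For reference, a correctly qualified suppression puts a colon after `noqa`, e.g.:
```
print(f"format blank")  # noqa: F541
```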
I did try using [flake8-noqa](https://pypi.org/project/flake8-noqa/) instead of a custom `git grep` lint, but it didn't seem to work. This PR is definitely missing some of the functionality that flake8-noqa is supposed to provide, though, so if someone can figure out how to use it, we should do that instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56272
Test Plan:
CI should pass on the tip of this PR, and we know that the lint works because the following CI run (before this PR was finished) failed:
- https://github.com/pytorch/pytorch/runs/2365189927
Reviewed By: janeyx99
Differential Revision: D27830127
Pulled By: samestep
fbshipit-source-id: d6dcf4f945ebd18cd76c46a07f3b408296864fcb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47767
This diff implements the functionality of running benchmark on mobile on top of operator_benchmark framework. It does so through a few steps:
1. create a scripted module from existing benchmark case.
2. run mobile specific optimization pass on the scripted module
3. run the scripted module on AiBench by calling its Python API
A small change to the way a benchmark case is written is introduced so that local and mobile runs can share the same interface. The change is to pass the inputs as arguments to the `forward` function, so that the mobile optimization pass can run successfully (otherwise everything would be optimized away by constant propagation); see the sketch below.
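A minimal sketch of the revised benchmark-case style, loosely following the `pt/add_test.py` pattern (shapes and config values here are illustrative, not the ones used in the suite):
```python
import operator_benchmark as op_bench
import torch

class AddBenchmark(op_bench.TorchBenchmarkBase):
    def init(self, M, N, K, device):
        # Inputs are created in init() and handed to forward() as arguments,
        # so scripting plus the mobile optimization pass cannot constant-fold
        # the whole benchmark body away.
        self.inputs = {
            "input_one": torch.rand(M, N, K, device=device),
            "input_two": torch.rand(M, N, K, device=device),
        }
        self.set_module_name("add")

    def forward(self, input_one, input_two):
        return torch.add(input_one, input_two)
```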
Test Plan:
## local op_bench run
buck run caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1 --warmup_iterations 1
buck run caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1 --warmup_iterations 1 --use_jit
Exceptions: `py_module` op in `FakeQuantizePerTensorBaseOpBenchmark` and `FakeQuantizePerChannelBaseOpBenchmark` under JIT mode. These tests also failed in the base version
```
RuntimeError:
Module 'FakeQuantizePerChannelOpBenchmark' has no attribute 'op_func' (This function exists as an attribute on the Python module, but we failed to compile it to a TorchScript function.
The error stack is reproduced here:
Python builtin <built-in method apply of FunctionMeta object at 0x619000c652a0> is currently not supported in Torchscript:
File "/data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/pt/quantization_test#link-tree/quantization_test.py", line 260
quant_min: int, quant_max: int
):
return _LearnableFakeQuantizePerChannelOp.apply(input, scale, zero_point, axis, quant_min, quant_max, 1.0)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
:
File "/data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/pt/quantization_test#link-tree/quantization_test.py", line 313
axis: int, quant_min: int, quant_max: int
):
return self.op_func(input, scale, zero_point, axis, quant_min, quant_max)
~~~~~~~~~~~~ <--- HERE
```
`_consume_op` typing mismatch: chunk, split, qobserver, sort in qunary. These will be fixed in D24774105
## OSS test
python3 -m benchmark_all_test --iterations 1 --warmup_iterations 1 --use_jit
python3 -m benchmark_all_test --iterations 1 --warmup_iterations 1
## saved module graph
```
module __torch__.mobile_benchmark_utils.OpBenchmarkMobile {
parameters {
}
attributes {
training = True
num_iters = 1
benchmark = <__torch__.pt.add_test.___torch_mangle_4.AddBenchmark object at 0x6070001b8b50>
}
methods {
method forward {
graph(%self : __torch__.mobile_benchmark_utils.OpBenchmarkMobile):
%12 : None = prim::Constant() # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:9:4
%4 : bool = prim::Constant[value=1]() # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:10:8
%1 : int = prim::GetAttr[name="num_iters"](%self)
= prim::Loop(%1, %4) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:10:8
block0(%i : int):
%6 : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark = prim::GetAttr[name="benchmark"](%self)
%7 : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark = prim::GetAttr[name="benchmark"](%self)
%self.inputs_tuple : (Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu), Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu)) = prim::Constant[value=({0.48884}, {0.809042})]()
%9 : Tensor, %10 : Tensor = prim::TupleUnpack(%self.inputs_tuple)
%23 : int = prim::Constant[value=1]()
%24 : Tensor = aten::add(%9, %10, %23) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/pt/add_test.py:39:15
-> (%4)
return (%12)
}
}
submodules {
module __torch__.pt.add_test.___torch_mangle_4.AddBenchmark {
parameters {
}
attributes {
mobile_optimized = True
}
methods {
method forward {
graph(%self : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark,
%input_one.1 : Tensor,
%input_two.1 : Tensor):
%3 : int = prim::Constant[value=1]()
%4 : Tensor = aten::add(%input_one.1, %input_two.1, %3) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/pt/add_test.py:39:15
return (%4)
}
method get_inputs {
graph(%self : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark):
%self.inputs_tuple : (Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu), Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu)) = prim::Constant[value=({0.48884}, {0.809042})]()
return (%self.inputs_tuple)
}
}
submodules {
}
}
}
}
```
Reviewed By: kimishpatel
Differential Revision: D24322214
fbshipit-source-id: 335317eca4f40c4083883eb41dc47caf25cbdfd1
Summary:
Otherwise, I don't understand how those could have been invoked
Also, what is the benefit of importing the same module twice?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38832
Differential Revision: D21675081
Pulled By: malfet
fbshipit-source-id: fee5604c4c433161b6b1a999d505b5acbbc3b421
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28837
The JIT code used in op bench is not compatible with the latest JIT code path. This diff aims to resolve that issue.
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:add_test -- --use_jit
Building: finished in 02:29.8 min (100%) 7055/7055 jobs, 1 updated
Total time: 02:30.3 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
# Mode: JIT
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 118.052
```
Reviewed By: hl475
Differential Revision: D18197057
fbshipit-source-id: 92edae8a48abc4115a558a91ba46cc9c3edb2eb8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22309
This diff enables PT operators to run in JIT mode. Users can control eager vs. JIT mode using the `use_jit` flag.
In this diff, we put the operator in a loop and pass the loop to JIT. One extra step that wraps the operator with the `_consume` op is introduced to avoid JIT's dead code elimination optimization. With that, the reported time includes the real operator execution time plus the `_consume` op (which directly returns its input; nothing else happens inside). A sketch of the pattern follows.
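A minimal sketch of the idea, assuming an identity `_consume` stand-in (in the real suite `_consume` is registered as a custom TorchScript op, so the exact mechanism differs):
```python
import torch

def _consume(x: torch.Tensor) -> torch.Tensor:
    # Illustrative stand-in: returning the input keeps the benchmarked call
    # from being treated as dead code inside the scripted loop.
    return x

@torch.jit.script
def benchmark_loop(input_one: torch.Tensor, input_two: torch.Tensor,
                   iters: int) -> torch.Tensor:
    out = input_one
    for _ in range(iters):
        # Operator under test; its result is fed to _consume so the timed
        # region covers the op plus the (near-zero) _consume overhead.
        out = _consume(torch.add(input_one, input_two))
    return out
```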
Reviewed By: zheng-xq
Differential Revision: D16033082
fbshipit-source-id: e03be89fd5a505e44e81015dfc63db9cd76fb8a1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21210
This diff introduces a new path to run ops with JIT. There are two steps involved here:
1. Users need to script the op. This should happen in the `init` method.
2. The generated graph from step 1 is passed to `jit_forward`, which is executed by the benchmark backend.
Reviewed By: zheng-xq
Differential Revision: D15460831
fbshipit-source-id: 48441d9cd4be5d0acebab901f45544616e6ed2ee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21206
This diff changes the default test_name to be a globally unique value across tests. With that, users can list all the tests and choose to run a specific one.
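For illustration, listing tests and selecting one might look like the following (flag names are assumed from the current operator_benchmark runner and may not match the flags available at the time of this diff):
```
python -m pt.add_test --list_tests
python -m pt.add_test --test_name add_M64_N64_K64_cpu
```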
Reviewed By: zheng-xq
Differential Revision: D15543508
fbshipit-source-id: 0814ef6a60d41637fed5245e30c282497cf21bb8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21149
The diff modifies the interface for PyTorch operators in the benchmark suite
Reviewed By: zheng-xq
Differential Revision: D15433897
fbshipit-source-id: e858183431eb37d90313356716c2de8709372b58
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19518
The previous design required running the op benchmarks from the PyTorch root directory, which could lead to a `module not found` error in the OSS environment. This diff fixes that issue by making the benchmarks launchable from the `benchmarks` folder, as shown below.
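For example, assuming the OSS layout referenced elsewhere in this log (the directory and iteration counts here are illustrative):
```
cd benchmarks/operator_benchmark
python -m benchmark_all_test --iterations 1 --warmup_iterations 1
```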
Reviewed By: ilia-cher
Differential Revision: D15020787
fbshipit-source-id: eb09814a33432a66cc857702bc86538cd17bea3b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19450
We want to make each operator benchmark a separate binary. Previously, all operators were collected into a single binary, which is unnecessary when we only want to run a specific operator. This diff resolves that issue.
Reviewed By: ilia-cher
Differential Revision: D14808159
fbshipit-source-id: 43cd25b219c6e358d0cd2a61463b34596bf3bfac
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18740
Test utilities for writing Caffe2/PyTorch performance microbenchmarks. Brief description of the file structure:
* benchmark_core.py : core utilities for running microbenchmark tests
* benchmark_caffe2.py : Caffe2-specific benchmark utilities
* benchmark_pytorch.py : PyTorch-specific benchmark utilities
* benchmark_runner.py : Main function. Currently it can run the microbenchmark tests in a stand-alone mode. The next step is to integrate this with AI-PEP.
The utilities are located at https://github.com/pytorch/pytorch/tree/master/test to have access to both the Caffe2 and PyTorch Python frontends.
Includes two operator microbenchmarks, supporting both Caffe2 and PyTorch:
* MatMul
* Add
Reference: PyTorch benchmarks: https://github.com/pytorch/benchmark/tree/master/timing/python. In this work, we start with two example binary operators, MatMul and Add, but eventually we should cover unary operators like those in the PyTorch benchmark repo.
Reviewed By: zheng-xq
Differential Revision: D13887111
fbshipit-source-id: b7a56b95448c9ec3e674b0de0ffb96af4439bfce