Commit Graph

8 Commits

c0ed38e644 [BE][Easy][3/19] enforce style for empty lines in import segments in benchmarks/ (#129754)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129754
Approved by: https://github.com/ezyang
2024-07-17 14:34:42 +00:00
26f4f10ac8 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.
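For illustration, the kind of `usort` section this fixes looks roughly like the following (the category value shown is hypothetical, not copied from the actual config):

```toml
# previously spelled [tool.usort.kown], so usort silently ignored the section
[tool.usort.known]
first_party = ["torch"]
```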

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
2024-05-27 14:49:57 +00:00
55c0ab2887 Revert "[5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)"
This reverts commit 7763c83af67eebfdd5185dbe6ce15ece2b992a0f.

Reverted https://github.com/pytorch/pytorch/pull/127126 on behalf of https://github.com/XuehaiPan due to Broken CI ([comment](https://github.com/pytorch/pytorch/pull/127126#issuecomment-2133044286))
2024-05-27 09:22:08 +00:00
7763c83af6 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
ghstack dependencies: #127122, #127123, #127124, #127125
2024-05-27 04:22:18 +00:00
dd3a77bc96 Apply UFMT to all files in benchmarks/ (#105928)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105928
Approved by: https://github.com/albanD
2023-07-26 01:18:48 +00:00
8ff0b6fef8 [OpBenchMobile] Enable operator_benchmark to run the benchmark on mobile through AiBench (#47767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47767

This diff implements the functionality of running benchmarks on mobile on top of the operator_benchmark framework. It does so in a few steps:

1. create a scripted module from an existing benchmark case.
2. run the mobile-specific optimization pass on the scripted module.
3. run the scripted module on AiBench by calling its Python API.

A small change in how a benchmark case is written is introduced so that local and mobile runs can share the same interface: inputs become arguments of the `forward` function, so that the mobile optimization pass can run successfully (otherwise everything would be optimized away by constant propagation).
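As a rough sketch of steps 1–2 and the interface change above (the `AddBenchmark` and `OpBenchmarkMobile` classes here are simplified stand-ins for the real ones, and the AiBench submission step is internal and not shown):

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile


class AddBenchmark(torch.nn.Module):
    # inputs come from get_inputs() and are passed as forward() arguments,
    # so the add op is not constant-folded away by the optimization pass
    def get_inputs(self):
        return (torch.rand(1, 1, 1), torch.rand(1, 1, 1))

    def forward(self, input_one, input_two):
        return input_one + input_two


class OpBenchmarkMobile(torch.nn.Module):
    # minimal wrapper that runs the benchmark case num_iters times
    def __init__(self, benchmark, num_iters=1):
        super().__init__()
        self.benchmark = benchmark
        self.num_iters = num_iters

    def forward(self):
        for _ in range(self.num_iters):
            input_one, input_two = self.benchmark.get_inputs()
            self.benchmark(input_one, input_two)


# 1. create a scripted module from the benchmark case
scripted = torch.jit.script(OpBenchmarkMobile(AddBenchmark()).eval())
# 2. run the mobile-specific optimization pass on the scripted module
optimized = optimize_for_mobile(scripted)
optimized()  # the optimized module would then be submitted to AiBench
```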

Test Plan:
## local op_bench run

buck run caffe2/benchmarks/operator_benchmark:benchmark_all_test --  --iterations 1 --warmup_iterations 1

buck run caffe2/benchmarks/operator_benchmark:benchmark_all_test --  --iterations 1 --warmup_iterations 1 --use_jit

Exceptions: the `py_module` op in `FakeQuantizePerTensorBaseOpBenchmark` and `FakeQuantizePerChannelBaseOpBenchmark` fails under JIT mode. These tests also failed in the base version:

```
RuntimeError:
Module 'FakeQuantizePerChannelOpBenchmark' has no attribute 'op_func' (This function exists as an attribute on the Python module, but we failed to compile it to a TorchScript function.
The error stack is reproduced here:

Python builtin <built-in method apply of FunctionMeta object at 0x619000c652a0> is currently not supported in Torchscript:
  File "/data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/pt/quantization_test#link-tree/quantization_test.py", line 260
    quant_min: int, quant_max: int
):
    return _LearnableFakeQuantizePerChannelOp.apply(input, scale, zero_point, axis, quant_min, quant_max, 1.0)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
:
  File "/data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/pt/quantization_test#link-tree/quantization_test.py", line 313
        axis: int, quant_min: int, quant_max: int
    ):
        return self.op_func(input, scale, zero_point, axis, quant_min, quant_max)
               ~~~~~~~~~~~~ <--- HERE
```

`_consume_op` typing mismatches: chunk, split, qobserver, and sort in qunary. These will be fixed in D24774105.

## OSS test

python3 -m benchmark_all_test --iterations 1 --warmup_iterations 1 --use_jit
python3 -m benchmark_all_test --iterations 1 --warmup_iterations 1

## saved module graph
```
module __torch__.mobile_benchmark_utils.OpBenchmarkMobile {
  parameters {
  }
  attributes {
    training = True
    num_iters = 1
    benchmark = <__torch__.pt.add_test.___torch_mangle_4.AddBenchmark object at 0x6070001b8b50>
  }
  methods {
    method forward {
      graph(%self : __torch__.mobile_benchmark_utils.OpBenchmarkMobile):
        %12 : None = prim::Constant() # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:9:4
        %4 : bool = prim::Constant[value=1]() # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:10:8
        %1 : int = prim::GetAttr[name="num_iters"](%self)
         = prim::Loop(%1, %4) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:10:8
          block0(%i : int):
            %6 : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark = prim::GetAttr[name="benchmark"](%self)
            %7 : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark = prim::GetAttr[name="benchmark"](%self)
            %self.inputs_tuple : (Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu), Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu)) = prim::Constant[value=({0.48884}, {0.809042})]()
            %9 : Tensor, %10 : Tensor = prim::TupleUnpack(%self.inputs_tuple)
            %23 : int = prim::Constant[value=1]()
            %24 : Tensor = aten::add(%9, %10, %23) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/pt/add_test.py:39:15
            -> (%4)
        return (%12)

    }
  }
  submodules {
    module __torch__.pt.add_test.___torch_mangle_4.AddBenchmark {
      parameters {
      }
      attributes {
        mobile_optimized = True
      }
      methods {
        method forward {
          graph(%self : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark,
                %input_one.1 : Tensor,
                %input_two.1 : Tensor):
            %3 : int = prim::Constant[value=1]()
            %4 : Tensor = aten::add(%input_one.1, %input_two.1, %3) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/pt/add_test.py:39:15
            return (%4)

        }
        method get_inputs {
          graph(%self : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark):
            %self.inputs_tuple : (Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu), Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu)) = prim::Constant[value=({0.48884}, {0.809042})]()
            return (%self.inputs_tuple)

        }
      }
      submodules {
      }
    }
  }
}

```

Reviewed By: kimishpatel

Differential Revision: D24322214

fbshipit-source-id: 335317eca4f40c4083883eb41dc47caf25cbdfd1
2020-11-12 17:15:05 -08:00
c9222b7471 Implement clip_ranges operator for PyTorch
Test Plan:
unit test for correctness
```
buck test caffe2/torch/fb/sparsenn:test -- test_clip_ranges
Parsing buck files: finished in 1.6 sec
Creating action graph: finished in 18.9 sec
Building: finished in 15.0 sec (100%) 9442/9442 jobs, 1 updated
  Total time: 35.6 sec
More details at https://www.internalfb.com/intern/buck/build/66fb17de-859e-4d01-89bf-5c5de2950693
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 80f5e0c2-7db2-48a4-b148-25dd34651682
Trace available for this run at /tmp/tpx-20201026-123217.050766/trace.log
Started reporting to test run: https://our.intern.facebook.com/intern/testinfra/testrun/4503599665041422
    ✓ ListingSuccess: caffe2/torch/fb/sparsenn:test - main (14.912)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_ranges (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (14.098)
Summary
  Pass: 1
  ListingSuccess: 1
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/4503599665041422
```
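For context, a rough pure-Python sketch of what ClipRanges presumably does (an assumption based on the Caffe2 operator of the same name, not taken from this diff's source): each `(offset, length)` range is truncated so that its length does not exceed `MAX_LENGTH`.

```python
def clip_ranges(ranges, max_length):
    # ranges: list of (offset, length) pairs; the real operator works on an
    # integer tensor of shape (M, N, 2) rather than Python lists
    return [(offset, min(length, max_length)) for offset, length in ranges]


# a range of length 8 is clipped to length 3; shorter ranges are untouched
print(clip_ranges([(0, 8), (8, 2)], 3))  # → [(0, 3), (8, 2)]
```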

new benchmark perf test
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH6_M1_N2_MAX_LENGTH1_dtypetorch.int32_cpu
# Input: LENGTH: 6, M: 1, N: 2, MAX_LENGTH: 1, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 155.765

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH7_M1_N2_MAX_LENGTH2_dtypetorch.int32_cpu
# Input: LENGTH: 7, M: 1, N: 2, MAX_LENGTH: 2, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 156.248

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH8_M1_N2_MAX_LENGTH3_dtypetorch.int32_cpu
# Input: LENGTH: 8, M: 1, N: 2, MAX_LENGTH: 3, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 156.634

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH9_M1_N2_MAX_LENGTH4_dtypetorch.int32_cpu
# Input: LENGTH: 9, M: 1, N: 2, MAX_LENGTH: 4, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 155.408

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH10_M1_N2_MAX_LENGTH5_dtypetorch.int32_cpu
# Input: LENGTH: 10, M: 1, N: 2, MAX_LENGTH: 5, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 165.168
```

Compared with the old implementation, there is **around a 300us gain** per call:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH6_M1_N2_MAX_LENGTH1_dtypetorch.int32_cpu
# Input: LENGTH: 6, M: 1, N: 2, MAX_LENGTH: 1, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 443.012

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH7_M1_N2_MAX_LENGTH2_dtypetorch.int32_cpu
# Input: LENGTH: 7, M: 1, N: 2, MAX_LENGTH: 2, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 446.480

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH8_M1_N2_MAX_LENGTH3_dtypetorch.int32_cpu
# Input: LENGTH: 8, M: 1, N: 2, MAX_LENGTH: 3, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 444.064

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH9_M1_N2_MAX_LENGTH4_dtypetorch.int32_cpu
# Input: LENGTH: 9, M: 1, N: 2, MAX_LENGTH: 4, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 445.511

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH10_M1_N2_MAX_LENGTH5_dtypetorch.int32_cpu
# Input: LENGTH: 10, M: 1, N: 2, MAX_LENGTH: 5, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 450.468
```

Reviewed By: MarcioPorto

Differential Revision: D24546110

fbshipit-source-id: e6c9b38e911f177f97961ede5bf375107f240363
2020-10-28 09:46:37 -07:00
c6858fd71a Set up benchmarks for ClipRanges operator for Caffe2 and PyTorch
Summary: As titled, adding benchmark tests for the ClipRanges operator.

Test Plan:
benchmark test for Caffe2
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking Caffe2: clip_ranges
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1026 12:30:33.938997 2658759 init.h:137] Caffe2 GlobalInit should be run before any other API calls.
# Name: clip_ranges_LENGTH6_M1_N2_MAX_LENGTH1_dtypeint32
# Input: LENGTH: 6, M: 1, N: 2, MAX_LENGTH: 1, dtype: int32
Forward Execution Time (us) : 5.805

# Benchmarking Caffe2: clip_ranges
# Name: clip_ranges_LENGTH7_M1_N2_MAX_LENGTH2_dtypeint32
# Input: LENGTH: 7, M: 1, N: 2, MAX_LENGTH: 2, dtype: int32
Forward Execution Time (us) : 5.913

# Benchmarking Caffe2: clip_ranges
# Name: clip_ranges_LENGTH8_M1_N2_MAX_LENGTH3_dtypeint32
# Input: LENGTH: 8, M: 1, N: 2, MAX_LENGTH: 3, dtype: int32
Forward Execution Time (us) : 5.941

# Benchmarking Caffe2: clip_ranges
# Name: clip_ranges_LENGTH9_M1_N2_MAX_LENGTH4_dtypeint32
# Input: LENGTH: 9, M: 1, N: 2, MAX_LENGTH: 4, dtype: int32
Forward Execution Time (us) : 5.868

# Benchmarking Caffe2: clip_ranges
# Name: clip_ranges_LENGTH10_M1_N2_MAX_LENGTH5_dtypeint32
# Input: LENGTH: 10, M: 1, N: 2, MAX_LENGTH: 5, dtype: int32
Forward Execution Time (us) : 6.408
```

benchmark test for PyTorch
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH6_M1_N2_MAX_LENGTH1_dtypetorch.int32_cpu
# Input: LENGTH: 6, M: 1, N: 2, MAX_LENGTH: 1, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 443.012

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH7_M1_N2_MAX_LENGTH2_dtypetorch.int32_cpu
# Input: LENGTH: 7, M: 1, N: 2, MAX_LENGTH: 2, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 446.480

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH8_M1_N2_MAX_LENGTH3_dtypetorch.int32_cpu
# Input: LENGTH: 8, M: 1, N: 2, MAX_LENGTH: 3, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 444.064

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH9_M1_N2_MAX_LENGTH4_dtypetorch.int32_cpu
# Input: LENGTH: 9, M: 1, N: 2, MAX_LENGTH: 4, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 445.511

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH10_M1_N2_MAX_LENGTH5_dtypetorch.int32_cpu
# Input: LENGTH: 10, M: 1, N: 2, MAX_LENGTH: 5, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 450.468
```

Reviewed By: MarcioPorto

Differential Revision: D24500468

fbshipit-source-id: a582090a3982005af272cb10cdd257b2b2e787c4
2020-10-28 09:42:10 -07:00