pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-24 15:44:58 +08:00

Author	SHA1	Message	Date
Mingzhe Li	f31d6c70fe	reduce op bench binary size (#29496 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29496 This diff reduces the binary size of op benchmark by avoiding creating all tests at once. Test Plan: ``` buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : long # Benchmarking PyTorch: add # Mode: Eager # Name: add_M8_N2_K1_cpu # Input: M: 8, N: 2, K: 1, device: cpu Forward Execution Time (us) : 160.781 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M8_N2_K8_cpu # Input: M: 8, N: 2, K: 8, device: cpu Forward Execution Time (us) : 158.941 Reviewed By: hl475 Differential Revision: D18412342 fbshipit-source-id: 5db647019ae8c2e4d6ab361b54b63cf88236b1ae	2019-11-08 22:15:12 -08:00
Mingzhe Li	0a68e8bab0	fix op bench runtime error when use_jit is enabled (#28837 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28837 The JIT code used in op bench is not compatible with the latest JIT code path. This diff aims to resolve that issue. Test Plan: ```buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:add_test -- --use_jit Building: finished in 02:29.8 min (100%) 7055/7055 jobs, 1 updated Total time: 02:30.3 min # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: add # Mode: JIT # Name: add_M64_N64_K64_cpu # Input: M: 64, N: 64, K: 64, device: cpu Forward Execution Time (us) : 118.052 Reviewed By: hl475 Differential Revision: D18197057 fbshipit-source-id: 92edae8a48abc4115a558a91ba46cc9c3edb2eb8	2019-10-29 12:08:28 -07:00
Mingzhe Li	cbcb70f84c	print last 50 runs when using ai_pep_format (#28128 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28128 as title Test Plan: ``` # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: add PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.169559478759766"} PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.206514358520508"} PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.4950008392334"} PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.172897338867188"} PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.27255630493164"} PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.549837112426758"} PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.63113784790039"} ... Reviewed By: hl475 Differential Revision: D17957611 fbshipit-source-id: 4e70ba2070b97fbbca0d6d4295abbead2ac356d4	2019-10-16 15:22:23 -07:00
Mingzhe Li	382917bbd1	report per iteration execution time (#27923 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27923 As title Test Plan: ``` buck run caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 3 --ai_pep_format true # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: add PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "us", "value": "0.027768373489379883"} PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "us", "value": "0.02661752700805664"} PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "us", "value": "0.026746749877929688"} ... Reviewed By: hl475 Differential Revision: D17911718 fbshipit-source-id: 6fe28f2ab9ce1e0feabb5b822f04ff32dac977a9	2019-10-14 15:44:42 -07:00
Mingzhe Li	c1ed0150c5	canonical example of torch.add benchmark (#23402 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23402 This diff tries to make torch.add as a canonical example for op benchmark. Once it lands, we will also modify all other op benchmarks to be uniform with this example. With that, when people are adding new ops, they can copy paste any existing code. Test Plan: buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 3 ``` # ---------------------------------------- # PyTorch/Caffe2 Operator Micro-benchmarks # ---------------------------------------- # Tag : short # Benchmarking PyTorch: add # Mode: Eager # Name: add_M8_N16_K32_devicecpu # Input: M: 8, N: 16, K: 32, device: cpu Forward Execution Time (us) : 146.586 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M8_N16_K32_devicecuda # Input: M: 8, N: 16, K: 32, device: cuda Forward Execution Time (us) : 92.151 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M16_N16_K64_devicecpu # Input: M: 16, N: 16, K: 64, device: cpu Forward Execution Time (us) : 428.421 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M16_N16_K64_devicecuda # Input: M: 16, N: 16, K: 64, device: cuda Forward Execution Time (us) : 89.811 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M64_N64_K128_devicecpu # Input: M: 64, N: 64, K: 128, device: cpu Forward Execution Time (us) : 11857.012 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M64_N64_K128_devicecuda # Input: M: 64, N: 64, K: 128, device: cuda Forward Execution Time (us) : 93.918 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M8_N16_K32_devicecpu_bwdall # Input: M: 8, N: 16, K: 32, device: cpu Backward Execution Time (us) : 990.125 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M8_N16_K32_devicecpu_bwd1 # Input: M: 8, N: 16, K: 32, device: cpu Backward Execution Time (us) : 781.217 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M8_N16_K32_devicecpu_bwd2 # Input: M: 8, N: 16, K: 32, device: cpu Backward Execution Time (us) : 777.307 ``` Reviewed By: zheng-xq Differential Revision: D16501974 fbshipit-source-id: f1eec010eabf11ce4fcf6cfe6f85cd5241a7022d	2019-10-09 11:24:10 -07:00
Mingzhe Li	3c986dff77	introduce auto_set to simplify benchmarking the backward path of operators (#23276 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23276 This diff introduces a new feature to simplify benchmarking the backward path of ops. Here is an example: ``` ... self.input_one = torch.rand(M, N, K, requires_grad=self.auto_set()) self.input_two = torch.rand(M, N, K, requires_grad=self.auto_set()) ... ``` In this way, the benchmark will generate three different test cases. 1. input_one requires grad 2. input_two requires grad 3. both inputs require grad Here is a sample output: ``` # Benchmarking PyTorch: add # Mode: Eager # Name: add_M1_N8_K8_bwdall # Input: M: 1, N: 8, K: 8 Backward Execution Time (us) : 863.744 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M1_N8_K8_bwd1 # Input: M: 1, N: 8, K: 8 Backward Execution Time (us) : 727.915 # Benchmarking PyTorch: add # Mode: Eager # Name: add_M1_N8_K8_bwd2 # Input: M: 1, N: 8, K: 8 Backward Execution Time (us) : 687.626 ``` Reviewed By: zheng-xq Differential Revision: D16450355 fbshipit-source-id: 50ae0916e81c3ff9f0c482ed6d386319eb15b305	2019-07-29 15:58:41 -07:00
Mingzhe Li	b93f29ded3	add JIT path to the benchmark (#22309 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22309 This diff enables PT operators to run in JIT mode. Users can switch between eager and JIT mode using the `use_jit` flag. In this diff, we put operators in a loop and pass it to JIT. One extra step, which wraps the operator with the `_consume` op, is introduced to avoid dead code elimination in JIT. As a result, the reported time includes the real operator execution time plus the time of the `_consume` op (which directly returns its input; nothing else happens inside). Reviewed By: zheng-xq Differential Revision: D16033082 fbshipit-source-id: e03be89fd5a505e44e81015dfc63db9cd76fb8a1	2019-07-03 17:18:03 -07:00
Mingzhe Li	341a7e4bb5	Fix issue in backward path (#21663 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21663 as title Reviewed By: hl475 Differential Revision: D15770793 fbshipit-source-id: b3d0dd030237c4d62bddc388984a273153fac4a6	2019-06-11 21:09:25 -07:00
Mingzhe Li	4e3c97a0be	add separate path for op with JIT (#21210 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21210 This diff introduces a new path to run op with JIT. There are two steps involved here: 1. Users need to script the op. This should happen in the `init` method. 2. The generated graph from step1 is passed to `jit_forward` which will be executed by the benchmark backend Reviewed By: zheng-xq Differential Revision: D15460831 fbshipit-source-id: 48441d9cd4be5d0acebab901f45544616e6ed2ee	2019-06-10 19:53:58 -07:00
Mingzhe Li	3004b397f0	change test_name to be globally unique value across tests (#21206 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21206 This diff changes the default test_name to be a globally unique value across tests. With that, users can list all the tests and choose to run a specific test. Reviewed By: zheng-xq Differential Revision: D15543508 fbshipit-source-id: 0814ef6a60d41637fed5245e30c282497cf21bb8	2019-06-03 14:55:11 -07:00
Mingzhe Li	ca80ec7c97	introduce a new interface to add op [PT changes] (#21149 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21149 The diff modifies the interface for PyTorch operators in the benchmark suite. Reviewed By: zheng-xq Differential Revision: D15433897 fbshipit-source-id: e858183431eb37d90313356716c2de8709372b58	2019-06-03 14:55:08 -07:00
Ilia Cherniavskii	19e6886576	Intra-op parallel microbenchmarks for PT (#19997 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19997 ghimport-source-id: 420d4a68a1ef879beee2734adba8abb575e0b0ab Differential Revision: D15231375 Pulled By: ilia-cher fbshipit-source-id: ce7248ea2ebb54d25c9d831c6e3f23f3534557dd	2019-05-06 20:21:45 -07:00
Ilia Cherniavskii	0c7e98b765	Support for non-contiguous tensors and arbitrary dtypes in PT benchmarks (#19993 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19993 ghimport-source-id: 4cf51b61bb83b72883148ab0faa0c75c3cef7635 Differential Revision: D15230363 Pulled By: ilia-cher fbshipit-source-id: a3ab591d6fd24e874958401e63eaec56bda19a5c	2019-05-06 19:12:09 -07:00
Mingzhe Li	26f12af537	Fix op benchmarks error in OSS environment (#19518 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19518 The previous design required running the op benchmarks from the PyTorch root directory, which could lead to a `module not found` error in the OSS environment. This diff fixes that issue by having the benchmark launched in the `benchmarks` folder. Reviewed By: ilia-cher Differential Revision: D15020787 fbshipit-source-id: eb09814a33432a66cc857702bc86538cd17bea3b	2019-04-19 16:25:16 -07:00
Mingzhe Li	08f5c05d60	make separate operators as independent binaries (#19450 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19450 We want to make each operator benchmark a separate binary. The previous way to run the benchmark was to collect all operators into a single binary, which is unnecessary when we want to filter a specific operator. This diff aims to resolve that issue. Reviewed By: ilia-cher Differential Revision: D14808159 fbshipit-source-id: 43cd25b219c6e358d0cd2a61463b34596bf3bfac	2019-04-18 20:00:47 -07:00
Mingzhe Li	5f5a2aaab9	Operator-level performance microbenchmarks (#18740 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18740 Test utilities for writing Caffe2/PyTorch performance microbenchmarks. Brief description of the file structure * benchmark_core.py : core utilities for running microbenchmark tests * benchmark_caffe2.py : Caffe2 specific benchmark utilities * benchmark_pytorch.py: PyTorch specific benchmark utilities * benchmark_runner.py : Main function. Currently it can run the microbenchmark tests in a stand-alone mode. The next step is to have this integrate with AI-PEP. The utilities are located at https://github.com/pytorch/pytorch/tree/master/test to have access to both the Caffe2 and PyTorch Python frontends. Include two operator microbenchmarks; support both Caffe2/PyTorch: * MatMul * Add Reference: PyTorch benchmarks : https://github.com/pytorch/benchmark/tree/master/timing/python. In this work, we start with two example binary operators MatMul and Add, but eventually we should cover unary operators as in the PyTorch benchmark repo. Reviewed By: zheng-xq Differential Revision: D13887111 fbshipit-source-id: b7a56b95448c9ec3e674b0de0ffb96af4439bfce	2019-04-02 17:06:19 -07:00
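
Commit 3c986dff77 above says that marking two inputs with `auto_set()` yields three backward test cases, reported with the suffixes `bwd1`, `bwd2`, and `bwdall`. The following is a minimal sketch of that enumeration only, not the benchmark suite's actual implementation. The helper name `backward_cases` is hypothetical, and the generalization to more than two inputs is an assumption.

```python
def backward_cases(num_inputs):
    """Enumerate (name suffix, requires_grad flags) for each backward test case.

    Hypothetical helper: one case per input that alone requires grad,
    followed by one case in which every input requires grad.
    """
    cases = []
    for i in range(num_inputs):
        # Only input i requires grad in this case (suffixes are 1-based).
        flags = tuple(j == i for j in range(num_inputs))
        cases.append((f"bwd{i + 1}", flags))
    # Final case: all inputs require grad.
    cases.append(("bwdall", (True,) * num_inputs))
    return cases


for suffix, flags in backward_cases(2):
    print(suffix, flags)
```

For two inputs this gives the same three cases the commit message lists, although the sample output there prints `bwdall` first.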

16 Commits
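
The test plans for commits 382917bbd1 and cbcb70f84c show per-iteration results printed as `PyTorchObserver` followed by a JSON object. A small sketch of how such output could be collected for analysis is given below. It assumes each record sits on its own line, as the benchmark would print it, and the function name `parse_observer_lines` is hypothetical rather than part of the benchmark suite.

```python
import json

PREFIX = "PyTorchObserver "

# Sample lines copied from the test plan of commit 382917bbd1.
SAMPLE = """\
# Benchmarking PyTorch: add
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "us", "value": "0.027768373489379883"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "us", "value": "0.02661752700805664"}
"""


def parse_observer_lines(text):
    """Return one dict per PyTorchObserver line, with "value" converted to float."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(PREFIX):
            continue  # skip banner and comment lines
        record = json.loads(line[len(PREFIX):])
        record["value"] = float(record["value"])  # values are emitted as strings
        records.append(record)
    return records


records = parse_observer_lines(SAMPLE)
mean_latency = sum(r["value"] for r in records) / len(records)
print(len(records), records[0]["unit"], mean_latency)
```

The unit is read from each record rather than assumed, since the two commits' sample outputs report `us` and `ms` respectively.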