21 Commits

Author SHA1 Message Date
dd3a77bc96 Apply UFMT to all files in benchmarks/ (#105928)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105928
Approved by: https://github.com/albanD
2023-07-26 01:18:48 +00:00
8ff0b6fef8 [OpBenchMobile] Enable operator_benchmark to run the benchmark on mobile through AiBench (#47767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47767

This diff adds support for running benchmarks on mobile on top of the operator_benchmark framework. It does so in three steps:

1. Create a scripted module from an existing benchmark case.
2. Run the mobile-specific optimization pass on the scripted module.
3. Run the scripted module on AiBench by calling its Python API.

The way a benchmark case is written changes slightly so that local and mobile runs can share the same interface: inputs are now passed as arguments to the `forward` function so that the mobile optimization pass can run successfully (otherwise everything would be optimized away by constant propagation).
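A minimal sketch of the interface change described above. The `get_inputs` and `forward` method names follow the saved module graph in this commit message; the class body is illustrative only, and plain floats stand in for tensors so the sketch runs without PyTorch.

```python
class AddBenchmark:
    """Illustrative stand-in for a benchmark case; floats replace tensors."""

    def init(self, M, N):
        # Inputs are created once and stored, not baked into forward().
        self.inputs = {"input_one": float(M), "input_two": float(N)}

    def get_inputs(self):
        # The runner fetches the inputs and passes them to forward().
        return tuple(self.inputs.values())

    def forward(self, input_one, input_two):
        # The operator consumes its arguments, so an optimizer cannot
        # fold the whole computation into a constant.
        return input_one + input_two


bench = AddBenchmark()
bench.init(M=2, N=3)
result = bench.forward(*bench.get_inputs())
```

Because `forward` only sees its arguments, the same call shape works for a local run and for a wrapper that loops over the benchmark on a device.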

Test Plan:
## local op_bench run

buck run caffe2/benchmarks/operator_benchmark:benchmark_all_test --  --iterations 1 --warmup_iterations 1

buck run caffe2/benchmarks/operator_benchmark:benchmark_all_test --  --iterations 1 --warmup_iterations 1 --use_jit

Exceptions: the `py_module` op in `FakeQuantizePerTensorBaseOpBenchmark` and `FakeQuantizePerChannelBaseOpBenchmark` fails under JIT mode. These tests also failed in the base version.

```
RuntimeError:
Module 'FakeQuantizePerChannelOpBenchmark' has no attribute 'op_func' (This function exists as an attribute on the Python module, but we failed to compile it to a TorchScript function.
The error stack is reproduced here:

Python builtin <built-in method apply of FunctionMeta object at 0x619000c652a0> is currently not supported in Torchscript:
  File "/data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/pt/quantization_test#link-tree/quantization_test.py", line 260
    quant_min: int, quant_max: int
):
    return _LearnableFakeQuantizePerChannelOp.apply(input, scale, zero_point, axis, quant_min, quant_max, 1.0)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
:
  File "/data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/pt/quantization_test#link-tree/quantization_test.py", line 313
        axis: int, quant_min: int, quant_max: int
    ):
        return self.op_func(input, scale, zero_point, axis, quant_min, quant_max)
               ~~~~~~~~~~~~ <--- HERE
```

`_consume_op` typing mismatch: chunk, split, qobserver, sort in qunary. These will be fixed in D24774105

## OSS test

python3 -m benchmark_all_test --iterations 1 --warmup_iterations 1 --use_jit
python3 -m benchmark_all_test --iterations 1 --warmup_iterations 1

## saved module graph
```
module __torch__.mobile_benchmark_utils.OpBenchmarkMobile {
  parameters {
  }
  attributes {
    training = True
    num_iters = 1
    benchmark = <__torch__.pt.add_test.___torch_mangle_4.AddBenchmark object at 0x6070001b8b50>
  }
  methods {
    method forward {
      graph(%self : __torch__.mobile_benchmark_utils.OpBenchmarkMobile):
        %12 : None = prim::Constant() # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:9:4
        %4 : bool = prim::Constant[value=1]() # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:10:8
        %1 : int = prim::GetAttr[name="num_iters"](%self)
         = prim::Loop(%1, %4) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:10:8
          block0(%i : int):
            %6 : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark = prim::GetAttr[name="benchmark"](%self)
            %7 : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark = prim::GetAttr[name="benchmark"](%self)
            %self.inputs_tuple : (Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu), Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu)) = prim::Constant[value=({0.48884}, {0.809042})]()
            %9 : Tensor, %10 : Tensor = prim::TupleUnpack(%self.inputs_tuple)
            %23 : int = prim::Constant[value=1]()
            %24 : Tensor = aten::add(%9, %10, %23) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/pt/add_test.py:39:15
            -> (%4)
        return (%12)

    }
  }
  submodules {
    module __torch__.pt.add_test.___torch_mangle_4.AddBenchmark {
      parameters {
      }
      attributes {
        mobile_optimized = True
      }
      methods {
        method forward {
          graph(%self : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark,
                %input_one.1 : Tensor,
                %input_two.1 : Tensor):
            %3 : int = prim::Constant[value=1]()
            %4 : Tensor = aten::add(%input_one.1, %input_two.1, %3) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/pt/add_test.py:39:15
            return (%4)

        }
        method get_inputs {
          graph(%self : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark):
            %self.inputs_tuple : (Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu), Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu)) = prim::Constant[value=({0.48884}, {0.809042})]()
            return (%self.inputs_tuple)

        }
      }
      submodules {
      }
    }
  }
}

```

Reviewed By: kimishpatel

Differential Revision: D24322214

fbshipit-source-id: 335317eca4f40c4083883eb41dc47caf25cbdfd1
2020-11-12 17:15:05 -08:00
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
f31d6c70fe reduce op bench binary size (#29496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29496

This diff reduces the binary size of op benchmark by avoiding creating all tests at once.

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : long

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N2_K1_cpu
# Input: M: 8, N: 2, K: 1, device: cpu
Forward Execution Time (us) : 160.781

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N2_K8_cpu
# Input: M: 8, N: 2, K: 8, device: cpu
Forward Execution Time (us) : 158.941
```

Reviewed By: hl475

Differential Revision: D18412342

fbshipit-source-id: 5db647019ae8c2e4d6ab361b54b63cf88236b1ae
2019-11-08 22:15:12 -08:00
114e7382b6 skip cuda test if not on GPU machines
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29287

Test Plan:
```
buck run mode/dev-nosan //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: ConvTranspose2d
# Mode: Eager
# Name: ConvTranspose2d_kernel3_out_c256_H16_in_c256_N1_stride1_W16_cpu
# Input: kernel: 3, out_c: 256, H: 16, in_c: 256, N: 1, stride: 1, W: 16, device: cpu
Forward Execution Time (us) : 10434.151
```

Reviewed By: hl475

Differential Revision: D18344574

fbshipit-source-id: 881c857cf901c4539ee1a61171ab41df1c476db7
2019-11-06 09:37:04 -08:00
9f44a04613 separate PT and C2 to reduce build time (#28731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28731

as title

Test Plan:
```
Before:
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operator sigmoid
Invalidating internal cached state: Buck configuration options changed between invocations. This may cause slower builds.
  Changed value project.buck_out='buck-out/opt' (was 'buck-out/dev')
  ... and 69 more. See logs for all changes
Parsing buck files: finished in 7.2 sec
Creating action graph: finished in 10.0 sec
Building: finished in 06:38.4 min (100%) 29890/29890 jobs, 29890 updated
  Total time: 06:55.7 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: sigmoid

With this diff
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operator sigmoid
Parsing buck files: finished in 6.4 sec
Creating action graph: finished in 9.8 sec
Building: finished in 06:35.9 min (100%) 29892/29892 jobs, 29892 updated
  Total time: 06:52.1 min
```

Reviewed By: hl475

Differential Revision: D18152071

fbshipit-source-id: 80c29570581bbd2f0e78e2df32734c17a2b036ee
2019-10-28 11:10:47 -07:00
3c986dff77 introduce auto_set to simplify benchmarking the backward path of operators (#23276)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23276

This diff introduces a new feature to simplify benchmarking the backward path of ops. Here is an example:

```
...
self.input_one = torch.rand(M, N, K, requires_grad=self.auto_set())
self.input_two = torch.rand(M, N, K, requires_grad=self.auto_set())
...
```

In this way, the benchmark will generate three different test cases.
1. input_one requires grad
2. input_two requires grad
3. both inputs require grad

Here is a sample output:
```
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N8_K8_bwdall
# Input: M: 1, N: 8, K: 8
Backward Execution Time (us) : 863.744

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N8_K8_bwd1
# Input: M: 1, N: 8, K: 8
Backward Execution Time (us) : 727.915

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N8_K8_bwd2
# Input: M: 1, N: 8, K: 8
Backward Execution Time (us) : 687.626
```
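The three cases above can be enumerated with a short sketch. The `bwd1`, `bwd2`, and `bwdall` suffixes come from the sample output; the helper name `backward_cases` is hypothetical and this is not the actual `auto_set` implementation.

```python
def backward_cases(num_inputs):
    """Enumerate which inputs require grad for each generated test case."""
    cases = []
    # One case per input: only that input requires grad.
    for i in range(num_inputs):
        flags = tuple(j == i for j in range(num_inputs))
        cases.append((f"bwd{i + 1}", flags))
    # One final case where every input requires grad.
    cases.append(("bwdall", (True,) * num_inputs))
    return cases


cases = backward_cases(2)
for suffix, flags in cases:
    print(suffix, flags)
```

Each tuple of flags corresponds to the values that successive `auto_set()` calls would hand to `requires_grad` for one test case.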

Reviewed By: zheng-xq

Differential Revision: D16450355

fbshipit-source-id: 50ae0916e81c3ff9f0c482ed6d386319eb15b305
2019-07-29 15:58:41 -07:00
ffef0e03b7 Enabling GPU device runs for operators (#23461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23461

Enabling GPU device runs for production operator shapes.

Reviewed By: xw285cornell, mingzhe09088

Differential Revision: D16526928

fbshipit-source-id: 46657963f4b0bc43d14205ccf1b63d588657e388
2019-07-26 18:53:40 -07:00
3516f3c235 handle exit from init method (#21211)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21211

There are cases where the `init` method used to create inputs can exit with an error. When this happens, that specific input should be skipped.
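A minimal sketch of the skipping behavior, under the assumption that "exit with error" covers both raised exceptions and `SystemExit`. The helper `build_cases` and the size limit inside `init` are hypothetical and only serve the illustration.

```python
def build_cases(init_fn, configs):
    """Run init_fn on every config, skipping configs whose init fails."""
    cases, skipped = [], []
    for config in configs:
        try:
            cases.append(init_fn(**config))
        except (Exception, SystemExit) as err:
            # Record the failure and move on to the next input config.
            skipped.append((config, repr(err)))
    return cases, skipped


def init(M, N):
    # Hypothetical constraint: reject inputs that are too large.
    if M * N > 64:
        raise ValueError("input too large for this sketch")
    return {"M": M, "N": N}


cases, skipped = build_cases(init, [{"M": 8, "N": 2}, {"M": 64, "N": 64}])
```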

Reviewed By: zheng-xq

Differential Revision: D15466410

fbshipit-source-id: 55e86764b2ec56f7730349ff1df6e50efc0239d7
2019-07-25 21:41:06 -07:00
2b2fe525b9 introduce a new interface to add a list of operators (#21209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21209

This diff introduces a new interface to add a list of operators. Here are the steps to add ops using this interface:

- create op_list:
```
unary_ops_list = op_bench.op_list(
    attr_names=["op_name", "op_function"],
    attrs=[
        ["abs", torch.abs],
        ["abs_", torch.abs_],
    ],
)
```
- create a bench class:
```
class UnaryOpBenchmark(op_bench.TorchBenchmarkBase):
    def init(self, M, N, op_function):
        self.input_one = torch.rand(M, N)
        self.op_func = op_function

    def forward(self):
        return self.op_func(self.input_one)
```
- register those ops:
```
op_bench.generate_pt_tests_from_list(unary_ops_list, unary_ops_configs, UnaryOpBenchmark)
```
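To make the registration step more concrete, here is a rough sketch of how an op list might be paired with input configs. The helper `expand_op_list`, the test-name format, and the use of `abs` and `operator.neg` as stand-in operators are assumptions for illustration, not the operator_benchmark implementation.

```python
import operator
from itertools import product


def expand_op_list(attr_names, attrs):
    """Turn rows of attribute values into one dict per operator."""
    return [dict(zip(attr_names, row)) for row in attrs]


unary_ops = expand_op_list(
    attr_names=["op_name", "op_function"],
    attrs=[["abs", abs], ["neg", operator.neg]],
)
configs = [{"M": 2, "N": 3}, {"M": 4, "N": 5}]

# Pair every operator with every input config to get one test per pair.
tests = []
for op, cfg in product(unary_ops, configs):
    name = f"{op['op_name']}_M{cfg['M']}_N{cfg['N']}"
    tests.append((name, op["op_function"], cfg))

names = [t[0] for t in tests]
```

A single benchmark class can then serve every operator in the list, since the operator function arrives as an ordinary `init` argument.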

Reviewed By: zheng-xq

Differential Revision: D15514188

fbshipit-source-id: f09b359cab8175eeb8d51b3ad7bbbcfbc9f6430f
2019-07-09 16:41:29 -07:00
b93f29ded3 add JIT path to the benchmark (#22309)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22309

This diff enables PT operators to run with JIT mode. Users can control eager and JIT mode using the `use_jit` flag.

This diff puts the operator in a loop and passes the loop to JIT. An extra step wraps the operator with the `_consume` op so that JIT's dead code elimination does not remove it. As a result, the reported time includes the real operator execution time plus the `_consume` op (which returns its input directly and does nothing else).
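The loop shape can be sketched in plain Python. This only illustrates the structure: Python itself performs no dead code elimination, and `timed_loop` is a hypothetical helper, not the benchmark backend.

```python
import time


def _consume(value):
    # Returns its input unchanged; its only job is to "use" the result.
    return value


def timed_loop(op, arg, iters):
    """Run op in a loop, routing each result through _consume."""
    start = time.perf_counter()
    last = None
    for _ in range(iters):
        last = _consume(op(arg))
    elapsed = time.perf_counter() - start
    return last, elapsed


last, elapsed = timed_loop(abs, -3.0, iters=1000)
```

The measured time covers both `op` and `_consume`, which mirrors the caveat in the commit message about what the reported number includes.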

Reviewed By: zheng-xq

Differential Revision: D16033082

fbshipit-source-id: e03be89fd5a505e44e81015dfc63db9cd76fb8a1
2019-07-03 17:18:03 -07:00
9c44f6c723 generate tests based on op metadata (#21432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21432

This diff introduces a new interface to generate tests based on the metadata of operators.

Reviewed By: ajauhri

Differential Revision: D15675542

fbshipit-source-id: ba60e803ea553d8b9eb6cb2bcdc6a0368ef62b1c
2019-07-03 16:48:41 -07:00
4e3c97a0be add separate path for op with JIT (#21210)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21210

This diff introduces a new path to run op with JIT. There are two steps involved here:
1. Users need to script the op. This should happen in the `init` method.
2. The generated graph from step 1 is passed to `jit_forward`, which will be executed by the benchmark backend.

Reviewed By: zheng-xq

Differential Revision: D15460831

fbshipit-source-id: 48441d9cd4be5d0acebab901f45544616e6ed2ee
2019-06-10 19:53:58 -07:00
668dbcc41b migrate intraop benchmarks to the new interface (#21202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21202

Migrate Ilia's op benchmarks to the new interface

Reviewed By: hl475

Differential Revision: D15322577

fbshipit-source-id: 8e75d51e7ddacbd56896c55f2996a9358491d83e
2019-05-31 16:19:04 -07:00
31089b02ce introduce a new interface to add op [core changes] (#21147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21147

This diff introduces a new interface to add PT/C2 operators to the benchmark suite.

The following steps are needed to add a new operator:
1. Specify the input shapes and args of an operator in configs.
2. Create a PT/C2 benchmark class which includes `init` (create tensors), `forward` (specify the operator to be tested), and `backward` (gradient of an op) methods.
3. Call generate_pt_test/generate_c2_test to create test cases based on configs.
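As a rough illustration of the first and last steps above, configs can be expanded into one test case per combination of input values. The helpers `expand_configs` and `make_test_name` are hypothetical; the name format is modeled on names such as `add_M8_N2_K1_cpu` that appear in benchmark output elsewhere in this log, without the device suffix.

```python
from itertools import product


def expand_configs(**dims):
    """Build one config dict per combination of the given values."""
    keys = list(dims)
    return [dict(zip(keys, values)) for values in product(*dims.values())]


def make_test_name(op_name, config):
    """Join the op name and each key/value pair, e.g. add_M8_N2_K1."""
    return op_name + "".join(f"_{k}{v}" for k, v in config.items())


configs = expand_configs(M=[8], N=[2], K=[1, 8])
names = [make_test_name("add", c) for c in configs]
```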

Reviewed By: zheng-xq

Differential Revision: D15250380

fbshipit-source-id: 1025a7cf60d2427baa0f3f716455946d3d3e6a27
2019-05-31 09:21:04 -07:00
eecf52b444 Fix in benchmark_test_generator (#20237)
Summary:
Add missing import
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20237

Differential Revision: D15245957

Pulled By: ilia-cher

fbshipit-source-id: 0f71aa08eb9ecac32002a1644838d06ab9faa37c
2019-05-07 17:03:25 -07:00
19e6886576 Intra-op parallel microbenchmarks for PT (#19997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19997
ghimport-source-id: 420d4a68a1ef879beee2734adba8abb575e0b0ab

Differential Revision: D15231375

Pulled By: ilia-cher

fbshipit-source-id: ce7248ea2ebb54d25c9d831c6e3f23f3534557dd
2019-05-06 20:21:45 -07:00
0c7e98b765 Support for non-contiguous tensors and arbitrary dtypes in PT benchmarks (#19993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19993
ghimport-source-id: 4cf51b61bb83b72883148ab0faa0c75c3cef7635

Differential Revision: D15230363

Pulled By: ilia-cher

fbshipit-source-id: a3ab591d6fd24e874958401e63eaec56bda19a5c
2019-05-06 19:12:09 -07:00
26f12af537 Fix op benchmarks error in OSS environment (#19518)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19518

The previous design required running the op benchmarks from the PyTorch root directory, which could lead to a `module not found` error in the OSS environment. This diff fixes that issue by launching the benchmark from the `benchmarks` folder.

Reviewed By: ilia-cher

Differential Revision: D15020787

fbshipit-source-id: eb09814a33432a66cc857702bc86538cd17bea3b
2019-04-19 16:25:16 -07:00
08f5c05d60 make separate operators as independent binaries (#19450)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19450

We want to make each operator benchmark a separate binary. Previously, all operators were collected into a single binary, which is unnecessary when we only want to run a specific operator. This diff resolves that issue.

Reviewed By: ilia-cher

Differential Revision: D14808159

fbshipit-source-id: 43cd25b219c6e358d0cd2a61463b34596bf3bfac
2019-04-18 20:00:47 -07:00
45d5b6be48 Enhance front-end to add op (#19433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19433

For the operator benchmark project, we need to cover a lot of operators, so the interface for adding operators needs to be very clean and simple. This diff implements a new interface for adding an op.

Here is the logic to add a new operator to the benchmark:
```
long_config = {}
short_config = {}

map_func

add_test(
  [long_config, short_config],
  map_func,
  [caffe2 op],
  [pt op]
)
```

Reviewed By: zheng-xq

Differential Revision: D14791191

fbshipit-source-id: ac6738507cf1b9d6013dc8e546a2022a9b177f05
2019-04-18 17:07:02 -07:00