36 Commits

af60398c3a Update the operator benchmarking, to benchmark using torch.compile (#161394)
This pull request enhances the PyTorch operator benchmarking suite by adding support for benchmarking with `torch.compile` mode, alongside the existing Eager and JIT modes. It also adds peak memory measurement (forward/backward pass), improves the JSON output format consumed by the dashboard for reporting, and introduces additional CLI options. The new CLI flags are:

- Added `--use-compile` CLI argument and corresponding logic to run benchmarks using `torch.compile`, including mutual exclusivity with `--use-jit`
- Added `--benchmark-name` argument for customizing the benchmark name in output
- Updated the default value of `--output-json-for-dashboard` to `benchmark-results.json` for a more predictable output file name

Sample command to run a single operator:
`python -m pt.mm_test --use-compile`
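
For illustration, a minimal sketch of how the new flags and the `--use-compile`/`--use-jit` exclusivity check might be wired up with argparse; the actual parser lives in the benchmark runner and may differ:

```python
import argparse

# Hypothetical sketch; flag names follow the PR description, but the
# real wiring in the operator benchmark runner may differ.
parser = argparse.ArgumentParser()
parser.add_argument("--use-compile", action="store_true",
                    help="run benchmarks under torch.compile")
parser.add_argument("--use-jit", action="store_true",
                    help="run benchmarks under TorchScript JIT")
parser.add_argument("--benchmark-name", default=None,
                    help="custom benchmark name used in the output")
parser.add_argument("--output-json-for-dashboard",
                    default="benchmark-results.json")
args = parser.parse_args()

# --use-compile and --use-jit are mutually exclusive.
if args.use_compile and args.use_jit:
    parser.error("--use-compile cannot be combined with --use-jit")
```
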
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161394
Approved by: https://github.com/jbschlosser
2025-09-09 18:17:37 +00:00
69a57d9486 add JSON output support for operator benchmark (#154410)
To better support the integration of operator benchmark performance data into the OSS benchmark database for the dashboard, I’ve added a JSON output format that meets the required specifications: https://github.com/pytorch/pytorch/wiki/How-to-integrate-with-PyTorch-OSS-benchmark-database#output-format
Since the operator benchmark already has an `--output-json` flag for saving results to a JSON file, I added a new flag, `--output-json-for-dashboard`, for this feature.
At the same time, I renamed `--output-dir` to `--output-csv` for a clearer and more intuitive name.
An example of the JSON output of the operator benchmark:
```
[
  {
    "benchmark": {
      "name": "PyTorch operator benchmark - add_M1_N1_K1_cpu",
      "mode": "inference",
      "dtype": "float32",
      "extra_info": {
        "input_config": "M: 1, N: 1, K: 1, device: cpu"
      }
    },
    "model": {
      "name": "add_M1_N1_K1_cpu",
      "type": "micro-benchmark",
      "origins": [
        "pytorch"
      ]
    },
    "metric": {
      "name": "latency",
      "unit": "us",
      "benchmark_values": [
        2.074
      ],
      "target_value": null
    }
  },
  {
    "benchmark": {
      "name": "PyTorch operator benchmark - add_M64_N64_K64_cpu",
      "mode": "inference",
      "dtype": "float32",
      "extra_info": {
        "input_config": "M: 64, N: 64, K: 64, device: cpu"
      }
    },
    "model": {
      "name": "add_M64_N64_K64_cpu",
      "type": "micro-benchmark",
      "origins": [
        "pytorch"
      ]
    },
    "metric": {
      "name": "latency",
      "unit": "us",
      "benchmark_values": [
        9.973
      ],
      "target_value": null
    }
  }
]
```
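
For illustration, a hedged sketch of how one record in this format could be assembled and dumped; the helper `to_dashboard_record` is hypothetical, not part of the benchmark code:

```python
import json

def to_dashboard_record(test_name, mode, dtype, input_config, latency_us):
    # Hypothetical helper mirroring the schema shown above.
    return {
        "benchmark": {
            "name": f"PyTorch operator benchmark - {test_name}",
            "mode": mode,
            "dtype": dtype,
            "extra_info": {"input_config": input_config},
        },
        "model": {
            "name": test_name,
            "type": "micro-benchmark",
            "origins": ["pytorch"],
        },
        "metric": {
            "name": "latency",
            "unit": "us",
            "benchmark_values": [latency_us],
            "target_value": None,  # serialized as null
        },
    }

records = [
    to_dashboard_record("add_M1_N1_K1_cpu", "inference", "float32",
                        "M: 1, N: 1, K: 1, device: cpu", 2.074),
]
with open("benchmark-results.json", "w") as f:
    json.dump(records, f, indent=2)
```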

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154410
Approved by: https://github.com/huydhn
2025-06-03 21:29:24 +00:00
d2f6c6df1d unbreak fb:operator_benchmark_test (#152049)
Summary: unbreak fb:operator_benchmark_test

Test Plan: works on my machine

Differential Revision: D73540912

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152049
Approved by: https://github.com/hl475
2025-05-15 03:38:48 +00:00
fa5f556f88 [CI] enable operator benchmark on CPU (#143733)
This enables the operator benchmark on CPU to track op-level performance. This PR is motivated by https://github.com/pytorch/pytorch/issues/120982, and feasibility was investigated in https://github.com/pytorch/pytorch/pull/127216

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143733
Approved by: https://github.com/leslie-fang-intel, https://github.com/atalman, https://github.com/huydhn, https://github.com/malfet

Co-authored-by: diwei sun <diwei.sun@intel.com>
Co-authored-by: chuanqiw <chuanqi.wang@intel.com>
2025-03-21 16:46:03 +00:00
86c3370bc3 operator benchmark: write output to a JSON (#142809)
This pull request adds the ability to write operator benchmark output to an optional JSON file. The output is still printed to the terminal as before, but the user can also save it to a JSON file.

The main part of the functionality is implemented in the function `_perf_result_to_dict`, which produces the dictionary written to the JSON file. Each dictionary corresponds to a single test.
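
A compact, hedged sketch of that flow; the `_perf_result_to_dict` signature here is hypothetical:

```python
import json

def _perf_result_to_dict(test_name, runtime_us):
    # Hypothetical signature; the real helper carries more fields.
    return {"test_name": test_name, "runtime_us": runtime_us}

results = [_perf_result_to_dict("add_M1_N1_K1_cpu", 111.192)]
with open("perf_results.json", "w") as f:  # path supplied via --output-json
    json.dump(results, f, indent=2)
```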

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142809
Approved by: https://github.com/albanD
2024-12-14 01:42:00 +00:00
c0ed38e644 [BE][Easy][3/19] enforce style for empty lines in import segments in benchmarks/ (#129754)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by the linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129754
Approved by: https://github.com/ezyang
2024-07-17 14:34:42 +00:00
2fd75667b4 [Caffe2]Remove Caffe2 scripts and benchmarks (#126747)
Due to the removal of Caffe2.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126747
Approved by: https://github.com/ezyang, https://github.com/malfet
2024-06-05 23:46:31 +00:00
dd3a77bc96 Apply UFMT to all files in benchmarks/ (#105928)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105928
Approved by: https://github.com/albanD
2023-07-26 01:18:48 +00:00
a229b4526f [BE] Prefer dash over underscore in command-line options (#94505)
Prefer dashes over underscores in command-line options: add `--command-arg-name` forms to the argument parser. The old underscore arguments (`--command_arg_name`) are kept for backward compatibility.

Both dashes and underscores are used in the PyTorch codebase. Some argument parsers only have dashes or only underscores in their arguments. For example, the `torchrun` utility for distributed training only accepts underscore arguments (e.g., `--master_port`). Dashes are more common in other command-line tools, and they look to be the default choice in the Python standard library:

`argparse.BooleanOptionalAction` (CPython `Lib/argparse.py` at commit `4a9dff0e5a`, lines 893-895):

```python
class BooleanOptionalAction(Action):
    def __init__(...):
        ...
        for option_string in option_strings:
            ...
            if option_string.startswith('--'):
                option_string = '--no-' + option_string[2:]
                _option_strings.append(option_string)
```

It adds `--no-argname`, not `--no_argname`. Also, typing `_` requires pressing the Shift key, while `-` does not.
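
The backward-compatible registration can be sketched with argparse, which lets one action own both spellings (the option name below is hypothetical):

```python
import argparse

# Sketch: the dashed form is canonical; the underscore spelling is kept
# as an alias for backward compatibility.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--warmup-iterations", "--warmup_iterations",
    dest="warmup_iterations", type=int, default=100,
)
args = parser.parse_args(["--warmup_iterations", "5"])
assert args.warmup_iterations == 5  # old spelling still works
```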

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94505
Approved by: https://github.com/ezyang, https://github.com/seemethere
2023-02-09 20:16:49 +00:00
2e8b9c7785 [TorchArrow][AIBench] Add AIBench Metrics for TorchArrow Inference Benchmark Test (#75035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75035

- Rename `--ai_pep_format` to `--report_aibench` to better reflect the underlying framework's name change

Reviewed By: tgalkovskyi

Differential Revision: D35257017

fbshipit-source-id: 6c0a2e4585db928b029484d4b81165bfc99bff9f
(cherry picked from commit 18f4962539ccb09a3c33b146206342ea3930f275)
2022-04-01 00:35:42 +00:00
8ff0b6fef8 [OpBenchMobile] Enable operator_benchmark to run the benchmark on mobile through AiBench (#47767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47767

This diff implements the functionality of running benchmarks on mobile on top of the operator_benchmark framework. It does so in a few steps:

1. create a scripted module from existing benchmark case.
2. run mobile specific optimization pass on the scripted module
3. run the scripted module on AiBench by calling its Python API

A small change to the way a benchmark case is written is introduced so that local and mobile runs can share the same interface. The change is to pass inputs as arguments of the `forward` function, so that the mobile optimization pass can run successfully (otherwise everything would be optimized away by constant propagation).
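
A minimal sketch of the resulting pattern, with hypothetical names modeled on the `AddBenchmark` graph in the test plan below; the point is that inputs arrive as `forward` arguments instead of being created inside it:

```python
import torch

class AddBenchmark(torch.nn.Module):
    @torch.jit.export
    def get_inputs(self):
        # Inputs live outside forward(), as in the saved module graph below.
        return (torch.rand(1, 1, 1), torch.rand(1, 1, 1))

    def forward(self, input_one: torch.Tensor, input_two: torch.Tensor):
        return torch.add(input_one, input_two)

scripted = torch.jit.script(AddBenchmark())
out = scripted(*scripted.get_inputs())
```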

Test Plan:
## local op_bench run

buck run caffe2/benchmarks/operator_benchmark:benchmark_all_test --  --iterations 1 --warmup_iterations 1

buck run caffe2/benchmarks/operator_benchmark:benchmark_all_test --  --iterations 1 --warmup_iterations 1 --use_jit

Exceptions: the `py_module` op in `FakeQuantizePerTensorBaseOpBenchmark` and `FakeQuantizePerChannelBaseOpBenchmark` under JIT mode. These tests also failed in the base version:

```
RuntimeError:
Module 'FakeQuantizePerChannelOpBenchmark' has no attribute 'op_func' (This function exists as an attribute on the Python module, but we failed to compile it to a TorchScript function.
The error stack is reproduced here:

Python builtin <built-in method apply of FunctionMeta object at 0x619000c652a0> is currently not supported in Torchscript:
  File "/data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/pt/quantization_test#link-tree/quantization_test.py", line 260
    quant_min: int, quant_max: int
):
    return _LearnableFakeQuantizePerChannelOp.apply(input, scale, zero_point, axis, quant_min, quant_max, 1.0)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
:
  File "/data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/pt/quantization_test#link-tree/quantization_test.py", line 313
        axis: int, quant_min: int, quant_max: int
    ):
        return self.op_func(input, scale, zero_point, axis, quant_min, quant_max)
               ~~~~~~~~~~~~ <--- HERE
```

`_consume_op` typing mismatch: chunk, split, qobserver, and sort in qunary. These will be fixed in D24774105.

## OSS test

python3 -m benchmark_all_test --iterations 1 --warmup_iterations 1 --use_jit
python3 -m benchmark_all_test --iterations 1 --warmup_iterations 1

## saved module graph
```
module __torch__.mobile_benchmark_utils.OpBenchmarkMobile {
  parameters {
  }
  attributes {
    training = True
    num_iters = 1
    benchmark = <__torch__.pt.add_test.___torch_mangle_4.AddBenchmark object at 0x6070001b8b50>
  }
  methods {
    method forward {
      graph(%self : __torch__.mobile_benchmark_utils.OpBenchmarkMobile):
        %12 : None = prim::Constant() # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:9:4
        %4 : bool = prim::Constant[value=1]() # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:10:8
        %1 : int = prim::GetAttr[name="num_iters"](%self)
         = prim::Loop(%1, %4) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:10:8
          block0(%i : int):
            %6 : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark = prim::GetAttr[name="benchmark"](%self)
            %7 : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark = prim::GetAttr[name="benchmark"](%self)
            %self.inputs_tuple : (Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu), Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu)) = prim::Constant[value=({0.48884}, {0.809042})]()
            %9 : Tensor, %10 : Tensor = prim::TupleUnpack(%self.inputs_tuple)
            %23 : int = prim::Constant[value=1]()
            %24 : Tensor = aten::add(%9, %10, %23) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/pt/add_test.py:39:15
            -> (%4)
        return (%12)

    }
  }
  submodules {
    module __torch__.pt.add_test.___torch_mangle_4.AddBenchmark {
      parameters {
      }
      attributes {
        mobile_optimized = True
      }
      methods {
        method forward {
          graph(%self : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark,
                %input_one.1 : Tensor,
                %input_two.1 : Tensor):
            %3 : int = prim::Constant[value=1]()
            %4 : Tensor = aten::add(%input_one.1, %input_two.1, %3) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/pt/add_test.py:39:15
            return (%4)

        }
        method get_inputs {
          graph(%self : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark):
            %self.inputs_tuple : (Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu), Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu)) = prim::Constant[value=({0.48884}, {0.809042})]()
            return (%self.inputs_tuple)

        }
      }
      submodules {
      }
    }
  }
}

```

Reviewed By: kimishpatel

Differential Revision: D24322214

fbshipit-source-id: 335317eca4f40c4083883eb41dc47caf25cbdfd1
2020-11-12 17:15:05 -08:00
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
f9010d7648 remove wipe cache from op bench (#31334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31334

The wipe-cache logic was introduced in the hope of reducing variation in benchmark results. Based on our experiment results, it didn't actually help with that. In addition, several engineers had encountered a missing `cpuinfo.h` issue caused by the wipe-cache logic, so this diff removes that feature to ensure smooth installation and running of the op bench.

Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N1_K1_cpu
# Input: M: 1, N: 1, K: 1, device: cpu
Forward Execution Time (us) : 111.192

```

A/B test also passed: Benchmark Run #2476535015

Reviewed By: hl475

Differential Revision: D19126970

fbshipit-source-id: 9b1ab48c121838836ba6e0ae664a48fe2d18efdd
2019-12-16 16:34:14 -08:00
9cb8fb61c2 update operator_range description in op bench (#30170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30170

as title

Test Plan:
```
buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/benchmark_all_other_test.par --tag_filter all --iterations 1 --operator_range ef
...
ValueError: The correct format for operator_range is <start>-<end>, or <point>, <start>-<end>

buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/benchmark_all_other_test.par --tag_filter all --iterations 1 --operator_range a-b
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N32_K256_cpu
# Input: M: 8, N: 32, K: 256, device: cpu
Forward Execution Time (us) : 60.551

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N32_K256_cuda
# Input: M: 8, N: 32, K: 256, device: cuda
Forward Execution Time (us) : 67.716
...

buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/benchmark_all_other_test.par --tag_filter all --iterations 1 --operator_range b,d-f
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: batchnorm
# Mode: Eager
# Name: batchnorm_M1_N256_K3136_cpu
# Input: M: 1, N: 256, K: 3136, device: cpu
Forward Execution Time (us) : 296.004
...
```

Reviewed By: hl475

Differential Revision: D18619975

fbshipit-source-id: 08f27ee2aeda47be431385f4b20ef7fbeb797516
2019-11-20 12:07:14 -08:00
8b9bac1fad add operator-range argument to the op bench (#30051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30051

This argument takes hyphen-delimited start and end characters for filtering operators. If the first character of an operator's name falls within the start-end range, the operator is tested; otherwise it is skipped.

(Note: this ignores all push blocking failures!)
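
A hedged sketch of the filtering rule; the helper name is hypothetical:

```python
def op_in_range(op_name: str, operator_range: str) -> bool:
    # "None" (or no value) disables filtering and runs everything.
    if operator_range is None or operator_range == "None":
        return True
    first = op_name[0].lower()
    # Ranges look like "b-c" or "b,d-f" (comma-separated points/ranges).
    for part in operator_range.split(","):
        if "-" in part:
            start, end = part.split("-")
            if start <= first <= end:
                return True
        elif first == part:
            return True
    return False

assert op_in_range("ceil", "b-c")       # 'c' falls in b-c
assert not op_in_range("abs", "b-c")    # 'a' does not
```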

Test Plan:
```
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test  -- --iterations 1 --operator_range b-c
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: ceil
# Mode: Eager
# Name: ceil_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 110.720

# Benchmarking PyTorch: ceil_
# Mode: Eager
# Name: ceil__M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 51.128
...

buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test  -- --iterations 1 --operator_range None
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 107.113

# Benchmarking PyTorch: abs_
# Mode: Eager
# Name: abs__M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 54.259
...
```

Reviewed By: hl475

Differential Revision: D18581910

fbshipit-source-id: b1a1a7ba76f4d6a61c8a1659f15e9c66097654d4
2019-11-18 20:34:43 -08:00
90e3bbf3ab support all with tag_filter to run all shapes (#29864)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29864

This diff makes `all` a reserved keyword for `tag_filter`. When `all` is passed by the user, all the supported shapes are run.
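
The rule amounts to something like this sketch (helper name hypothetical):

```python
def config_selected(config_tags, tag_filter="short"):
    # "all" is reserved and matches every configured shape; otherwise a
    # config runs only if it carries the requested tag.
    return tag_filter == "all" or tag_filter in config_tags

assert config_selected(["long"], tag_filter="all")
assert not config_selected(["long"], tag_filter="short")
```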

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1 --tag_filter all
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N32_K256_cpu
# Input: M: 8, N: 32, K: 256, device: cpu
Forward Execution Time (us) : 6798.688

...
```

Reviewed By: hl475

Differential Revision: D18520249

fbshipit-source-id: 4d55af9f46f89b2fe8842e1a00dfa8e5acaf4fa2
2019-11-14 20:19:05 -08:00
6572d0d174 add a new flag to select machine for op benchmark (#29349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29349

This diff adds a new flag for picking the CPU/GPU device on which to run op benchmarks. The default is None, which will try to run on all supported devices.
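
Sketched logic for the new flag (helper name hypothetical):

```python
def devices_to_run(requested=None, supported=("cpu", "cuda")):
    # --device defaults to None, meaning "try all supported devices";
    # otherwise run only on the requested device.
    return list(supported) if requested is None else [requested]

assert devices_to_run() == ["cpu", "cuda"]
assert devices_to_run("cpu") == ["cpu"]
```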

Test Plan:
```
buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:add_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 124.283
...
# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K128_cuda_bwdall
# Input: M: 64, N: 64, K: 128, device: cuda
Backward Execution Time (us) : 176.592

buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:add_test -- --device cpu
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 121.884

buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:add_test -- --device cuda
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cuda
# Input: M: 64, N: 64, K: 64, device: cuda
Forward Execution Time (us) : 26.002
```

Reviewed By: hl475

Differential Revision: D18363942

fbshipit-source-id: fccd1fd09bcd6d7725e6fa4063559a27d9cc3065
2019-11-06 20:13:25 -08:00
f63cbf3ae2 change op benchmark forward_only flag (#28967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28967

Change the forward_only flag to take True or False so it can be integrated with PEP.

Test Plan:
```
[mingzhe0908@devgpu203.prn2 ~/fbsource/fbcode] ~/fbsource/fbcode/buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/pt/add_test.par --forward_only True  --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 152.489

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 236.608

[mingzhe0908@devgpu203.prn2 ~/fbsource/fbcode] ~/fbsource/fbcode/buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/pt/add_test.par --forward_only False   --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 147.174

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 253.437

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu_bwdall
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 1044.082
```

Reviewed By: hl475

Differential Revision: D18247416

fbshipit-source-id: 1c6cff1ac98233d4f0ca298e0cb4a0d3466e5834
2019-10-31 13:28:58 -07:00
9f44a04613 separate PT and C2 to reduce build time (#28731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28731

as title

Test Plan:
```
Before:
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operator sigmoid
Invalidating internal cached state: Buck configuration options changed between invocations. This may cause slower builds.
  Changed value project.buck_out='buck-out/opt' (was 'buck-out/dev')
  ... and 69 more. See logs for all changes
Parsing buck files: finished in 7.2 sec
Creating action graph: finished in 10.0 sec
Building: finished in 06:38.4 min (100%) 29890/29890 jobs, 29890 updated
  Total time: 06:55.7 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: sigmoid

With this diff
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operator sigmoid
Parsing buck files: finished in 6.4 sec
Creating action graph: finished in 9.8 sec
Building: finished in 06:35.9 min (100%) 29892/29892 jobs, 29892 updated
  Total time: 06:52.1 min
```

Reviewed By: hl475

Differential Revision: D18152071

fbshipit-source-id: 80c29570581bbd2f0e78e2df32734c17a2b036ee
2019-10-28 11:10:47 -07:00
bf7ebc5a53 Set number of threads for operator_benchmarks (#27010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27010

Setting OMP_NUM_THREADS programmatically doesn't do the right thing because OpenMP initialization has already happened by then. Fix this by calling torch.set_num_threads explicitly.

Passing --omp_num_threads works as expected now.
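
A minimal sketch of the fix, assuming the thread count arrives via the `--omp_num_threads` flag:

```python
import torch

# Setting os.environ["OMP_NUM_THREADS"] here would be too late: the
# OpenMP runtime is already initialized. Calling torch.set_num_threads
# takes effect immediately instead.
num_threads = 4  # value that would arrive via --omp_num_threads
torch.set_num_threads(num_threads)
print(torch.get_num_threads())  # 4
```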

In dir benchmarks/operator_benchmark/

python -m pt.qconv_test --tag_filter resnext101_32x4 --wipe_cache --test_name QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0 --omp_num_threads 1
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : None

# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0
# Input: N: 1, IC: 64, OC: 128, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0
Forward Execution Time (us) : 509.965

# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0
# Input: N: 1, IC: 64, OC: 128, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0
Forward Execution Time (us) : 576.007
```

python -m pt.qconv_test --tag_filter resnext101_32x4 --wipe_cache --test_name QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0 --omp_num_threads 4

```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : None

# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0
# Input: N: 1, IC: 64, OC: 128, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0
Forward Execution Time (us) : 195.002

# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0
# Input: N: 1, IC: 64, OC: 128, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0
Forward Execution Time (us) : 189.788
```
ghstack-source-id: 91050434

Test Plan: See summary

Differential Revision: D17647391

fbshipit-source-id: e00de1151902291ed94fd34446995ea1f0199d14
2019-09-30 17:04:51 -07:00
1b38a6f602 add wipe cache
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24390

Reviewed By: mingzhe09088

Differential Revision: D16808041

fbshipit-source-id: 1b19f47706e4e2f2e03356469315b55c6ff76d20
2019-08-14 23:48:52 -07:00
f511abb701 increase default warmup iter and iter (#24272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24272

As title, plus some lint

Reviewed By: mingzhe09088

Differential Revision: D16792312

fbshipit-source-id: 1386c369c96da04a584d1f7127b708b29d4b47d2
2019-08-13 14:35:19 -07:00
828c08b4c7 allow passing a list of operators to benchmark (#23442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23442

Rename the argument from `operator` to `operators`, which can take a list of operators to test.

Reviewed By: hl475

Differential Revision: D16520779

fbshipit-source-id: 94284a87c64471793e319f5bd3143f89b9a192bb
2019-07-26 12:20:36 -07:00
bae10db522 Incorporating arguments to pull production operators and adding device type. (#23197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23197

Incorporating arguments to pull production operators and adding device type.

Reviewed By: mingzhe09088

Differential Revision: D16387263

fbshipit-source-id: e20ed82225eb1e4b7ab1756ec157967b055d85bf
2019-07-23 13:43:26 -07:00
94d99f2522 add num_runs flag to the benchmark (#22892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22892

Think of num_runs as manually running the binary <num_runs> times. Each run executes the operator for many iterations.
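
The relationship between runs and iterations can be sketched like this (function and defaults hypothetical; the real timing loop lives in benchmark_core.py):

```python
import time
import torch

def benchmark(op, inputs, num_runs=2, iterations=100):
    # Each "run" is like invoking the binary once; every run times the
    # operator over many iterations and reports its own latency.
    for run in range(num_runs):
        start = time.time()
        for _ in range(iterations):
            op(*inputs)
        latency_us = (time.time() - start) / iterations * 1e6
        print(f"run {run}: {latency_us:.3f} us")

benchmark(torch.add, (torch.rand(64, 64), torch.rand(64, 64)))
```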

Reviewed By: hl475

Differential Revision: D16271597

fbshipit-source-id: b6f509ee0332c70f85bec0d447b84940c5c0cecd
2019-07-15 17:18:25 -07:00
b93f29ded3 add JIT path to the benchmark (#22309)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22309

This diff enables PT operators to run in JIT mode. Users can control eager vs. JIT mode using the `use_jit` flag.

In this diff, we put operators in a loop and pass it to JIT. One extra step, wrapping the operator with the `_consume` op, is introduced to avoid JIT's dead-code-elimination optimization. With that, the reported time includes the real operator execution time plus the `_consume` op (which directly returns its input; nothing else happens inside).
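
A hedged sketch of the looped pattern; here, returning the result plays the role the `_consume` op plays in the real framework (keeping the op live so dead code elimination can't remove it):

```python
import torch

@torch.jit.script
def benchmark_loop(a: torch.Tensor, b: torch.Tensor, iters: int) -> torch.Tensor:
    result = a
    for _ in range(iters):
        result = torch.add(a, b)  # the operator under test
    # Returning the value gives it a live use, mirroring what the real
    # `_consume` op does, so DCE cannot eliminate the aten::add calls.
    return result

out = benchmark_loop(torch.rand(64, 64), torch.rand(64, 64), 100)
```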

Reviewed By: zheng-xq

Differential Revision: D16033082

fbshipit-source-id: e03be89fd5a505e44e81015dfc63db9cd76fb8a1
2019-07-03 17:18:03 -07:00
a4f281446b introduce flags to set omp and mkl threads (#21472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21472

as title

Reviewed By: hl475

Differential Revision: D15695846

fbshipit-source-id: 44437f6b94a9c583275fcc711bb6ccf2b04f90fc
2019-06-26 09:33:05 -07:00
12528990f8 change output of ai_pep_format (#21440)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21440

This diff modifies the output format when ai_pep_format is enabled.

Reviewed By: hl475

Differential Revision: D15681042

fbshipit-source-id: df5f2dbb38d1bd866ca7f74ef4e63459d480be6e
2019-06-05 21:54:24 -07:00
3004b397f0 change test_name to be globally unique value across tests (#21206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21206

This diff changes the default test_name to be a globally unique value across tests. With that, users can list all the tests and choose to run a specific one.

Reviewed By: zheng-xq

Differential Revision: D15543508

fbshipit-source-id: 0814ef6a60d41637fed5245e30c282497cf21bb8
2019-06-03 14:55:11 -07:00
31089b02ce introduce a new interface to add op [core changes] (#21147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21147

This diff introduces a new interface for adding PT/C2 operators to the benchmark suite.

The following steps are needed to add a new operator:
1. Specify the input shapes and args for an operator in configs.
2. Create a PT/C2 benchmark class that includes `init` (create tensors), `forward` (specify the operator to be tested), and `backward` (gradient of an op) methods.
3. Call generate_pt_test/generate_c2_test to create test cases based on the configs (a sketch follows below).
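
A hedged sketch of these steps for a PT op, modeled on the add benchmark; the config values are made up, and the real add_test.py differs:

```python
import operator_benchmark as op_bench
import torch

# Step 1: input shapes and args in configs (values made up).
add_short_configs = op_bench.config_list(
    attr_names=["M", "N", "K"],
    attrs=[[8, 16, 32]],
    cross_product_configs={"device": ["cpu"]},
    tags=["short"],
)

# Step 2: a benchmark class with init (create tensors) and
# forward (the operator under test).
class AddBenchmark(op_bench.TorchBenchmarkBase):
    def init(self, M, N, K, device):
        self.input_one = torch.rand(M, N, K, device=device)
        self.input_two = torch.rand(M, N, K, device=device)

    def forward(self):
        return torch.add(self.input_one, self.input_two)

# Step 3: generate test cases from the configs.
op_bench.generate_pt_test(add_short_configs, AddBenchmark)

if __name__ == "__main__":
    op_bench.benchmark_runner.main()
```

Note the inputs live on `self` here, per the interface described in this commit; the mobile-benchmark change higher up in this log later moved inputs into `forward`'s arguments.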

Reviewed By: zheng-xq

Differential Revision: D15250380

fbshipit-source-id: 1025a7cf60d2427baa0f3f716455946d3d3e6a27
2019-05-31 09:21:04 -07:00
8c97f0b19e Initialize Caffe2 only when running Caffe2 benchmarks (#19980)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19980
ghimport-source-id: ca31ca25b88a1c6219e4a32483f70738a8fdbf88

Differential Revision: D15229797

Pulled By: ilia-cher

fbshipit-source-id: 0b23dbdba0c0f60932a75d8b1900c54285f5a8e4
2019-05-06 19:17:23 -07:00
26f12af537 Fix op benchmarks error in OSS environment (#19518)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19518

The previous design required running the op benchmarks from the PyTorch root directory, which could lead to a `module not found` error in the OSS environment. This diff fixes that issue by making the benchmark launchable from the `benchmarks` folder.

Reviewed By: ilia-cher

Differential Revision: D15020787

fbshipit-source-id: eb09814a33432a66cc857702bc86538cd17bea3b
2019-04-19 16:25:16 -07:00
08f5c05d60 make separate operators as independent binaries (#19450)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19450

We want to make each operator benchmark a separate binary. The previous way of running the benchmark collected all operators into a single binary, which is unnecessary when we want to filter for a specific operator. This diff resolves that issue.

Reviewed By: ilia-cher

Differential Revision: D14808159

fbshipit-source-id: 43cd25b219c6e358d0cd2a61463b34596bf3bfac
2019-04-18 20:00:47 -07:00
45d5b6be48 Enhance front-end to add op (#19433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19433

For operator benchmark project, we need to cover a lot of operators, so the interface for adding operators needs to be very clean and simple. This diff is implementing a new interface to add op.

Here is the logic to add new operator to the benchmark:
```
long_config = {}
short_config = {}

map_func  # maps config entries to operator inputs

add_test(
    [long_config, short_config],
    map_func,
    [caffe2_op],  # Caffe2 implementation
    [pt_op],      # PyTorch implementation
)
```

Reviewed By: zheng-xq

Differential Revision: D14791191

fbshipit-source-id: ac6738507cf1b9d6013dc8e546a2022a9b177f05
2019-04-18 17:07:02 -07:00
cb66759600 temp fix for flake8 error (#18788)
Summary:
Fix lint error
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18788

Reviewed By: houseroad

Differential Revision: D14741840

Pulled By: mingzhe09088

fbshipit-source-id: 1fa630e3c6e606e3d78fe8293e5b0e7ea1b78da3
2019-04-02 22:52:52 -07:00
5f5a2aaab9 Operator-level performance microbenchmarks (#18740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18740

Test utilities for writing Caffe2/PyTorch performance microbenchmarks. Brief description of the file structure:

* benchmark_core.py : core utilities for running microbenchmark tests
* benchmark_caffe2.py : Caffe2-specific benchmark utilities
* benchmark_pytorch.py : PyTorch-specific benchmark utilities
* benchmark_runner.py : main function. Currently it can run the microbenchmark tests in a stand-alone mode. The next step is to have this integrate with AI-PEP.

The utilities are located at https://github.com/pytorch/pytorch/tree/master/test to have access to both the Caffe2 and PyTorch Python frontends.

Includes two operator microbenchmarks, supporting both Caffe2 and PyTorch:
* MatMul
* Add

Reference: PyTorch benchmarks: https://github.com/pytorch/benchmark/tree/master/timing/python. In this work, we start with two example binary operators, MatMul and Add, but eventually we should also cover unary operators like those in the PyTorch benchmark repo.

Reviewed By: zheng-xq

Differential Revision: D13887111

fbshipit-source-id: b7a56b95448c9ec3e674b0de0ffb96af4439bfce
2019-04-02 17:06:19 -07:00