Commit Graph

8 Commits

c0ed38e644 [BE][Easy][3/19] enforce style for empty lines in import segments in benchmarks/ (#129754)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129754
Approved by: https://github.com/ezyang
2024-07-17 14:34:42 +00:00
26f4f10ac8 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.
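For illustration, the kind of `usort` section this fixes looks roughly like the following (the category value shown is hypothetical, not copied from the actual config):

```toml
# previously spelled [tool.usort.kown], so usort silently ignored the section
[tool.usort.known]
first_party = ["torch"]
```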

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
2024-05-27 14:49:57 +00:00
55c0ab2887 Revert "[5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)"
This reverts commit 7763c83af67eebfdd5185dbe6ce15ece2b992a0f.

Reverted https://github.com/pytorch/pytorch/pull/127126 on behalf of https://github.com/XuehaiPan due to Broken CI ([comment](https://github.com/pytorch/pytorch/pull/127126#issuecomment-2133044286))
2024-05-27 09:22:08 +00:00
7763c83af6 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
ghstack dependencies: #127122, #127123, #127124, #127125
2024-05-27 04:22:18 +00:00
dd3a77bc96 Apply UFMT to all files in benchmarks/ (#105928)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105928
Approved by: https://github.com/albanD
2023-07-26 01:18:48 +00:00
8ff0b6fef8 [OpBenchMobile] Enable operator_benchmark to run the benchmark on mobile through AiBench (#47767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47767

This diff implements the functionality of running benchmarks on mobile on top of the operator_benchmark framework. It does so in a few steps:

1. create a scripted module from an existing benchmark case.
2. run the mobile-specific optimization pass on the scripted module.
3. run the scripted module on AiBench by calling its Python API.

A small change in how a benchmark case is written is introduced so that local and mobile runs can share the same interface: inputs become arguments of the `forward` function, so that the mobile optimization pass can run successfully (otherwise everything would be optimized away by constant propagation).
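As a rough sketch of steps 1–2 and the interface change above (the `AddBenchmark` and `OpBenchmarkMobile` classes here are simplified stand-ins for the real ones, and the AiBench submission step is internal and not shown):

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile


class AddBenchmark(torch.nn.Module):
    # inputs come from get_inputs() and are passed as forward() arguments,
    # so the add op is not constant-folded away by the optimization pass
    def get_inputs(self):
        return (torch.rand(1, 1, 1), torch.rand(1, 1, 1))

    def forward(self, input_one, input_two):
        return input_one + input_two


class OpBenchmarkMobile(torch.nn.Module):
    # minimal wrapper that runs the benchmark case num_iters times
    def __init__(self, benchmark, num_iters=1):
        super().__init__()
        self.benchmark = benchmark
        self.num_iters = num_iters

    def forward(self):
        for _ in range(self.num_iters):
            input_one, input_two = self.benchmark.get_inputs()
            self.benchmark(input_one, input_two)


# 1. create a scripted module from the benchmark case
scripted = torch.jit.script(OpBenchmarkMobile(AddBenchmark()).eval())
# 2. run the mobile-specific optimization pass on the scripted module
optimized = optimize_for_mobile(scripted)
optimized()  # the optimized module would then be submitted to AiBench
```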

Test Plan:
## local op_bench run

buck run caffe2/benchmarks/operator_benchmark:benchmark_all_test --  --iterations 1 --warmup_iterations 1

buck run caffe2/benchmarks/operator_benchmark:benchmark_all_test --  --iterations 1 --warmup_iterations 1 --use_jit

Exceptions: the `py_module` op in `FakeQuantizePerTensorBaseOpBenchmark` and `FakeQuantizePerChannelBaseOpBenchmark` fails under JIT mode. These tests also failed in the base version:

```
RuntimeError:
Module 'FakeQuantizePerChannelOpBenchmark' has no attribute 'op_func' (This function exists as an attribute on the Python module, but we failed to compile it to a TorchScript function.
The error stack is reproduced here:

Python builtin <built-in method apply of FunctionMeta object at 0x619000c652a0> is currently not supported in Torchscript:
  File "/data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/pt/quantization_test#link-tree/quantization_test.py", line 260
    quant_min: int, quant_max: int
):
    return _LearnableFakeQuantizePerChannelOp.apply(input, scale, zero_point, axis, quant_min, quant_max, 1.0)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
:
  File "/data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/pt/quantization_test#link-tree/quantization_test.py", line 313
        axis: int, quant_min: int, quant_max: int
    ):
        return self.op_func(input, scale, zero_point, axis, quant_min, quant_max)
               ~~~~~~~~~~~~ <--- HERE
```

`_consume_op` typing mismatches: chunk, split, qobserver, and sort in qunary. These will be fixed in D24774105.

## OSS test

python3 -m benchmark_all_test --iterations 1 --warmup_iterations 1 --use_jit
python3 -m benchmark_all_test --iterations 1 --warmup_iterations 1

## saved module graph
```
module __torch__.mobile_benchmark_utils.OpBenchmarkMobile {
  parameters {
  }
  attributes {
    training = True
    num_iters = 1
    benchmark = <__torch__.pt.add_test.___torch_mangle_4.AddBenchmark object at 0x6070001b8b50>
  }
  methods {
    method forward {
      graph(%self : __torch__.mobile_benchmark_utils.OpBenchmarkMobile):
        %12 : None = prim::Constant() # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:9:4
        %4 : bool = prim::Constant[value=1]() # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:10:8
        %1 : int = prim::GetAttr[name="num_iters"](%self)
         = prim::Loop(%1, %4) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/mobile_benchmark_utils.py:10:8
          block0(%i : int):
            %6 : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark = prim::GetAttr[name="benchmark"](%self)
            %7 : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark = prim::GetAttr[name="benchmark"](%self)
            %self.inputs_tuple : (Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu), Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu)) = prim::Constant[value=({0.48884}, {0.809042})]()
            %9 : Tensor, %10 : Tensor = prim::TupleUnpack(%self.inputs_tuple)
            %23 : int = prim::Constant[value=1]()
            %24 : Tensor = aten::add(%9, %10, %23) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/pt/add_test.py:39:15
            -> (%4)
        return (%12)

    }
  }
  submodules {
    module __torch__.pt.add_test.___torch_mangle_4.AddBenchmark {
      parameters {
      }
      attributes {
        mobile_optimized = True
      }
      methods {
        method forward {
          graph(%self : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark,
                %input_one.1 : Tensor,
                %input_two.1 : Tensor):
            %3 : int = prim::Constant[value=1]()
            %4 : Tensor = aten::add(%input_one.1, %input_two.1, %3) # /data/users/wangyang19/fbsource/fbcode/buck-out/dev/gen/caffe2/benchmarks/operator_benchmark/fb/pt/mobile/benchmark_all_test_fbcode#link-tree/pt/add_test.py:39:15
            return (%4)

        }
        method get_inputs {
          graph(%self : __torch__.pt.add_test.___torch_mangle_4.AddBenchmark):
            %self.inputs_tuple : (Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu), Float(1, 1, 1, strides=[1, 1, 1], requires_grad=0, device=cpu)) = prim::Constant[value=({0.48884}, {0.809042})]()
            return (%self.inputs_tuple)

        }
      }
      submodules {
      }
    }
  }
}

```

Reviewed By: kimishpatel

Differential Revision: D24322214

fbshipit-source-id: 335317eca4f40c4083883eb41dc47caf25cbdfd1
2020-11-12 17:15:05 -08:00
c9222b7471 Implement clip_ranges operator for PyTorch
Test Plan:
unit test for correctness
```
buck test caffe2/torch/fb/sparsenn:test -- test_clip_ranges
Parsing buck files: finished in 1.6 sec
Creating action graph: finished in 18.9 sec
Building: finished in 15.0 sec (100%) 9442/9442 jobs, 1 updated
  Total time: 35.6 sec
More details at https://www.internalfb.com/intern/buck/build/66fb17de-859e-4d01-89bf-5c5de2950693
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 80f5e0c2-7db2-48a4-b148-25dd34651682
Trace available for this run at /tmp/tpx-20201026-123217.050766/trace.log
Started reporting to test run: https://our.intern.facebook.com/intern/testinfra/testrun/4503599665041422
    ✓ ListingSuccess: caffe2/torch/fb/sparsenn:test - main (14.912)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_ranges (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (14.098)
Summary
  Pass: 1
  ListingSuccess: 1
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/4503599665041422
```
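For context, a rough pure-Python sketch of what ClipRanges presumably does (an assumption based on the Caffe2 operator of the same name, not taken from this diff's source): each `(offset, length)` range is truncated so that its length does not exceed `MAX_LENGTH`.

```python
def clip_ranges(ranges, max_length):
    # ranges: list of (offset, length) pairs; the real operator works on an
    # integer tensor of shape (M, N, 2) rather than Python lists
    return [(offset, min(length, max_length)) for offset, length in ranges]


# a range of length 8 is clipped to length 3; shorter ranges are untouched
print(clip_ranges([(0, 8), (8, 2)], 3))  # → [(0, 3), (8, 2)]
```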

new benchmark perf test
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH6_M1_N2_MAX_LENGTH1_dtypetorch.int32_cpu
# Input: LENGTH: 6, M: 1, N: 2, MAX_LENGTH: 1, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 155.765

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH7_M1_N2_MAX_LENGTH2_dtypetorch.int32_cpu
# Input: LENGTH: 7, M: 1, N: 2, MAX_LENGTH: 2, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 156.248

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH8_M1_N2_MAX_LENGTH3_dtypetorch.int32_cpu
# Input: LENGTH: 8, M: 1, N: 2, MAX_LENGTH: 3, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 156.634

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH9_M1_N2_MAX_LENGTH4_dtypetorch.int32_cpu
# Input: LENGTH: 9, M: 1, N: 2, MAX_LENGTH: 4, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 155.408

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH10_M1_N2_MAX_LENGTH5_dtypetorch.int32_cpu
# Input: LENGTH: 10, M: 1, N: 2, MAX_LENGTH: 5, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 165.168
```

Compared with the old implementation, there is **around a 300us gain** per call:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH6_M1_N2_MAX_LENGTH1_dtypetorch.int32_cpu
# Input: LENGTH: 6, M: 1, N: 2, MAX_LENGTH: 1, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 443.012

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH7_M1_N2_MAX_LENGTH2_dtypetorch.int32_cpu
# Input: LENGTH: 7, M: 1, N: 2, MAX_LENGTH: 2, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 446.480

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH8_M1_N2_MAX_LENGTH3_dtypetorch.int32_cpu
# Input: LENGTH: 8, M: 1, N: 2, MAX_LENGTH: 3, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 444.064

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH9_M1_N2_MAX_LENGTH4_dtypetorch.int32_cpu
# Input: LENGTH: 9, M: 1, N: 2, MAX_LENGTH: 4, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 445.511

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH10_M1_N2_MAX_LENGTH5_dtypetorch.int32_cpu
# Input: LENGTH: 10, M: 1, N: 2, MAX_LENGTH: 5, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 450.468
```

Reviewed By: MarcioPorto

Differential Revision: D24546110

fbshipit-source-id: e6c9b38e911f177f97961ede5bf375107f240363
2020-10-28 09:46:37 -07:00
c6858fd71a Set up benchmarks for ClipRanges operator for Caffe2 and PyTorch
Summary: As titled, adding benchmark tests for the ClipRanges operator.

Test Plan:
benchmark test for Caffe2
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking Caffe2: clip_ranges
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1026 12:30:33.938997 2658759 init.h:137] Caffe2 GlobalInit should be run before any other API calls.
# Name: clip_ranges_LENGTH6_M1_N2_MAX_LENGTH1_dtypeint32
# Input: LENGTH: 6, M: 1, N: 2, MAX_LENGTH: 1, dtype: int32
Forward Execution Time (us) : 5.805

# Benchmarking Caffe2: clip_ranges
# Name: clip_ranges_LENGTH7_M1_N2_MAX_LENGTH2_dtypeint32
# Input: LENGTH: 7, M: 1, N: 2, MAX_LENGTH: 2, dtype: int32
Forward Execution Time (us) : 5.913

# Benchmarking Caffe2: clip_ranges
# Name: clip_ranges_LENGTH8_M1_N2_MAX_LENGTH3_dtypeint32
# Input: LENGTH: 8, M: 1, N: 2, MAX_LENGTH: 3, dtype: int32
Forward Execution Time (us) : 5.941

# Benchmarking Caffe2: clip_ranges
# Name: clip_ranges_LENGTH9_M1_N2_MAX_LENGTH4_dtypeint32
# Input: LENGTH: 9, M: 1, N: 2, MAX_LENGTH: 4, dtype: int32
Forward Execution Time (us) : 5.868

# Benchmarking Caffe2: clip_ranges
# Name: clip_ranges_LENGTH10_M1_N2_MAX_LENGTH5_dtypeint32
# Input: LENGTH: 10, M: 1, N: 2, MAX_LENGTH: 5, dtype: int32
Forward Execution Time (us) : 6.408
```

benchmark test for PyTorch
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH6_M1_N2_MAX_LENGTH1_dtypetorch.int32_cpu
# Input: LENGTH: 6, M: 1, N: 2, MAX_LENGTH: 1, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 443.012

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH7_M1_N2_MAX_LENGTH2_dtypetorch.int32_cpu
# Input: LENGTH: 7, M: 1, N: 2, MAX_LENGTH: 2, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 446.480

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH8_M1_N2_MAX_LENGTH3_dtypetorch.int32_cpu
# Input: LENGTH: 8, M: 1, N: 2, MAX_LENGTH: 3, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 444.064

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH9_M1_N2_MAX_LENGTH4_dtypetorch.int32_cpu
# Input: LENGTH: 9, M: 1, N: 2, MAX_LENGTH: 4, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 445.511

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH10_M1_N2_MAX_LENGTH5_dtypetorch.int32_cpu
# Input: LENGTH: 10, M: 1, N: 2, MAX_LENGTH: 5, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 450.468
```

Reviewed By: MarcioPorto

Differential Revision: D24500468

fbshipit-source-id: a582090a3982005af272cb10cdd257b2b2e787c4
2020-10-28 09:42:10 -07:00