211 Commits

3a0801f960 [skip ci] Fix "arugment" typos (#61459)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61455.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61459

Reviewed By: soulitzer

Differential Revision: D29636559

Pulled By: samestep

fbshipit-source-id: 9ad65265c0491d9e81bb303abe3a07c6843bfa4a
2021-07-15 15:20:18 -07:00
808d0e3353 [caffe2] update make_mnist_db and make_image_db to move strings into DB::Put() (#60919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60919

Update make_mnist_db.cc and make_image_db.cc to work with the DB API changes
in D29204425 (00896cb9ed).  This is similar to the changes to make_cifar_db.cc landed in
D29374754 (394f60b0fc).
ghstack-source-id: 132621346

Test Plan: buck build caffe2/binaries/...

Reviewed By: valmikir

Differential Revision: D29447314

fbshipit-source-id: 33aff85c24d8b785211287de23d46704c7eb0726
2021-06-29 11:52:43 -07:00
394f60b0fc [caffe2] update make_cifar_db to move the string into DB::Put() (#60692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60692

Update make_cifar_db.cc to work with the DB API changes in D29204425 (00896cb9ed).

Test Plan: buck build caffe2/binaries:make_cifar_db

Differential Revision: D29374754

fbshipit-source-id: 23d2acd24031d11071791e398433b537215ffd38
2021-06-25 14:02:24 -07:00
2e26976ad3 Disallow versionless Python shebangs (#58275)
Summary:
Some machines don't have a versionless `python` on their PATH, which breaks these existing shebangs.

I'm assuming that all the existing versionless `python` shebangs are meant to be `python3` and not `python2`; please let me know if my assumption was incorrect for any of these.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58275

Test Plan: CI.

Reviewed By: zhouzhuojie

Differential Revision: D28428143

Pulled By: samestep

fbshipit-source-id: 6562be3d12924db72a92a0207b060ef740f61ebf
2021-05-14 08:26:02 -07:00
6dd1978d4b print average duration for caffe2 benchmark
Summary: print average duration for caffe2 benchmark

Test Plan:
buck run //xplat/caffe2:caffe2_benchmarkAppleMac -- --init_net ~/track_init_net.pb --net ~/track_predict_net.pb --warmup 10 --input 'data' --input_dims '1,4,128,256' --input_type float --iter 20
Using additional configuration options from .buckconfig.local
Building: finished in 0.6 sec (100%) 247/2137 jobs, 0 updated
  Total time: 0.6 sec
Average Duration: 18111 us

Reviewed By: larryliu0820

Differential Revision: D27745416

fbshipit-source-id: a5d20b8ef0ba4a9547d396738d5ddd1aca57684d
2021-04-13 14:19:34 -07:00
85fcadc059 [lite-interpreter] speed_benchmark_torch support BUILD_LITE_INTERPRETER (#55402)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55402

Test Plan: Imported from OSS

Reviewed By: cccclai

Differential Revision: D27599824

Pulled By: IvanKobzarev

fbshipit-source-id: 3adbb8a16a785d3610404d71ef2d895904b1a8ef
2021-04-07 11:39:32 -07:00
24c904951c Replace AutoNonVariableTypeMode with InferenceMode in fbcode. (#55114)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55114
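An illustrative before/after for this swap (a minimal sketch using the public C++ API; the actual fbcode call sites are not shown in this log):

```cpp
#include <torch/script.h>

// Minimal sketch of the replacement, not the actual fbcode call sites.
void run_inference(torch::jit::Module& module, const torch::Tensor& input) {
  // Before: at::AutoNonVariableTypeMode non_var_guard(true);
  // After: InferenceMode, which also skips version counters and view tracking.
  c10::InferenceMode guard;
  auto output = module.forward({input}).toTensor();
}
```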

Test Plan: CI

Reviewed By: ezyang, bhosmer

Differential Revision: D27472768

fbshipit-source-id: 76f17ef7de40f6e04e2968f8958027b5f93e1c0c
2021-04-02 11:45:53 -07:00
8c798e0622 Forbid trailing whitespace (#53406)
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857

These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
  - `GLOSSARY.md`
  - `aten/src/ATen/core/op_registration/README.md`
  - `scripts/README.md`
  - `torch/csrc/jit/codegen/fuser/README.md`

The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```

I looked over the auto-generated changes and didn't see anything that looked problematic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406

Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377

This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348

Reviewed By: walterddr, seemethere

Differential Revision: D26856620

Pulled By: samestep

fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
2021-03-05 17:22:55 -08:00
c2558b4b61 [vulkan] Add nonVarTypeModeGuard to vulkan tests and speed_benchmark_torch (#52535)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52535

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D26580994

Pulled By: SS-JIA

fbshipit-source-id: 94f091432265cf6607b73c34846c07273d47c70b
2021-02-25 14:23:40 -08:00
22c6dafd33 [PyTorch] Use plain old function pointer for RecordFunctionCallback (reapply) (#49408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49408

Nearly every non-test callsite doesn't need to capture any variables anyway, and this saves 48 bytes per callback.
ghstack-source-id: 118665808
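For context on the 48-byte figure, a small standalone sketch (sizes are typical for 64-bit libstdc++/libc++, not guaranteed by the standard):

```cpp
#include <cstdio>
#include <functional>

struct RecordFunction; // forward declaration, for illustration only

using FnPtrCallback = void (*)(const RecordFunction&);            // 8 bytes on 64-bit
using StdFnCallback = std::function<void(const RecordFunction&)>; // typically 32 bytes

int main() {
  std::printf("function pointer: %zu bytes\n", sizeof(FnPtrCallback));
  std::printf("std::function:    %zu bytes\n", sizeof(StdFnCallback));
  // With separate start and end callbacks, switching both from std::function
  // to plain pointers saves roughly 2 * (32 - 8) = 48 bytes per callback.
}
```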

Test Plan:
Wait for GitHub CI since we had C++14-specific issues with
this one in previous PR https://github.com/pytorch/pytorch/pull/48629

Reviewed By: malfet

Differential Revision: D25563207

fbshipit-source-id: 6a2831205917d465f8248ca37429ba2428d5626d
2020-12-15 19:16:01 -08:00
25bc906281 Revert D25135415: [PyTorch] Use plain old function pointer for RecordFunctionCallback
Test Plan: revert-hammer

Differential Revision:
D25135415 (7e23ee1598)

Original commit changeset: 5e92dc79da64

fbshipit-source-id: 45b1634a100084c84dca158a1f16ca760fef6988
2020-12-14 21:04:27 -08:00
7e23ee1598 [PyTorch] Use plain old function pointer for RecordFunctionCallback (#48629)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48629

Nearly every non-test callsite doesn't need to capture any variables anyway, and this saves 48 bytes per callback.
ghstack-source-id: 118568240

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D25135415

fbshipit-source-id: 5e92dc79da6473ed15d1e381a21ed315879168f3
2020-12-14 20:08:16 -08:00
900aa4ee97 [PyTorch] remove convenience RecordFunctionCallback interface (#48620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48620

In preparation for storing bare function pointer (8 bytes)
instead of std::function (32 bytes).
ghstack-source-id: 118568242

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D25132183

fbshipit-source-id: 3790cfb5d98479a46cf665b14eb0041a872c13da
2020-12-14 20:03:15 -08:00
db5e5b439c Extra sampling of record function events [resend] (#49114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49114

resend of https://github.com/pytorch/pytorch/pull/48289

Test Plan: see 48289

Reviewed By: robieta

Differential Revision: D25443365

Pulled By: ilia-cher

fbshipit-source-id: c15ac312222bb4d744e10199ed79801cccae8227
2020-12-11 12:53:37 -08:00
9f7fb54693 Revert D25111515: Extra sampling of record function events
Test Plan: revert-hammer

Differential Revision:
D25111515 (09b974c2d5)

Original commit changeset: 0d572a3636fe

fbshipit-source-id: d558d8052924d937d86db7dd40dc6388e6d28823
2020-12-09 08:37:17 -08:00
09b974c2d5 Extra sampling of record function events (#48289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48289

Adding extra sampling step when dispatching RecordFunction.

(Note: this ignores all push blocking failures!)

Reviewed By: swolchok

Differential Revision: D25111515

Pulled By: ilia-cher

fbshipit-source-id: 0d572a3636fe649a47ec47901826bbfc08368937
2020-12-09 02:29:13 -08:00
251398acca Force a sync on non-CPU tensors for the benchmark to reflect the timing accurately. (#48856)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48856
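The gist of the change, as a hedged sketch (synchronize_backend is a hypothetical stand-in for the backend-specific sync the diff adds):

```cpp
#include <chrono>
#include <torch/script.h>

// Hypothetical helper (name is illustrative): block until all pending work on
// the tensor's device has finished, e.g. a CUDA/Metal/Vulkan sync.
void synchronize_backend(const torch::Tensor& t) {
  (void)t; // a real implementation would dispatch to the backend's sync call
}

double timed_forward_us(torch::jit::Module& module, const torch::Tensor& input) {
  auto start = std::chrono::high_resolution_clock::now();
  auto output = module.forward({input}).toTensor();
  synchronize_backend(output); // without this, we time only the async launch
  auto end = std::chrono::high_resolution_clock::now();
  return std::chrono::duration<double, std::micro>(end - start).count();
}
```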

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D25339803

Pulled By: AshkanAliabadi

fbshipit-source-id: fdfd9a0e0cc37245d7671419f492e445396fbdb8
2020-12-05 10:47:44 -08:00
cc1c3063c5 Add test binary to compare torch model outputs (#47933)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47933

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D25309199

Pulled By: SS-JIA

fbshipit-source-id: adc3fc7db33c251f6b661916265b86b7b8c68fc2
2020-12-03 15:29:56 -08:00
8177f63c91 Reorganize and refine the Windows.h import in C++ files (#48009)
Summary:
This PR aims to reduce the import overhead and symbol noise from the `windows.h` headers.
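A common pattern for taming `windows.h`; whether this diff uses exactly these macros is an assumption, but they are the standard levers for import overhead and symbol noise:

```cpp
// Assumed illustration of the standard technique, not the diff's exact content.
#ifndef WIN32_LEAN_AND_MEAN
#define WIN32_LEAN_AND_MEAN // drop rarely used APIs (e.g. Winsock 1, OLE)
#endif
#ifndef NOMINMAX
#define NOMINMAX            // stop windows.h from defining min()/max() macros
#endif
#include <windows.h>
```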

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48009

Reviewed By: gchanan

Differential Revision: D25045840

Pulled By: ezyang

fbshipit-source-id: 01fda70f433ba2dd0cd2d7cd676ab6ffe9d98b90
2020-11-20 14:21:09 -08:00
faf03bd226 Update default output extension in optimize_for_mobile.cc (#45598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45598

.bc is causing issues on Android.  Let's switch to .ptl.

Test Plan: CI

Reviewed By: kimishpatel

Differential Revision: D24026180

fbshipit-source-id: 9f252f3652d748bccb19dc61a783d693e171b2c6
2020-10-15 15:34:34 -07:00
a277c097ac [iOS][GPU] Add Metal/MPSCNN support on iOS (#46112)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46112

### Summary

This PR adds support for running torchscript models on iOS GPU via Metal (inference only). The feature is currently in a prototype state; API changes are expected. The tutorial and documents will be added once it goes to beta.

allow-large-files

- Users API

```
  auto module = torch::jit::load(model);
  module.eval();
  at::Tensor input = at::ones({1,3,224,224}, at::ScalarType::Float).metal();
  auto output = module.forward({input}).toTensor().cpu();
```
- Supported Models
    - Person Segmentation v106 (FB Internal)
    - Mobilenetv2

- Supported Operators
    - aten::conv2d
    - aten::addmm
    - aten::add.Tensor
    - aten::sub.Tensor
    - aten::mul.Tensor
    - aten::relu
    - aten::hardtanh
    - aten::hardtanh_
    - aten::sigmoid
    - aten::max_pool2d
    - aten::adaptive_avg_pool2d
    - aten::reshape
    - aten::t
    - aten::view
    - aten::log_softmax.int
    - aten::upsample_nearest2d.vec

- Supported Devices
    - Apple A9 and above
    - iOS 10.2 and above

- CMake scripts
    - `IOS_ARCH=arm64 ./scripts/build_ios.sh -DUSE_METAL=ON`

### Test Plan

- Circle CI

ghstack-source-id: 114155638

Test Plan:
1. Sandcastle CI
2. Circle CI

Reviewed By: dreiss

Differential Revision: D23236555

fbshipit-source-id: 98ffc48b837e308bc678c37a9a5fd8ae72d11625
2020-10-13 01:46:56 -07:00
6e55a26e10 Move mobile specific CPUCachingAllocator to c10/mobile folder. (#45364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45364

Plus add some more comments about the usage, limitations and cons.

Test Plan: Build and run benchmark binary.

Reviewed By: gchanan

Differential Revision: D23944193

fbshipit-source-id: 30d4f4991d2185a0ab768d94c846d73730fc0835
2020-09-29 11:33:26 -07:00
35596d39e9 Coalesce TLS accesses in RecordFunction constructor (#44970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44970

Right now, when RecordFunction is not active (the usual case), we do two
TLS accesses: a check for thread-local callbacks and a check for a
thread-local boolean. This PR experiments with reducing the number of TLS
accesses in the RecordFunction constructor.
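A minimal sketch of the coalescing idea (not the actual PyTorch internals):

```cpp
#include <vector>

// Fold the two thread-local lookups into one by storing both fields in a
// single TLS struct; one TLS access then yields both.
struct RecordFunctionTLS {
  bool tls_record_function_enabled = true; // the thread-local boolean
  std::vector<int> sorted_tls_callbacks;   // placeholder for TLS callbacks
};

thread_local RecordFunctionTLS rf_tls;

bool should_run_record_function() {
  const auto& tls = rf_tls; // single thread-local read
  return tls.tls_record_function_enabled && !tls.sorted_tls_callbacks.empty();
}
```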

Test Plan: record_function_benchmark

Reviewed By: dzhulgakov

Differential Revision: D23791165

Pulled By: ilia-cher

fbshipit-source-id: 6137ce4bface46f540ece325df9864fdde50e0a4
2020-09-28 21:42:23 -07:00
a4aba1d465 fix compile error (#45052)
Summary:
Update the vulkanOptimizeForMobile invocation in optimize_for_mobile.cc to match the latest call contract from PR https://github.com/pytorch/pytorch/pull/44903.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45052

Reviewed By: malfet

Differential Revision: D23814953

Pulled By: mrshenli

fbshipit-source-id: 0fa844a8291e952715b9de35cdec0e411c42b7f9
2020-09-21 10:23:49 -07:00
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
bff741a849 Improve save_for_mobile cxx binary (#43721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43721

We can combine the optimization pass and save_for_mobile to reduce friction. Since a lite interpreter model can also be used in full JIT, I don't think we need the option to save it as a full JIT model.

Also
- improved usage message
- print op list before and after optimization pass

Test Plan:
```
buck run //xplat/caffe2:optimize_for_mobile -- --model=/home/linbin/sparkspot.pt

Building: finished in 12.4 sec (100%) 2597/2597 jobs, 2 updated
  Total time: 12.5 sec

pt_operator_library(
        name = "old_op_library",
        ops = [
                "aten::_convolution",
                "aten::adaptive_avg_pool2d",
                "aten::add_.Tensor",
                "aten::batch_norm",
                "aten::mul.Tensor",
                "aten::relu_",
                "aten::softplus",
                "aten::sub.Tensor",
        ],
)

pt_operator_library(
        name = "new_op_library",
        ops = [
                "aten::adaptive_avg_pool2d",
                "aten::add_.Tensor",
                "aten::batch_norm",
                "aten::mul.Tensor",
                "aten::relu_",
                "aten::softplus",
                "aten::sub.Tensor",
                "prepacked::conv2d_clamp_run",
        ],
)

The optimized model for lite interpreter was saved to /home/linbin/sparkspot_mobile_optimized.bc
```

```
buck run //xplat/caffe2:optimize_for_mobile -- --model=/home/linbin/sparkspot.pt --backend=vulkan
```

Reviewed By: kimishpatel

Differential Revision: D23363533

fbshipit-source-id: f7fd61aaeda5944de5bf198e7f93cacf8368babd
2020-08-27 11:01:12 -07:00
2a08566b8f Simple caching allocator for CPU. (#42006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42006

This PR introduces a simple CPU caching allocator, specifically intended
for mobile use cases and for inference. Nothing in the implementation
prevents other use cases, but its simplicity may not be suitable everywhere.
It simply tracks allocations by size and relies on deterministic, repeatable
behavior where allocations of the same sizes are made on every inference.
Thus, after the first allocation, when the pointer is returned, the
allocator caches it for subsequent use instead of returning it to the
system. Memory is freed automatically at the end of the process, or it can
be explicitly freed.
At the moment this is enabled only in DefaultMobileCPUAllocator.
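A minimal sketch of the caching strategy described above (not the real DefaultMobileCPUAllocator):

```cpp
#include <cstdlib>
#include <mutex>
#include <unordered_map>
#include <vector>

// Free blocks are cached per size and handed back on the next same-size
// allocation instead of being returned to the system.
class SimpleCachingAllocator {
 public:
  void* allocate(size_t size) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = free_blocks_.find(size);
    if (it != free_blocks_.end() && !it->second.empty()) {
      void* ptr = it->second.back(); // reuse a cached block of this size
      it->second.pop_back();
      return ptr;
    }
    void* ptr = std::malloc(size);
    sizes_[ptr] = size;
    return ptr;
  }

  void release(void* ptr) { // cache for reuse instead of freeing
    std::lock_guard<std::mutex> lock(mutex_);
    free_blocks_[sizes_.at(ptr)].push_back(ptr);
  }

  ~SimpleCachingAllocator() { // cached memory is actually freed at the end
    for (auto& kv : free_blocks_)
      for (void* ptr : kv.second) std::free(ptr);
  }

 private:
  std::mutex mutex_;
  std::unordered_map<size_t, std::vector<void*>> free_blocks_;
  std::unordered_map<void*, size_t> sizes_;
};
```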

Test Plan:
android test: cpu_caching_allocator_test

Imported from OSS

Reviewed By: dreiss

Differential Revision: D22726976

fbshipit-source-id: 9a38b1ce34059d5653040a1c3d035bfc97609e6c
2020-08-21 19:09:22 -07:00
8e0714a60d [rfc] Reduce number of coin flips in RecordFunction (#40758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40758

Currently we flip a coin for each sampled callback each time we run
RecordFunction. This PR is an attempt to skip most of the coin flips (for
the low-probability observers) while keeping the distribution close to the
original one.
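One standard way to do this, assumed here rather than lifted from the diff, is to draw the number of calls to skip from a geometric distribution instead of flipping a Bernoulli coin per call:

```cpp
#include <cstdint>
#include <random>

// Same hit distribution as per-call Bernoulli(p), far fewer RNG calls:
// sample how many failures precede the next success, then just count down.
class GeometricSampler {
 public:
  explicit GeometricSampler(double prob) : prob_(prob) { resample(); }

  bool try_run() {
    if (skip_calls_ > 0) { --skip_calls_; return false; }
    resample(); // fired; schedule the next success
    return true;
  }

 private:
  void resample() {
    std::geometric_distribution<int64_t> dist(prob_);
    skip_calls_ = dist(gen_);
  }
  double prob_;
  int64_t skip_calls_ = 0;
  std::mt19937_64 gen_{std::random_device{}()};
};
```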

Test Plan:
CI and record_function_benchmark
```
(python_venv) iliacher@devgpu151:~/local/pytorch  (reduce_coin_flops)$ ./build/bin/record_function_benchmark
Warmup time: 30108 us.
Time per iteration (1x1): 1496.78 us.
Time per iteration (16x16): 2142.46 us.
Pure RecordFunction runtime of 10000000 iterations 687929 us, number of callback invocations: 978
(python_venv) iliacher@devgpu151:~/local/pytorch  (reduce_coin_flops)$ ./build/bin/record_function_benchmark
Warmup time: 19051 us.
Time per iteration (1x1): 1581.89 us.
Time per iteration (16x16): 2195.67 us.
Pure RecordFunction runtime of 10000000 iterations 682402 us, number of callback invocations: 1023
(python_venv) iliacher@devgpu151:~/local/pytorch  (reduce_coin_flops)$ ./build/bin/record_function_benchmark
Warmup time: 18715 us.
Time per iteration (1x1): 1566.11 us.
Time per iteration (16x16): 2131.17 us.
Pure RecordFunction runtime of 10000000 iterations 693571 us, number of callback invocations: 963
(python_venv) iliacher@devgpu151:~/local/pytorch  (reduce_coin_flops)$

(python_venv) iliacher@devgpu151:~/local/pytorch  (reduce_coin_flops)$ ./build/bin/record_function_benchmark
Warmup time: 18814 us.
Time per iteration (1x1): 1536.2 us.
Time per iteration (16x16): 1985.82 us.
Pure RecordFunction runtime of 10000000 iterations 944959 us, number of callback invocations: 1015
(python_venv) iliacher@devgpu151:~/local/pytorch  (reduce_coin_flops)$ ./build/bin/record_function_benchmark
Warmup time: 18278 us.
Time per iteration (1x1): 1526.32 us.
Time per iteration (16x16): 2093.77 us.
Pure RecordFunction runtime of 10000000 iterations 985307 us, number of callback invocations: 1013
(python_venv) iliacher@devgpu151:~/local/pytorch  (reduce_coin_flops)$ ./build/bin/record_function_benchmark
Warmup time: 18545 us.
Time per iteration (1x1): 1524.65 us.
Time per iteration (16x16): 2080 us.
Pure RecordFunction runtime of 10000000 iterations 952835 us, number of callback invocations: 1048
```

Reviewed By: dzhulgakov

Differential Revision: D22320879

Pulled By: ilia-cher

fbshipit-source-id: 2193f07d2f7625814fe7bc3cc85ba4092fe036bc
2020-06-30 17:23:00 -07:00
3852215170 [vulkan] jit passes for vulkan conv2 prepack and fuse with clamp (#39282)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39282

Test Plan: Imported from OSS

Differential Revision: D21962424

Pulled By: IvanKobzarev

fbshipit-source-id: 2d20e827d2c3836b7e6b443293377c68dc1ffa5a
2020-06-20 14:12:21 -07:00
4b028a8e07 [jit] support pad_sequence/pack_sequence (#39844)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39844

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D22026720

Pulled By: wanchaol

fbshipit-source-id: cc51ea77eff3689e319ec7e89a54c788646b5940
2020-06-19 19:03:14 -07:00
0b3755b1d0 Add optimization blacklist as second arg to optimizeForMobile method. (#37462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37462

Instead of running all the optimization passes in the optimizeForMobile
method, introduce an optimizer whitelist dictionary as a second parameter.
When it is not passed, the method runs all the optimization passes;
otherwise, the method reads the dict and only runs the passes whose value
is True.
ghstack-source-id: 106104503

Test Plan:
python test/test_mobile_optimizer.py

Imported from OSS

Differential Revision: D22096029

fbshipit-source-id: daa9370c0510930f4c032328b225df0bcf97880f
2020-06-17 18:14:45 -07:00
42f0ea49ca [Codemod][GleanFbcode] Remove dead includes in caffe2/binaries
Reviewed By: ilia-cher

Differential Revision: D21949969

fbshipit-source-id: 80336f82e9507dd001d079644cba5012bc5c8eed
2020-06-15 12:16:52 -07:00
e399e470b6 [vulkan] speed_benchmark_torch add vulkan arg to use Vulkan backend (#39076)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39076

Adds a `--vulkan` argument to run the torch benchmark on the Vulkan backend.
If it is true, inputs are converted to the Vulkan backend before module.forward.

Usage for mobilenetv2 fp32:
```
/build/bin/speed_benchmark_torch --model=mn-fp32.pt --input_type=float --input_dims=1,3,224,224 --warmup=1 --iter=5 --vulkan=true
```

Test Plan: Imported from OSS

Differential Revision: D21962428

Pulled By: IvanKobzarev

fbshipit-source-id: 3136af5386b6bce9ea53ba4a9019af2d312544b3
2020-06-10 22:19:22 -07:00
2d708cefcc Move RecordFunction into ATen (#37548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37548

Moving RecordFunction from torch::autograd::profiler into at namespace

Test Plan:
CI

Imported from OSS

Differential Revision: D21315852

fbshipit-source-id: 4a4dbabf116c162f9aef0da8606590ec3f3847aa
2020-05-07 14:52:39 -07:00
c24c5f9684 Make RecordFunction callbacks thread local and modernize interface (#37491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37491

This PR modernizes the RecordFunction API and adds thread-local callbacks
in addition to the global ones.

Changes:
 - support for TLS callbacks, this is going to be the foundation of profiler and other tools
 - modernize the interface around a simple set of functions, (add|remove|has|clear)(Global|ThreadLocal)Callback, and add RecordFunctionCallback to easily construct the callbacks to be passed (usage sketched after this list)
 - we also add `.setShouldRun` into the callback interface to support cases when simple uniform sampling is not enough
 - to properly support add/remove introduce the idea of callback handle returned by add
 - internal implementation still uses SmallVector to store intermediate state (as before); in this case these are vectors of handles of the callbacks that were picked to run
 - to speed up runtime we keep these vectors sorted, so that we can quickly enumerate the callbacks that need to run
 - added tests for new functionality
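A hedged usage sketch of the interface named above; exact signatures shifted across versions (the Dec 2020 commits in this log later switched the callbacks to plain function pointers):

```cpp
#include <ATen/record_function.h> // header location varies by version

// Illustrative use of the (add|remove)ThreadLocalCallback interface, with
// callback signatures roughly as of this PR.
void install_observer() {
  auto handle = at::addThreadLocalCallback(
      at::RecordFunctionCallback(
          [](const at::RecordFunction& fn) { /* start: e.g. record fn name */ },
          [](const at::RecordFunction& fn) { /* end */ })
          .samplingProb(0.01)); // sample ~1% of records
  // The handle returned by add enables targeted removal later:
  at::removeCallback(handle);
}
```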

Test Plan:
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py
develop install
./build/bin/test_jit
CI

record_function_benchmark: https://gist.github.com/ilia-cher/f1e094dae47fe23e55e7672ac4dcda2f

Imported from OSS

Differential Revision: D21300448

fbshipit-source-id: 6d55c26dbf20b33d35c3f1604dcc07bb063c8c43
2020-05-07 14:51:02 -07:00
dd64d26d74 Make speed_benchmark_torch report latency in us (#37953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37953

Earlier it said us but reported ms.

Test Plan: buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/fbnet/fbnet_mobile_inference.json --devices s9u --remote --framework pytorch --logger_level info --job_queue aibench_interactive --platform android/full_jit

Reviewed By: xcheng16

Differential Revision: D21349612

fbshipit-source-id: b97b6216eb0264123ff2c7852a0678b2008b0bf1
2020-05-07 11:08:14 -07:00
d4edbbd396 Revert D21369541: Make a separate cmake option for caffe2 tests
Test Plan: revert-hammer

Differential Revision:
D21369541

Original commit changeset: 669cff70c5b5

fbshipit-source-id: 500d261eaf3f02bcd698d343480b9e951e2844b9
2020-05-05 06:30:52 -07:00
aff92ef3d6 Make a separate cmake option for caffe2 tests (#37721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37721

Even though we disabled caffe2 test configs in Python, the BUILD_TEST
option was still building caffe2 test cpp binaries and various CI
configurations were running them (since they just run every binary in
`torch/test`).

This PR adds a caffe2-specific BUILD_TEST option (BUILD_CAFFE2_TEST),
which defaults to OFF, and gates the compilation of caffe2 test cpp
binaries under it.

Test Plan: Imported from OSS

Differential Revision: D21369541

Pulled By: suo

fbshipit-source-id: 669cff70c5b53f016e8e016bcb3a99bf3617e1f9
2020-05-04 23:26:27 -07:00
9f02897431 Account for the optimizeForMobile API change.
Test Plan: TBD

Reviewed By: ayush29feb

Differential Revision: D21185736

fbshipit-source-id: fc7abc9c2eba8e6a390e54168b1fc4a17bf80e68
2020-04-24 13:21:56 -07:00
c4b9f3bf55 Enable torch_speed_benchmark to accept different memory formats. (#36202)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36202

Test Plan: Imported from OSS

Differential Revision: D20970216

Pulled By: AshkanAliabadi

fbshipit-source-id: bb5a260e5677716356eec6ad4daa1f3c65420bbd
2020-04-23 13:18:43 -07:00
1c15cb4773 Add bundled input support to speed_benchmark_torch (#36765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36765

We recently added support for bundling inputs with models.  Now add
support to the benchmarker to use those inputs.  This frees users from
having to look up the proper input format for each model.
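A sketch of how a benchmarker can consume bundled inputs; `get_all_bundled_inputs` is the real method name from torch.utils.bundled_inputs, while the call pattern below is an assumption rather than the diff's code:

```cpp
#include <stdexcept>
#include <vector>
#include <torch/script.h>

void run_first_bundled_input(torch::jit::Module& module) {
  auto all_inputs = module.run_method("get_all_bundled_inputs").toList();
  if (all_inputs.empty()) {
    throw std::runtime_error("model has no bundled inputs"); // clear error
  }
  auto first_tuple = all_inputs.get(0).toTuple(); // keep the tuple alive
  std::vector<c10::IValue> args(
      first_tuple->elements().begin(), first_tuple->elements().end());
  module.forward(args);
}
```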

Test Plan:
- Ran on a model without bundled inputs.  Saw a clear error.
- Ran on a model with too few bundled inputs.  Saw a clear error.
- Ran on a proper bundled input.  Model executed.

Differential Revision: D21142659

Pulled By: dreiss

fbshipit-source-id: d23c1eb9d1de882345b007bf2bfbbbd6f964f6fe
2020-04-20 15:32:57 -07:00
7374a00bef [pt]Supported benchmarking pytorch jit self-contained models. (#35279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35279

Supported benchmarking pytorch jit self-contained models.
* By specifying the flag `--no_inputs=True`, the binary supports benchmarking a self-contained torchscript model (the model runs without inputs, i.e. `model.forward()`).
* This allows moving the data preparation part outside of this binary.

Reviewed By: kimishpatel

Differential Revision: D20585639

fbshipit-source-id: c28e50503534c90023c1430479d26f1c1ce740b1
2020-04-09 17:02:17 -07:00
c5c63a2e35 Add quick utility to transform scripted/traced models for mobile. (#35904)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35904

Currently this optimization transforms conv2d and linear ops into their
prepacked (XNNPACK) equivalents.

Test Plan: buck run fbsource//xplat/caffe2:optimize_for_mobile -- --model="/tmp/inpainting_fbnet.pt"

Reviewed By: AshkanAliabadi

Differential Revision: D20824433

fbshipit-source-id: 88d5c0d21b77911f95f018b03398b0df758ab0d7
2020-04-03 11:42:11 -07:00
c3abcf83aa [AI Bench] Restore speed_benchmark_torch.cc to its original state
Summary: We removed all assistant-specific code.

Test Plan:
```
buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/fbnet/fbnet_mobile_inference.json --platform android/full_jit --framework pytorch --remote --devices  SM-G950U-7.0-24
```

https://our.intern.facebook.com/intern/aibench/details/940147322057842

Reviewed By: kimishpatel

Differential Revision: D20686220

fbshipit-source-id: b7336d5ea15fa11be01abf4ad12747feaaf22ea8
2020-04-02 08:35:46 -07:00
800d5617c0 Recording of TorchScript functions (#34710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34710

Extending the RecordFunction API to support new recording scopes (such as TorchScript functions), as well as giving more flexibility in setting the sampling rate.

Test Plan: unit test (test_misc.cpp/testRecordFunction)

Reviewed By: gdankel, dzhulgakov

Differential Revision: D20158523

fbshipit-source-id: a9e0819d21cc06f4952d92d43246587c36137582
2020-03-31 00:33:23 -07:00
b9adbb5002 Fix/relax CMake linter rules (#35574)
Summary:
Ignore mixed upper-case/lower-case style for now.
Fix the "space between function and its arguments" violations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35574

Test Plan: CI

Differential Revision: D20712969

Pulled By: malfet

fbshipit-source-id: 0012d430aed916b4518599a0b535e82d15721f78
2020-03-27 16:52:33 -07:00
3789db40f2 [aibench] added support for measuring memory on AI Bench for Caffe2 Models (#35036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35036

Exposing the helper functions in benchmark_helper.h

Reviewed By: kimishpatel, geof90

Differential Revision: D20528983

fbshipit-source-id: 73231becd93b1e700d37af425bebb628890dec9a
2020-03-25 01:58:18 -07:00
6e47e7bf52 [pytorch][mobile] fixed AutoGradMode/AutoNonVariableTypeMode uses for mobile callsites
Summary:
There are three guards related to mobile build:
* AutoGradMode
* AutoNonVariableTypeMode
* GraphOptimizerEnabledGuard

Today we need to set some of these guards before calling libtorch APIs because we customized the mobile build to only support inference (for both OSS and most FB use cases) to optimize binary size.

Several changes have been made since the 1.3 release, so there are already inconsistent uses of these guards in the codebase. I did a sweep of all mobile-related model loading & forward() call sites, trying to unify the use of these guards:

Full JIT: still set all three guards. More specifically:
* OSS: Fixed a bug where the guard was not set correctly at model load time in Android JNI.
* FB: Not covered by this diff (as we are using mobile interpreter for most internal builds).

Lite JIT (mobile interpreter): only needs the AutoNonVariableTypeMode guard. AutoGradMode doesn't seem to be relevant (so it was removed from a few places) and GraphOptimizerEnabledGuard is definitely not relevant (only full JIT has a graph optimizer). More specifically:
* OSS: At this point we are not committed to supporting Lite-JIT. For Android it shares the same code with FB JNI callsites.
* FB:
  * JNI callsites: use the unified LiteJITCallGuard.
  * iOS/C++: manually set AutoNonVariableTypeMode for _load_for_mobile() & forward() callsites.

Ideally we should avoid having to set AutoNonVariableTypeMode for the mobile interpreter. It's currently needed for dynamic dispatch + inference-only mobile builds (where variable kernels are not registered): without the guard it will try to run `variable_fallback_kernel` and crash (PR #34038). The proper fix will take some time, so we are using this workaround to unblock selective BUCK builds, which depend on dynamic dispatch.

PS. The current status (of having to set AutoNonVariableTypeMode) should not block running an FL model + mobile interpreter: if all the necessary variable kernels are registered, then it can call _load_for_mobile()/forward() against the FL model without setting the AutoNonVariableTypeMode guard. It's still inconvenient for Java callsites as the guard is set unconditionally inside JNI methods.
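A minimal sketch of the Lite JIT pattern described above, assuming the guard and loader names from this summary (header paths may vary by version):

```cpp
#include <string>
#include <ATen/core/LegacyTypeDispatch.h>  // AutoNonVariableTypeMode (path may vary)
#include <torch/csrc/jit/mobile/import.h>  // _load_for_mobile

// Only the AutoNonVariableTypeMode guard is set around load and forward;
// AutoGradMode and GraphOptimizerEnabledGuard are full-JIT concerns.
void lite_jit_inference(const std::string& path, const at::Tensor& input) {
  at::AutoNonVariableTypeMode non_var_type_guard(true);
  auto module = torch::jit::_load_for_mobile(path);
  auto output = module.forward({input});
}
```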

Test Plan: - CI

Reviewed By: xta0

Differential Revision: D20498017

fbshipit-source-id: ba6740f66839a61790873df46e8e66e4e141c728
2020-03-18 17:19:35 -07:00
c235be42dd [jit] kill script namespace (#34515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34515

Once upon a time we thought this was necessary. In reality it is not, so
removing it.

For backcompat, our public interface (defined in `api/`) still has
typedefs to the old `script::` names.

There was only one collision: `Pass` as a `Stmt` and `Pass` as a graph
transform. I renamed one of them.
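A small sketch of the backcompat alias pattern described (names illustrative):

```cpp
// The public api/ headers keep the old script:: spellings alive as aliases.
namespace torch {
namespace jit {
struct Module {};

namespace script {
using Module = torch::jit::Module; // old name still compiles
} // namespace script
} // namespace jit
} // namespace torch

// Call sites written against the old name keep working:
torch::jit::script::Module legacy_module;
```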

Test Plan: Imported from OSS

Differential Revision: D20353503

Pulled By: suo

fbshipit-source-id: 48bb911ce75120a8c9e0c6fb65262ef775dfba93
2020-03-11 23:32:48 -07:00
25e4e9eb86 [On-device Benchmark] speed_benchmark_torch switch to log latency from dataset level to row level (#34598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34598

as above

Test Plan:
test.txt
```
what time is it now
could you set a reminder at 7 am
waht is the weather today
```
example json
```
{
    "model": {
      "category": "CNN",
      "description": "Assistant Mobile Inference",
      "files": {
        "model": {
          "filename": "model.pt1",
          "location": "//everstore/GICWmAB2Znbi_mAAAB0P51IPW8UrbllgAAAP/model.pt1",
          "md5": "c0f4b29c442bbaeb0007fb0ce513ccb3"
        },
        "data": {
          "filename": "input.txt",
          "location": "/home/pengxia/test/input.txt",
          "md5": "c0f4b29c442bbaeb0007fb0ce513ccb3"
        }
      },
      "format": "pytorch",
      "framework": "pytorch",
      "kind": "deployment",
      "name": "Assistant Mobile Inference"
    },
    "tests": [
      {
        "command": "{program} --model {files.model}  --input_dims \"1\" --input_type NLUType --warmup {warmup} --iter 5 --input_file {files.data} --report_pep true",
        "identifier": "{ID}",
        "metric": "delay",
        "iter": 15,
        "warmup": 2,
        "log_output": true
      }
    ]
  }

```

iter = 5 (--iter 5) * 3 (3 lines in test.txt) = 15

arbabu123 I will provide a wrapper to compute the iter in the future.

run following command
```
buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/fbnet/assistant_mobile_inference.json --platform android/full_jit --framework pytorch --remote --devices  SM-G960U-8.0.0-26
```

results
https://our.intern.facebook.com/intern/aibench/details/275259559594003

**Note: this is compatible with the existing examples.**

Reviewed By: kimishpatel, ljk53

Differential Revision: D20389285

fbshipit-source-id: 80165ef394439a307ac7986cf540a80fdf3d85d6
2020-03-11 13:51:42 -07:00