Commit Graph

73 Commits

Author SHA1 Message Date
cyy
8967d55b01 [18/N] Fix clang-tidy warnings in jit (#132963)
Follows #132753

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132963
Approved by: https://github.com/Skylion007
2024-08-09 01:27:32 +00:00
cyy
29861779ce [2/N] Change #include <c10/util/Optional.h> to #include <optional> (#130236)
Follows #128301. The changes were made with grep and sed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130236
Approved by: https://github.com/ezyang
2024-07-09 03:17:24 +00:00
cyy
f4dcf2ae93 [1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301
Approved by: https://github.com/ezyang, https://github.com/r-barnes
2024-07-08 07:03:53 +00:00
846bb30e13 Revert "[1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)"
This reverts commit bd72e28314d8d63bb347becb8309f5ac7761c6b5.

Reverted https://github.com/pytorch/pytorch/pull/128301 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it fails XLA build bd72e28314. Please rebase your PR before relanding because I think the failure is hidden by an unrelated broken trunk XLA failure from your current base commit ([comment](https://github.com/pytorch/pytorch/pull/128301#issuecomment-2169035822))
2024-06-15 01:58:20 +00:00
cyy
bd72e28314 [1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301
Approved by: https://github.com/ezyang
2024-06-14 23:21:01 +00:00
ed327876f5 [codemod] c10:optional -> std::optional (#126135)
Generated by running the following from PyTorch root:
```
find . -regex ".*\.\(cpp\|h\|cu\|hpp\|cc\|cxx\)$" | grep -v "build/" | xargs -n 50 -P 4 perl -pi -e 's/c10::optional/std::optional/'
```

`c10::optional` is just an alias for `std::optional`. This removes usages of that alias in preparation for eliminating it entirely.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126135
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi
2024-05-14 19:35:51 +00:00
a229e78544 [BE] Enforce sign-compare (#96723)
A number of OSS PRs were reverted because of new signed-unsigned comparison warnings, which are treated as errors in some internal builds.
It is not clear how those selective rules are applied, but this PR removes `-Wno-sign-compare` from the PyTorch codebase.

The only tricky part in this PR is making sure that non-ASCII character detection works for both signed and unsigned chars here:
6e3d51b08a/torch/csrc/jit/serialization/python_print.cpp (L926)
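
One way to make such a check independent of `char` signedness is sketched below. This is an illustration only, not the code at the linked line in `python_print.cpp`; the helper names `isNonAscii` and `containsNonAscii` are hypothetical.

```cpp
#include <string>

// Hypothetical helper (not from the PyTorch sources).
// Plain `char` may be signed or unsigned depending on the platform, so a
// byte such as 0xC3 can appear as a negative value. Casting to
// `unsigned char` first gives the same answer in both cases.
inline bool isNonAscii(char c) {
  return static_cast<unsigned char>(c) > 0x7F;
}

// Returns true if any byte of `s` lies outside the 7-bit ASCII range.
inline bool containsNonAscii(const std::string& s) {
  for (char c : s) {
    if (isNonAscii(c)) {
      return true;
    }
  }
  return false;
}
```

Because the comparison happens after the cast, both operands are promoted to `int` and no signed/unsigned comparison is involved.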

Exclude several files from sign-compare if flash attention is used, due to the violation in cutlass, to be fixed by https://github.com/NVIDIA/cutlass/pull/869
Do not try to fix sign compare violations in caffe2 codebase
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96723
Approved by: https://github.com/albanD
2023-03-15 06:04:20 +00:00
579ae64d81 [mobile] List all missing ops at once (#94205)
List all missing ops rather than early termination

Test on device
Logcat lists all operators:
```
12-06 00:23:36.523  8299  8299 F DEBUG   : Abort message: 'terminating with uncaught exception of type c10::Error: Following ops cannot be found: [aten::max_pool2d, aten::conv2d]. Please check if the operator library is included in the build. If built with selected ops, check if these ops are in the list. If you are a Meta employee, please see fburl.com/missing_ops for a fix. Or post it in https://discuss.pytorch.org/c/mobile/ ()
12-06 00:23:36.523  8299  8299 F DEBUG   : Exception raised from initialize_operators at xplat/caffe2/torch/csrc/jit/mobile/function.cpp:89 (most recent call first):
```
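
The change from early termination to reporting every missing operator can be sketched roughly as follows. This is a simplified illustration, not the code in `function.cpp`: `findMissingOps` and `formatMissingOps` are hypothetical helpers, and a plain set of names stands in for the operator registry.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical: collect every required operator that is not registered,
// instead of stopping at the first one that is missing.
inline std::vector<std::string> findMissingOps(
    const std::vector<std::string>& required,
    const std::unordered_set<std::string>& registered) {
  std::vector<std::string> missing;
  for (const auto& name : required) {
    if (registered.count(name) == 0) {
      missing.push_back(name);
    }
  }
  return missing;
}

// Hypothetical: build the first sentence of an error message that lists
// all missing operators at once.
inline std::string formatMissingOps(const std::vector<std::string>& missing) {
  std::string out = "Following ops cannot be found: [";
  for (std::size_t i = 0; i < missing.size(); ++i) {
    if (i != 0) {
      out += ", ";
    }
    out += missing[i];
  }
  out += "]";
  return out;
}
```

A caller would raise a single error only when the returned list is non-empty, so the user sees the complete list in one run.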
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94205
Approved by: https://github.com/JacobSzwejbka
2023-02-07 05:45:57 +00:00
3916d7a575 Apply modernize-use-emplace to aten, c10, torch (#91077)
Apply clang-tidy check modernize-use-emplace. This is slightly more efficient because it uses an in-place constructor, and it is the recommended style in the parts of the codebase covered by clang-tidy. This PR manually applies the check to the rest of the codebase. Pinging @ezyang as this is related to my other PRs he reviewed, like #89000
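
For readers unfamiliar with the check, a minimal before/after sketch follows. The container and values are made up for illustration and are not taken from the PR.

```cpp
#include <string>
#include <utility>
#include <vector>

// Illustrative only: builds a one-element vector in the emplace style.
inline std::vector<std::pair<std::string, int>> buildWithEmplace() {
  std::vector<std::pair<std::string, int>> ops;
  // Before: construct a temporary pair, then move it into the vector.
  //   ops.push_back(std::make_pair(std::string("aten::add"), 2));
  // After: forward the constructor arguments and build the pair in place.
  ops.emplace_back("aten::add", 2);
  return ops;
}
```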

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91077
Approved by: https://github.com/ezyang
2022-12-19 07:49:56 +00:00
577e90ae9b Improve error message for missing ops (#80005)
The current error message is ill-formed. Example:

error: Following ops cannot be found. Please check if the operator library is included in the build. If built with selected ops, check if these ops are in the list. If you are a Meta employee, please see fburl.com/missing_ops for a fix. Or post it in https://discuss.pytorch.org/aten::to.prim_dtype ()

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80005
Approved by: https://github.com/cccclai
2022-06-23 09:04:34 +00:00
f96d96a7fc turn on -Werror=type-limits in our Bazel CPU build
Summary:
We also fix any existing issues.

Test Plan: Built locally, rely on CI to confirm.

Reviewers: malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79139

Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD
2022-06-10 10:04:08 +00:00
c3e089a047 Revert "[mobile] Fix lightweight dispatch OOM error by introducing selective build"
This reverts commit 272bdb1442ee3750861d9f2f10690cc3f1521b92.

Reverted https://github.com/pytorch/pytorch/pull/78983 on behalf of https://github.com/osalpekar due to broke internal mobile tests
2022-06-09 05:16:42 +00:00
272bdb1442 [mobile] Fix lightweight dispatch OOM error by introducing selective build
This PR introduces selective build to lightweight dispatch CI job. By doing so we can't run the `test_lite_intepreter_runtime` test suite anymore because it requires some other operators.

From now on, if we are adding a new unit test in `test_codegen_unboxing`, we will have to export the operators for the unit test model and add them into `lightweight_dispatch_ops.yaml`. This can be automated by introducing tracing based selective build, but that's for next PR to do.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78983

Approved by: https://github.com/kit1980
2022-06-08 04:29:35 +00:00
3822a472ef Python function to extract information on mobile::Module from flatbuffer (#77624)
Summary:
Includes following refactor:
1. common loading on operator validation that is dup'd in pickle and
   flatbuffer loader moved to function.h/cpp
2. Allow loading of a function without wiring operator.

This function will be used to implement get_bundled_input and friends
for flatbuffer.

Test Plan: contbuild & OSS CI, see 69fa49f123

Reviewed By: cccclai

Differential Revision: D36348549

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77624
Approved by: https://github.com/cccclai
2022-05-18 00:42:57 +00:00
5e3e5a5403 Revert "Python function to extract information on mobile::Module from flatbuffer (#77328)"
This reverts commit 69fa49f1230f80d1a0667e0a6ac8aca2746431b6.

Reverted https://github.com/pytorch/pytorch/pull/77328 on behalf of https://github.com/atalman
2022-05-17 01:35:05 +00:00
69fa49f123 Python function to extract information on mobile::Module from flatbuffer (#77328)
Includes following refactor:
1. common loading on operator validation that is dup'd in pickle and
   flatbuffer loader moved to function.h/cpp
2. Allow loading of a function without wiring operator.

This function will be used to implement get_bundled_input and friends
for flatbuffer.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77328
Approved by: https://github.com/cccclai
2022-05-16 16:58:43 +00:00
00c1e01ad0 Remove internal logic to handle bytecode version 3 (#57775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57775

The minimum supported bytecode version is updated from 3 to 4. We no longer support version 3 bytecode models.

Why?
* There is hacky code in operator loading that behaves differently for one operator under the global bytecode version 3. Instead, operator-related metadata should be passed (for example, in #56845). To allow future development, we remove the hacky way first.
* The bytecode version was bumped from 3 to 4 more than half a year ago. Since all the production models have been bumped to version 4, it's not practical to keep and maintain version 3. The risk of deprecating version 3 is low.

Test Plan: Imported from OSS

Reviewed By: raziel

Differential Revision: D28270791

Pulled By: cccclai

fbshipit-source-id: 70b1bd6352fdaae5f8d2173b81578d77018c8e44
(cherry picked from commit 3e930fa381cd01f3705116795c6426df992372fc)
2022-04-07 01:45:52 +00:00
76df91215f [Pytorch Edge] Caffe2 Serialize files into independent target. Clean up function.cpp deps
Summary:
It seemed strange to me that min_runtime_lib was dependent on the serialization headers but didn't have a dependency on their .cc. This puts them into their own target that contains both and then updates deps.

(Note: this ignores all push blocking failures!)

Test Plan: ci

Reviewed By: iseeyuan

Differential Revision: D34159900

fbshipit-source-id: 57102414be2439f5f4e3ed8ccd2b0c375b9de9b2
(cherry picked from commit c9ff2d2d9df72ad6a990986340b7934c8929861b)
2022-02-16 17:22:24 +00:00
b12ca69179 [jit][edge] Migrate DictType to DynamicType on mobile. (#70202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70202

Use DynamicType instead of DictType all over the place in Lite Interpreter. Namely we need to modify the following places:
1. Type parser which produces the Type constants.
2. IValue::type() which returns reflected Type from IValues.
3. Helper functions to construct the container value.
4. Typechecks which test whether a type instance is a particular container type.
ghstack-source-id: 146735648

Test Plan: no behavior change.

Reviewed By: iseeyuan

Differential Revision: D33137257

fbshipit-source-id: 971bf431658c422ea9353cc32cdab66e98876e9d
2022-01-10 15:55:29 -08:00
d459e79500 [jit][edge] Remove usage of shared_ptr<mobile::Code>. (#68037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68037

Right now mobile::Code doesn't outlive its enclosing Function, and all accesses to Code happen inside the interpreter loop, which doesn't outlive the module, so we don't need to use std::shared_ptr here. This should also save us 1-2 KB of binary size, because shared_ptr seems to bloat on arm64 android.
ghstack-source-id: 145818696

Test Plan: eyes.

Reviewed By: qihqi, tugsbayasgalan

Differential Revision: D32264616

fbshipit-source-id: d83f538d6604cf75fd7728a25127b4849ce7ab2a
2021-12-16 13:11:46 -08:00
13faaff54c [Operator Versioning][Edge] Implement register function for upgrader (#67730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67730

This PR implements the register function for the upgrader so it can be used at the loading stage
ghstack-source-id: 145170986

Test Plan:
```
buck test //caffe2/test/cpp/jit:jit
```

Reviewed By: iseeyuan

Differential Revision: D32092518

fbshipit-source-id: 779b51eb12b8cb162a93a55c1e66fe0becc4cb36
2021-12-09 02:18:09 -08:00
959cb03132 Populate operator_input_sizes_ (#68542)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68542

title

Test Plan: unittest

Reviewed By: iseeyuan

Differential Revision: D32508159

fbshipit-source-id: 0773a725973a493f19a2e9a340365e559dfdf7f8
2021-11-23 12:18:06 -08:00
4eb772fde6 Refactor saving jit::Module to mobile .pt in 2 steps: (#66494)
Summary:
1. is to convert Function -> mobile::Function
2. is to serialize mobile::Function

This also opens opportunity to create mobile::Module without saving/reloading

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66494

Reviewed By: zhxchen17

Differential Revision: D32293022

Pulled By: qihqi

fbshipit-source-id: 29b43d47ff86071d5e2f9d6ca4dba4445711ce3d
2021-11-17 12:02:20 -08:00
3e218dbd27 [PyTorch] Capture function args from schema by reference (#65951)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65951

Profiling shows that we do a bunch of heap allocations to copy Argument structs in append_operator. Capturing by reference here should be safe as long as the schema objects that args comes from outlive the operator function.

IMPORTANT: Reviewers (or automated tests if we're lucky) need to
confirm that the above is true or we're going to have fun
use-after-free bugs.
ghstack-source-id: 142065422

Test Plan:
AIBench run for speech model on MilanBoard

control: https://www.internalfb.com/intern/aibench/details/485570882988661 (mean 906 ms)
test: https://our.intern.facebook.com/intern/aibench/details/620835625995669 (mean 818 ms)

So almost a 10% improvement in the wall time metric?

Reviewed By: iseeyuan

Differential Revision: D31319988

fbshipit-source-id: 7da56357420df500df344f49007e070ebb1bc581
2021-11-02 11:12:04 -07:00
33d62266f2 [PyTorch][easy] Avoid allocating OperatorName strings in append_operator (#66134)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66134

No reason to do the comparison the old way when we could do it this way and avoid copying into std::string.
ghstack-source-id: 142065423

Test Plan: AIBench Milan run shows neutral to slight regression, but I think we should probably just make this change anyway.

Reviewed By: dhruvbird

Differential Revision: D31319669

fbshipit-source-id: dde329a4f2c4054f275eb98fb6556f5341e7533a
2021-11-02 11:10:52 -07:00
12ede84dbb [jit][edge] Enable lite interpreter to correctly handle INTERFACE_CALL instruction. (#65972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65972

ghstack-source-id: 141842336

Test Plan: buck test mode/dev //caffe2/test:mobile -- --exact 'caffe2/test:mobile - test_stacktrace_interface_call (mobile.test_lite_script_module.TestLiteScriptModule)'

Reviewed By: qihqi

Differential Revision: D31326147

fbshipit-source-id: 338ff4ce8ddc9502ffe0add49057b33b52a24955
2021-10-29 13:13:32 -07:00
60472594e1 [jit][edge] Implement torch::jit::Function for mobile function. (#65970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65970

ghstack-source-id: 141842338

mobile::Function should inherit from jit::Function, because for interface call support, we need an abstract jit::Function type stored in corresponding ClassTypes, so that we can look up methods in there. Previously mobile::Function was implemented separately, which prevented this. Since we got rid of all the unneeded virtual methods from jit::Function, we can inherit from torch::jit::Function without too much cost.

NOTE that torch::jit::Function is already in dependency because we need it to support custom class call. We should be able to use Function uniformly without looking into whether it's a builtin function or mobile::Function.

Test Plan: no behavior change.

Reviewed By: iseeyuan, mrshenli

Differential Revision: D31326148

fbshipit-source-id: 36caeaf3c8c5f54c23a1a7c8c9e2fd6e78b19622
2021-10-28 13:33:30 -07:00
12daa4f663 [jit][edge] Enable CALL instruction in lite interpreter. (#65964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65964

ghstack-source-id: 141425519

Test Plan: buck run xplat/caffe2:test_lite_interpreter

Reviewed By: cccclai

Differential Revision: D31326149

fbshipit-source-id: 8a599d92f3fa4e6c125100adb36d89592e71e547
2021-10-25 14:44:33 -07:00
4dce051cb0 [jit][edge] Add control stack frame to lite interpreter (#65963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65963

ghstack-source-id: 141425517

Test Plan: In next diff.

Reviewed By: qihqi, cccclai

Differential Revision: D31326150

fbshipit-source-id: dbbf65f2bf14846c45d0add71edc7d4dbfc6b92c
2021-10-25 12:15:16 -07:00
64caee1356 [PyTorch Edge] Leave out field for debug_handle if not being built with eager symbolication support (#66131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66131

Turns out that a model with 72k instructions causes about 0.5MiB of additional memory overhead (if there's an 8 byte memory overhead per instruction). This is not necessary if we're building w/o eager symbolication support. This change eliminates the 8 byte `debug_handle` if the build is w/o eager symbolication support.
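
The arithmetic and the conditional field can be sketched as below. This is only an illustration under stated assumptions: the struct layouts are simplified stand-ins, and `SKETCH_EAGER_SYMBOLICATION` is a made-up macro name, not the flag used by the real build.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-in for an interpreter instruction (hypothetical layout).
struct Instruction {
  std::uint8_t op;
  std::uint8_t unused;
  std::uint16_t n;
  std::int32_t x;
};

// The debug handle is only present when the (made-up) macro is defined.
struct InstructionWithDebugHandle {
  Instruction instruction;
#if defined(SKETCH_EAGER_SYMBOLICATION)
  std::int64_t debug_handle;
#endif
};

// Extra bytes spent on debug handles for a model with `num_instructions`
// instructions, assuming one 8-byte handle per instruction.
constexpr std::size_t debugHandleOverheadBytes(std::size_t num_instructions) {
  return num_instructions * sizeof(std::int64_t);
}
```

For 72,000 instructions this gives 72,000 × 8 = 576,000 bytes, which is roughly 0.55 MiB and matches the "about 0.5MiB" figure quoted above.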
ghstack-source-id: 140045478

(Note: this ignores all push blocking failures!)

Test Plan:
```
buck build -c "pt.enable_eager_symbolication"=1 //xplat/caffe2/fb/lite_predictor:lite_predictor
buck build //xplat/caffe2/fb/lite_predictor:lite_predictor
```

Reviewed By: kimishpatel

Differential Revision: D31387784

fbshipit-source-id: af56787ad833b990a46b79ab021e512edaa22143
2021-10-07 20:01:18 -07:00
fc4836f400 [Fix] Use full name to look for the promoted prim operator table (#66081)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66081

Two fixes:

1. Since the operators are always registered with both name and overload name, the overload name needs to be included when looking for an operator.
2. Don't promote operators with alias, because the new registry does not support schema with alias.

ghstack-source-id: 139732099

Test Plan: CI

Reviewed By: pavithranrao

Differential Revision: D31382262

fbshipit-source-id: 43c6e6e0c13950a9ce8cf3a70debe0421372d053
2021-10-06 15:35:02 -07:00
3c003aa6ae [PyTorchEdge] promote prim ops by using ops table for mobile runtime (#64816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64816

## Context:
Promoting prim ops:
Certain prim ops are more frequent than others (like tupleIndex, raiseException, ...). These ops are frequent enough that they are chosen to be promoted to first-class instructions. Promotion requires multiple steps and support from the TS team, as it changes how the bytecode is serialized and deserialized. So to prevent multiple bytecode version bumps and provide stability while these changes happen, an interim iterative process is proposed which uses a table to look up a "promoted" op's function. This allows us to rapidly update the ops list and test on production models without having to change the bytecode. In case of failure, we can quickly revert this change.

## Observation
The ops are chosen based on the notebook N1135657 which examines the top frequent ops.

## Fix
An interim solution of having a static table which, given a prim op name, returns a function to be applied on the stack. This helps us check in `function.cpp` to get the "promoted" op. As a fallback, the "promoted" op still resides in `register_prim_ops.cpp` so that the function of the prim op is never missed.
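
A rough sketch of such a lookup table is given below. It is not the implementation from this diff: the `Stack` type is reduced to a vector of ints, and `sketch::add_ints`, `promotedOpsTable` and `findPromotedOp` are hypothetical names chosen for illustration.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

using Stack = std::vector<int>;            // simplified stand-in for a stack of IValues
using OpFn = std::function<void(Stack&)>;  // a function applied on the stack

// Hypothetical "promoted" op: pops two ints and pushes their sum.
inline void addInts(Stack& stack) {
  int b = stack.back();
  stack.pop_back();
  int a = stack.back();
  stack.pop_back();
  stack.push_back(a + b);
}

// Static table mapping an op name to its function.
inline const std::unordered_map<std::string, OpFn>& promotedOpsTable() {
  static const std::unordered_map<std::string, OpFn> table = {
      {"sketch::add_ints", addInts},
  };
  return table;
}

// Returns an empty OpFn when the name is not in the table, so the caller
// can fall back to the regular operator registry.
inline OpFn findPromotedOp(const std::string& name) {
  const auto& table = promotedOpsTable();
  auto it = table.find(name);
  if (it == table.end()) {
    return OpFn();
  }
  return it->second;
}
```

Keeping the table separate from the bytecode format is what allows the list of promoted ops to change without a version bump.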

ghstack-source-id: 138261338

Test Plan:
```
[pavithran@67109.od ~/fbsource/fbcode (eddab7da6)]$ buck test caffe2/test/cpp/jit:jit -- BackendTest.TestComposite
Building: finished in 5.4 sec (100%) 7284/7284 jobs, 0/7284 updated
  Total time: 5.8 sec
More details at https://www.internalfb.com/intern/buck/build/480191aa-a1ba-42ca-99e9-ee4bf2b06d65
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 867382eb-327f-43d7-a45c-875b7f484b15
Trace available for this run at /tmp/tpx-20210914-100224.283682/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/844425134506115
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit - main (12.159)
    ✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestCompositeWithSetStates (0.797)
    ✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestComposite (0.779)
Summary
  Pass: 2
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/844425134506115
```

Reviewed By: iseeyuan

Differential Revision: D30819926

fbshipit-source-id: 4cbe05d5761bdc9d62ef08e18172dcf64cb49526
2021-09-17 10:32:05 -07:00
3727baea6f [PyTorch Edge][Model Loading] Operator Call De-dup at TorchScript Serialization Level [2/2] (#64269)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64269

Revert changes in D29826210 (693d8f2f07) (we don't need operator lambda caching since there aren't duplicate operators anymore)

This diff stack results in an additional approx 12% speedup in model loading time (from 229ms to 200ms) when run against an 87MB speech model that jiatongzhou provided.
ghstack-source-id: 138014904

Test Plan:
**Speech Transducer v25 model (as in D29826210 (693d8f2f07))**

|| Before | After |
|Load Time|[229ms](https://www.internalfb.com/intern/aibench/details/160889436133243)|[200ms](https://www.internalfb.com/intern/aibench/details/837884532607514)|
|Save File Size|[86.23 MB](https://lookaside.facebook.com/intern/diff/file/data/?number=658544950)|[86.1 MB](https://lookaside.facebook.com/intern/diff/file/data/?number=658554403)|

The "after" flamegraph shows significantly less time is spent on ```append_operator``` than before.

Steps
- Check out desired commit in devserver (base branch or this diff)
- ```buck build bento/kernels:bento_kernel_pytorch```
- Use N1094068 with pytorch_local kernel to save model for lite interpreter
- Edit ```aibench/specifications/models/pytorch/speech_transducer/v25.json ``` to have new model location and md5
- ```buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/speech_transducer/v25.json --framework pytorch --platform android/arm64 --devices "S8US" --force_profile --remote ```

**Test that saving a model with de-dup ops doesn't change its output**
https://www.internalfb.com/intern/anp/view/?id=1137434

Reviewed By: iseeyuan

Differential Revision: D30615710

fbshipit-source-id: bb4052f0f16eccab386585e94411056f94bce43c
2021-09-14 12:12:46 -07:00
30a7c768d7 [RFC] Modularize functions of parsing bytecode (#61862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61862

Modularize functions of parsing bytecode tables so that they can be used as needed in situations other than mobile lite interpreter.
* The decoupled functions are re-used by current lite interpreter loader.
* The bytecode can be serialized/deserialized from other formats.
* The decoupled functions have minimum dependencies on other PyTorch components.

Next:
Build a driver binary to include the parser and interpreter, but only has necessary dependency on other PyTorch components.
ghstack-source-id: 137867287

Test Plan:
As an example, a simple bytecode is parsed to a mobile function, and directly run in the added unit test, `RunTimeTest:ParseBytecode`. It contains basic control flow (if, else) and basic data orchestration (list construction).
CI

Reviewed By: larryliu0820

Differential Revision: D29798382

Pulled By: iseeyuan

fbshipit-source-id: 1c173a5f5d37097e3a97baec3f3e48e1eea1400f
2021-09-11 22:24:05 -07:00
f5e76b4e38 [PyTorch] Copy vectors less in Function::append_operator (#63977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63977

Doesn't seem to be any reason to copy these argument vectors.
ghstack-source-id: 137566815

Test Plan: CI

Reviewed By: dhruvbird, raziel

Differential Revision: D30550301

fbshipit-source-id: 33c199f975e4fb62c50a8210dc08aa9bb7a3e2f2
2021-09-08 18:31:38 -07:00
8d5b95019d [PyTorch Edge] Support default args with out arg, flag off (#63540)
Summary:
1. Allow consuming operators with default arguments and out arguments. The flag is off to keep the same behavior as v6; the flag is to be turned on in PR 63651.
2. Add two unittests to cover this type of operators.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63540

ghstack-source-id: 137211562

Test Plan:
```
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg
```

Reviewed By: raziel, iseeyuan, tugsbayasgalan

Differential Revision: D30414156

fbshipit-source-id: 0f3a219a22aee10ac53184cbd95940726c459d1f
2021-09-02 01:36:16 -07:00
ac99d63f83 [jit] Make operation call accept Stack& instead Stack* (#63414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63414

A raw pointer was misused here: the stack is never null.
ghstack-source-id: 136938318

Test Plan:
compiles.

Imported from OSS

Reviewed By: ejguan

Differential Revision: D30375410

fbshipit-source-id: 9d65b620bb76d90d886c800f54308520095d58ee
2021-08-30 11:49:20 -07:00
77a6436cac [Pytorch Mobile] Combining instructions and debug handles in single struct (#62418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62418

Debug handles have one to one correspondence with instruction, so just
combine them in one.

Test Plan:
CI

Imported from OSS

Reviewed By: raziel

Differential Revision: D29993661

fbshipit-source-id: 125c7163174cf66624dd95f110fdc8208fea8a07
2021-08-13 21:40:17 -07:00
693d8f2f07 [PyTorch Edge] Cache operator lambda during model loading [7% faster model loading] (#61996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61996

A recent post https://fb.workplace.com/groups/pytorch.edge.users/posts/2012215235600341/ about slow model loading with an accompanying perf report (report.html) caused me to look at the report and find hot spots during model loading. This suggested that we spend quite a bit of time looking up operators from the dispatcher. This means that we can probably just cache the operator handler functions (instead of computing them every time the operator name shows up, since it potentially shows up multiple times in a given model).
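
The caching idea can be sketched as a small memoizing wrapper around a slow lookup. This is a simplified illustration, not the code from this diff: `OperatorCacheSketch` is a hypothetical class, and the slow lookup is passed in as a callback rather than being the real dispatcher.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

using OpFn = std::function<int(int)>;  // simplified stand-in for an operator function

// Hypothetical cache: resolves each operator name at most once.
class OperatorCacheSketch {
 public:
  explicit OperatorCacheSketch(std::function<OpFn(const std::string&)> slow_lookup)
      : slow_lookup_(std::move(slow_lookup)) {}

  // Returns the cached function, performing the slow lookup on first use.
  const OpFn& get(const std::string& name) {
    auto it = cache_.find(name);
    if (it == cache_.end()) {
      ++misses_;
      it = cache_.emplace(name, slow_lookup_(name)).first;
    }
    return it->second;
  }

  // Number of times the slow lookup was actually invoked.
  int misses() const { return misses_; }

 private:
  std::function<OpFn(const std::string&)> slow_lookup_;
  std::unordered_map<std::string, OpFn> cache_;
  int misses_ = 0;
};
```

With this shape, an operator name that appears many times in a model triggers the expensive lookup only on its first occurrence.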

This diff results in an approx 7% speedup in model loading time (from [315ms](https://www.internalfb.com/intern/aibench/details/45077128343028) to [293ms](https://www.internalfb.com/intern/aibench/details/600870874797229)) when run against an 87MB speech model that jiatongzhou provided.

See https://fb.workplace.com/groups/pytorch.dev/posts/855724575006024/ for the previous post from jiatongzhou.
ghstack-source-id: 134634612

Test Plan:
Run using AI Bench.

### Speech Transducer v25 model (87MiB)

Followed up with jiatongzhou and he gave me his speech model. For posterity, here's how to fetch it (you don't need to since I uploaded it to NMLML and it now has a permanent Everstore Handle):

```
cd /tmp/
mkdir speech_model
cd speech_model
fbpkg fetch speech.stella.neural_transducer.on_device.en_us:25
cp pytorchmodel.pt ~/speech_transducer_v25_pytorchmodel.ptl
```

Here's how to build and run the benchmark using AI Bench:

```
buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/speech_transducer/v25.json --framework pytorch --platform android/arm64 --devices "S8US" --force_profile --remote
```

Reviewed By: raziel

Differential Revision: D29826210

fbshipit-source-id: 134b67eb466e73f0e43447b9b966278f13c4b56f
2021-07-29 20:14:47 -07:00
d833caaf6b [PyTorch Mobile][Forward/backward compatibility] Number of arguments for operators (#56845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56845

Handle forward/backward compatibility caused by added default arguments in mobile. As an example,

In older version, operator aten::foo's schema is
```
foo(Tensor a, Tensor b) -> Tensor
```
In the new version, the schema is updated to
```
foo(Tensor a, Tensor b, int groups=1) -> Tensor
```

## Model file
Serialize the number of specified arguments to each operator into the bytecode operator table. Previously, the operator table contained only the operator name and overload name:
```
('operators', (('aten::foo', ''),))
```
Now the number of specified arguments is added:
```
# bytecode version 6
('operators', (('aten::foo', '', 2),))
```
where "2" means the number of specified arguments.

Since there's a bytecode schema change, the bytecode version number is bumped. This PR is to be landed after #56002, where the version number is bumped from 4 to 5. This PR bumps the version number from 5 to 6.

## Runtime and backward compatibility
When the operator is found (either jit or c10), we have the OperatorHandle, where the operator schema can be accessed by
```
op.value().schema().arguments()
```
Adaptation is implemented to handle backward compatibility. For the example above, the new runtime holds the updated schema:
```
foo(Tensor a, Tensor b, int groups=1) -> Tensor
```
Whereas the model file carries
```
('aten::foo', '', 2)
```
We can implement a wrapper around the original function pointer to push the default argument to the stack.
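
Such a wrapper could look roughly like the sketch below. It is an illustration under simplifying assumptions, not the code from this PR: the stack holds plain ints instead of IValues, and `adaptToSchema` and its parameters are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Stack = std::vector<int>;            // simplified stand-in for a stack of IValues
using OpFn = std::function<void(Stack&)>;  // operator taking its arguments from the stack

// Hypothetical adapter. `schema_defaults` has one entry per schema argument
// (entries for arguments without a default are never read), and
// `num_specified` is the argument count recorded in the model file.
inline OpFn adaptToSchema(OpFn fn,
                          std::vector<int> schema_defaults,
                          std::size_t num_specified) {
  return [fn = std::move(fn),
          schema_defaults = std::move(schema_defaults),
          num_specified](Stack& stack) {
    // Push the defaults for the trailing arguments the model did not specify.
    for (std::size_t i = num_specified; i < schema_defaults.size(); ++i) {
      stack.push_back(schema_defaults[i]);
    }
    fn(stack);
  };
}
```

In the `aten::foo` example, a model that specifies 2 arguments would have the default `groups=1` pushed before the 3-argument runtime operator is called.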

## Delivery time and forward compatibility
At model delivery time, two checks can be done:
### Operator check
Two APIs to be provided:
* Runtime: An API to get a runtime’s ops and their schemas (i.e. the # of args). D27920185(WIP)
* Model: An API to get a model’s ops and their schema requirements (i.e. the # of args required).

The APIs can be used to check
* runtime.ops() is a superset of model.ops()
* for each op in model.ops() validate their schemas are compatible with those in runtime.ops() -- i.e. the # args required in a model op are <= # args in the runtime op.
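
The two checks listed above can be sketched as a single function. The proposed runtime and model APIs are not shown in this message, so the sketch assumes each side is already available as a map from operator name to argument count; `OpArgCounts` and `isModelCompatible` are hypothetical names.

```cpp
#include <map>
#include <string>

// Hypothetical representation: operator name -> number of arguments
// (required by the model, or accepted by the runtime).
using OpArgCounts = std::map<std::string, int>;

// Returns true when every model op exists in the runtime and needs no more
// arguments than the runtime op accepts.
inline bool isModelCompatible(const OpArgCounts& model_ops,
                              const OpArgCounts& runtime_ops) {
  for (const auto& entry : model_ops) {
    auto it = runtime_ops.find(entry.first);
    if (it == runtime_ops.end()) {
      return false;  // runtime.ops() is not a superset of model.ops()
    }
    if (entry.second > it->second) {
      return false;  // model needs more args than the runtime op has
    }
  }
  return true;
}
```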

Note that only root ops in the model need to be checked here. For transient ops it's not necessary. For example, if a root op, "aten::root" calls "aten::foo", it's "aten::root"'s responsibility to adapt to "aten::foo"'s change, or "aten::root" itself needs to be updated too.
### Bytecode version backport (PR coming)
When delivering a model with bytecode v6, if the runtime only works with bytecode v5 and lower, backport is needed.
* The number of arguments is removed from the operator table
* The bytecode version is changed from 6 to 5
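
The two backport steps above amount to a pure format rewrite, sketched here on a toy in-memory representation. The real backport operates on the serialized model; `BytecodeSketch`, `OperatorEntry` and `backportV6ToV5` are hypothetical names, and `-1` is an arbitrary marker for "argument count not stored".

```cpp
#include <string>
#include <vector>

// Toy operator table entry; num_specified_args == -1 means "not stored".
struct OperatorEntry {
  std::string name;
  std::string overload_name;
  int num_specified_args;
};

// Toy stand-in for a bytecode file.
struct BytecodeSketch {
  int version;
  std::vector<OperatorEntry> operators;
};

// Drops the argument counts and lowers the version from 6 to 5.
// Returns false (and changes nothing) if the input is not version 6.
inline bool backportV6ToV5(BytecodeSketch& bytecode) {
  if (bytecode.version != 6) {
    return false;
  }
  for (auto& op : bytecode.operators) {
    op.num_specified_args = -1;
  }
  bytecode.version = 5;
  return true;
}
```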

Note that this backport is a pure format change; it does not guarantee that the backported model always runs in an old runtime. The operator check mentioned before should be done first, before the model is backported to v5.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D27986544

Pulled By: iseeyuan

fbshipit-source-id: 143e19d4798cfb96b65095538dd648eead4e3fda
2021-05-13 14:20:47 -07:00
e0fc473e47 [Pytorch, Mobile] Serialize inlined callstack pointer with debug handle. (#55062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55062

This diff introduces the following changes:
1. InlinedCallStack pickler/serializer is introduced. It is serialized
as a tuple of {module_instance_info, source range tag, callee:InlinedCallStack}
Module instance info is serialized as tuple of {class_type_name,
instance_name}.
Note that callee of the serialized inlined callstack points to the tuple
of already serialized callstack. This means the first callstack ptr to
serialize, will serialize entire path of the tree, where some callee
nodes might be shared with callstack pointers that will be serialized
subsequently. Pickler supports memoization of pickled objects: if a
tuple has already been serialized, its object id is emitted instead of
serializing the object again. Thus we still serialize the tree and not every
path from the root separately. Furthermore, InlinedCallStackSerializer
also uses a cache to look up the pointer and return the serialized IValue.
Note also that we must serialize the source range of
InlinedCallStack. To do this, the serializer requires a
source-range-tags-to-source-range map. This was done in the previous
diff, where as part of source range serialization we also generate
unique tags. These are the tags that are serialized in InlinedCallStack.
Thus during deserialization we would have to deserialize source range
before deserializing InlinedCallStacks.
2. Furthermore, each serialized InlinedCallStack is serialized with a
unique debug_handle and source range tag.
BackendDebugHandleManager manages generation of
unique debug handles and saves the map of
debug-handles-to-{source_range_tag, inlined-callstack-ptr}.
This map is then serialized as callstack_debug_map.pkl. Note that
inlined callstack is not sufficient to get all the source information
since it contains source information about the nodes which are inlined.
The top-of-the-stack (or bottom) node, which is the actual op node, is
not part of the inlined callstack pointer and thus the source range of
this node is serialized separately using source_range_tag. This is
similar to how JIT creates callstack in
torch/csrc/jit/runtime/interpreter.cpp
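
The two points above can be sketched in Python as follows. This is only
an illustration of the idea, not the C++ implementation: CallStackNode,
CallStackSerializer and DebugHandleManager are hypothetical stand-ins,
and the field names are assumptions based on the tuple layout described
above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CallStackNode:
    """Hypothetical stand-in for one InlinedCallStack entry."""
    class_type_name: str
    instance_name: str
    source_range_tag: int
    callee: Optional["CallStackNode"] = None


class CallStackSerializer:
    """Serializes nodes to tuples, caching by pointer identity so that a
    callee shared by several callstacks is serialized only once."""

    def __init__(self):
        # id(node) -> (node, serialized tuple); keeping the node alive
        # guarantees its id is not reused while the cache exists.
        self._cache = {}

    def serialize(self, node):
        if node is None:
            return None
        entry = self._cache.get(id(node))
        if entry is None:
            serialized = (
                (node.class_type_name, node.instance_name),  # module instance info
                node.source_range_tag,
                self.serialize(node.callee),  # reuses already serialized callee
            )
            entry = (node, serialized)
            self._cache[id(node)] = entry
        return entry[1]


class DebugHandleManager:
    """Hands out unique debug handles and records, per handle, the op's
    own source range tag plus its (serialized) inlined callstack."""

    def __init__(self):
        self._next_handle = 0
        self.handle_map = {}

    def new_handle(self, source_range_tag, callstack):
        handle = self._next_handle
        self._next_handle += 1
        self.handle_map[handle] = (source_range_tag, callstack)
        return handle
```

In this sketch, two callstacks that share a callee end up referring to
the very same serialized tuple, which mirrors how memoization keeps the
output a tree rather than a set of independent root-to-leaf paths.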

Unique debug handles facilitate exception throwing or profiling using
just the debug handle without any further qualifications, such as which
function or module the inlined callstack belongs to.

Furthermore, this diff refactors the old mobile code for tracking
module hierarchy information per op. Mainly now bytecode serialization
will serialize debug handles corresponding to ops/nodes in graph and
have callstack_debug_map.pkl help generate:
1. Entire callstack and
2. Module hierarchy information.

Test Plan:
python test/mobile/test_lite_script_module.py TestLiteScriptModule
./build/bin/test_jit --gtest_filter=*ModuleInfo

Imported from OSS

Reviewed By: raziel

Differential Revision: D27468709

fbshipit-source-id: 53e2413e7703ead01c77718b7c333c7c6ff50a23
2021-05-04 09:21:12 -07:00
f4a921600a [PyTorch, Mobile] Serialization format change for source range (#54284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54284

In order to bring mobile deployment, via the lite interpreter, to feature
parity with JIT with respect to model-level debug information, we must
make model-level debug information available to the mobile runtime.
At the moment, model-level debug information is stored in SourceRange,
which associates nodes of the graph with where they come from in the
original Python source code.
This information is serialized as part of debug_pkl and deserialized
when JIT loads the model and reads the model code.
In the lite interpreter, we do not have access to all the functionality
of JIT and hence we cannot load the model in the same way as JIT, by
reading code, constructing the module hierarchy and the graphs
corresponding to module methods, etc. Instead, in the lite interpreter,
only the bytecode corresponding to the compiled graph, Code, is saved.
Thus in order to annotate OPs in the bytecode with equivalent
SourceRange information we do the following:
1. During model serialization, we create a unique tag for each source
range of the model.
2. Create a map of <SourceRange, tag>
3. During debug_pkl serialization we save tag along with SourceRange, on
top of byte offset.
4. During bytecode generation, the methods of the top module are
lowered. During this process methods are inlined. In the inlined graph,
when the node of a graph is lowered to bytecode, we query node's source
range and look it up against the map.
5. Resulting source range tag is serialized in module_debug_info.
6. During model deserialization, we read all the debug_pkl records in
the archive and create a map of <tag, SourceRange>
7. This map can be used to find source code information.
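
The tagging scheme in the steps above can be sketched in Python as
follows. This is a simplified illustration, not the actual pickler code:
the SourceRange fields and the helper names build_tag_map and
invert_tag_map are assumptions made for the example.

```python
from typing import NamedTuple


class SourceRange(NamedTuple):
    """Hypothetical, simplified source range: file plus character span."""
    filename: str
    start: int
    end: int


def build_tag_map(source_ranges):
    """Export side: assign one unique tag per distinct source range,
    giving the <SourceRange, tag> map used while lowering nodes."""
    tags = {}
    for source_range in source_ranges:
        # setdefault keeps the first tag if the range was already seen.
        tags.setdefault(source_range, len(tags))
    return tags


def invert_tag_map(tags):
    """Import side: rebuild the <tag, SourceRange> map used for lookups."""
    return {tag: source_range for source_range, tag in tags.items()}
```

During lowering, a node's source range would be looked up in the first
map to obtain the tag stored next to the bytecode; at load time the
inverted map turns that tag back into source information.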

During mobile runtime:
1. We read all the debug_pkl records and create <tag=debug_handle,
SourceRange> map.
   1.1 This map, MobileDebugInfo, is a member of mobile Module.
2. Interpreter catches appropriate exceptions and sets the thread local
debug handle and rethrows the exception.
3. In Function's run method we catch exception and query current debug
handle where the exception happened.
4. Query MobileDebugInfo with debug handle to retrieve source range and
augment error with source range info.
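
The runtime flow above can be sketched in Python as follows. This is
only an illustration of the control flow, not the mobile interpreter
itself: run_instruction, run_function and the plain dict used as debug
info are hypothetical stand-ins for the interpreter, Function's run
method and MobileDebugInfo.

```python
import threading

# Thread-local slot holding the debug handle of the failing instruction.
_exception_state = threading.local()


def run_instruction(op, debug_handle):
    """Interpreter step: on failure, record the debug handle and rethrow."""
    try:
        return op()
    except Exception:
        _exception_state.debug_handle = debug_handle
        raise


def run_function(instructions, debug_info):
    """Function-level run: catch the exception, look up the recorded
    debug handle and augment the error with the source range."""
    try:
        for op, debug_handle in instructions:
            run_instruction(op, debug_handle)
    except Exception as exc:
        handle = getattr(_exception_state, "debug_handle", None)
        where = debug_info.get(handle, "<unknown source range>")
        raise RuntimeError(f"{exc} (at {where})") from exc
```

Here debug_info maps a debug handle to a printable source range, and
instructions is a list of (callable, debug_handle) pairs.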

This information is still incomplete as it does not contain entire
callstack.

In the following diffs we will serialize InlinedCallStack directly.

Note that compilation is gated by the SYMBOLICATE_MOBILE_DEBUG_HANDLE
macro, so that mobile builds can avoid building MobileDebugInfo, source
range and the source range pickler/unpickler. Later we will add a path
where, if building without debug support, the stack trace will contain
only debug handles. They can be symbolicated later.

Test Plan:
Ported a bunch of source range tests from test_jit.py. Added one more
test in test_lite_interpreter.py

Imported from OSS

Reviewed By: raziel

Differential Revision: D27174722

fbshipit-source-id: a7b7c6088ce16dec37e823c7fefa4f0b61047e12
2021-05-04 09:19:27 -07:00
23c50a4a50 [PyTorch Mobile] Support torchbind custom classes in lite interpreter (#51432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51432

ghstack-source-id: 120976584

torchbind is a convenient way to expose a custom class to both Python and TorchScript. CREATE_OBJECT is used to create an object of a custom class.

CREATE_OBJECT was not supported by the lite interpreter. The major reason was that for custom classes defined directly in Python, there is no language parser in the lite interpreter. That is still the case. However, for torchbind classes that are defined in C++, a Python/TorchScript parser is not needed.

This diff is to support the case of torchbind custom classes.
1. The class type can be resolved at import level.
2. If the class is not the supported torchbind class, an error message is provided at export stage. Workaround is also suggested.
3. Unit tests. C++: ```LiteInterpreterTest::BuiltinClass``` is added as an end-to-end test on supported class. Python: ```test_unsupported_createobject``` is changed to ```test_unsupported_classtype``` to test unsupported classes.

Test Plan: CI

Reviewed By: raziel

Differential Revision: D26168913

fbshipit-source-id: 74e8b6a12682ad8e9c39afdfd2b605c5f8e65427
2021-02-03 21:57:19 -08:00
87ad77eb4e T66557700 Support default argument values of a method (#48863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48863

Support default arguments when invoking a module via PyTorch Lite (`mobile::Module`).

Test Plan:
buck test mode/dbg //caffe2/test/cpp/jit:jit -- LiteInterpreterTest.MethodInvocation

buck test mode/dbg caffe2/test:mobile -- test_method_calls_with_optional_arg

Reviewed By: iseeyuan

Differential Revision: D25896212

fbshipit-source-id: 6d7e7fd5f3244a88bd44889024d81ad2e678ffa5
2021-02-01 18:35:13 -08:00
8530c65e25 [codemod][fbcode/caffe2] Apply clang-format update fixes
Test Plan: Sandcastle and visual inspection.

Reviewed By: igorsugak

Differential Revision: D25849205

fbshipit-source-id: ef664c1ad4b3ee92d5c020a5511b4ef9837a09a0
2021-01-09 14:37:36 -08:00
4a870f6518 [PyTorch Mobile] Export Operator List from Mobile CompilationUnit instead of from TorchScript Model (#49385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49385

Currently, the API to export operator lists accepts a `torch::jit::Module` object, and spits out an operator list. The operator list is practically used only for mobile. This is not ideal because the set of root operators may change by the time the model is subsequently optimized and exported for mobile.

What we need to do instead is glean the list of operators from the mobile model itself (`bytecode.pkl` specifically), and expose that instead.

Also updated the logic in `converter`.

### Before this change:
1. Get operator List from Torch Script Model
2. Convert to bytecode mobile model

### After this change:
1. Convert to bytecode mobile model
2. Use this converted mobile model to get the list of operators for each method on the model

ghstack-source-id: 118796752

Test Plan:
Added a unit test in `test_lite_interpreter.cpp` to ensure that all model referenced operators show up in the exported operator list. Also make `test_lite_interpreter.cpp` runnable from `xplat/caffe2/BUCK` since this is where the production code will be built from.

Verified that the list of operators produced before and after this change for an example model (segmentation) are the same.

{P147863234}

Also verified that the operator lists for the BI-Xray model are different (we have been having problems with missing operators for this one): {P154903132}

Reviewed By: iseeyuan

Differential Revision: D24690094

fbshipit-source-id: 0426a6ef90456a811010cfe337c415882ae2deff
2020-12-18 11:17:57 -08:00
2b61e4d84c Revert D25152559: T66557700 Support default argument values of a method
Test Plan: revert-hammer

Differential Revision:
D25152559 (6bde0ca6d3)

Original commit changeset: bbf52f1fbdbf

fbshipit-source-id: 592fdb3078b1ac86cd394adc6c1bfd6b10d829e1
2020-12-17 14:05:49 -08:00
6bde0ca6d3 T66557700 Support default argument values of a method (#48863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48863

Support default arguments when invoking a module via PyTorch Lite (`mobile::Module`).

Test Plan:
buck test mode/dbg //caffe2/test/cpp/jit:jit -- LiteInterpreterTest.MethodInvocation

buck test mode/dbg caffe2/test:mobile -- test_method_calls_with_optional_arg

Reviewed By: raziel, iseeyuan

Differential Revision: D25152559

fbshipit-source-id: bbf52f1fbdbfbc6f8fa8b65ab524b1cd4648f9c0
2020-12-16 15:55:03 -08:00
9b3c72d46e [pytorch] Make mobile find_method return an optional (#43965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43965

As part of a larger effort to unify the API between the lite interpreter and full JIT:
- implement torch::jit::mobile::Method, a proxy for torch::jit::mobile::Function
- add support for overloaded operator() to mobile Method and Function
- mobile find_method now returns a c10::optional<Method> (so signature matches full jit)
- moves some implementation of Function from module.cpp to function.cpp
ghstack-source-id: 111161942

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D23330762

fbshipit-source-id: bf0ba0d711d9566c92af31772057ecd35983ee6d
2020-09-03 14:46:18 -07:00
93f1b5c8da Mobile backward compatibility (#42413)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42413

When a default argument is added, it does not break backward compatibility (BC) for full-jit, but does break BC for mobile bytecode. For example, https://github.com/pytorch/pytorch/pull/40737. To make bytecode BC in this case, we

1. Introduce kMinSupportedBytecodeVersion. The loaded model version should be between kMinSupportedBytecodeVersion and kProducedBytecodeVersion.
2. If an operator is updated, and we can handle BC, bump the kProducedBytecodeVersion (for example, from 3 to 4).
3. If model version is at the older version of the operator, add an adapter function at loading. For the added default arg, we push this default arg to stack before calling the actual operator function.
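
The three steps above can be sketched in Python as follows. This is a minimal illustration, not the loader's actual code: the value chosen for the minimum supported version, the helper names, and the `add_scaled` operator are assumptions made for the example.

```python
# Assumed values for illustration; the text only gives "from 3 to 4" as an
# example bump of the produced version.
K_MIN_SUPPORTED_BYTECODE_VERSION = 3
K_PRODUCED_BYTECODE_VERSION = 4


def check_model_version(version):
    """Reject models outside the [min supported, produced] version range."""
    if not (K_MIN_SUPPORTED_BYTECODE_VERSION <= version <= K_PRODUCED_BYTECODE_VERSION):
        raise RuntimeError(f"unsupported bytecode version {version}")


def resolve_operator(op, model_version, new_default_arg):
    """Return op itself for current models, or an adapter for older models
    that pushes the newly added default argument before calling op."""
    check_model_version(model_version)
    if model_version < K_PRODUCED_BYTECODE_VERSION:
        def adapter(stack):
            stack.append(new_default_arg)
            op(stack)
        return adapter
    return op


def add_scaled(stack):
    """Hypothetical operator: pops (x, y, alpha) and pushes x + alpha * y.
    Older models only push x and y; alpha is the added default argument."""
    alpha = stack.pop()
    y = stack.pop()
    x = stack.pop()
    stack.append(x + alpha * y)
```

With this sketch, a version-3 model that pushes only `x` and `y` still works because the adapter supplies `alpha`, while a version-4 model calls the operator directly with all three arguments on the stack.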

Test Plan: Imported from OSS

Reviewed By: xcheng16

Differential Revision: D22898314

Pulled By: iseeyuan

fbshipit-source-id: 90d339f8e1365f4bb178db8db7c147390173372b
2020-08-21 15:45:52 -07:00