Commit Graph

46 Commits

12daa4f663 [jit][edge] Enable CALL instruction in lite interpreter. (#65964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65964

ghstack-source-id: 141425519

Test Plan: buck run xplat/caffe2:test_lite_interpreter

Reviewed By: cccclai

Differential Revision: D31326149

fbshipit-source-id: 8a599d92f3fa4e6c125100adb36d89592e71e547
2021-10-25 14:44:33 -07:00
4dce051cb0 [jit][edge] Add control stack frame to lite interpreter (#65963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65963

ghstack-source-id: 141425517

Test Plan: In next diff.

Reviewed By: qihqi, cccclai

Differential Revision: D31326150

fbshipit-source-id: dbbf65f2bf14846c45d0add71edc7d4dbfc6b92c
2021-10-25 12:15:16 -07:00
64caee1356 [PyTorch Edge] Leave out field for debug_handle if not being built with eager symbolication support (#66131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66131

Turns out that a model with 72k instructions causes about 0.5MiB of additional memory overhead (if there's an 8 byte memory overhead per instruction). This is not necessary if we're building w/o eager symbolication support. This change eliminates the 8 byte `debug_handle` if the build is w/o eager symbolication support.
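For illustration, a minimal sketch of how the per-instruction field can be compiled out; the struct layout and the `MOBILE_WITH_EAGER_SYMBOLICATION` flag are stand-ins for illustration, not the actual PyTorch definitions:
```
#include <cstdint>

// Illustrative stand-in for the lite-interpreter instruction record.
// When eager symbolication support is compiled out, the 8-byte
// debug_handle member disappears, so a model with ~72k instructions
// saves roughly 72k * 8 B ≈ 0.55 MiB.
struct Instruction {
  std::uint8_t op;  // opcode
  std::int32_t X;   // first operand
  std::int32_t N;   // second operand
#ifdef MOBILE_WITH_EAGER_SYMBOLICATION  // hypothetical build flag
  std::int64_t debug_handle;  // only present when symbolication is enabled
#endif
};
```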
ghstack-source-id: 140045478

(Note: this ignores all push blocking failures!)

Test Plan:
```
buck build -c "pt.enable_eager_symbolication"=1 //xplat/caffe2/fb/lite_predictor:lite_predictor
buck build //xplat/caffe2/fb/lite_predictor:lite_predictor
```

Reviewed By: kimishpatel

Differential Revision: D31387784

fbshipit-source-id: af56787ad833b990a46b79ab021e512edaa22143
2021-10-07 20:01:18 -07:00
fc4836f400 [Fix] Use full name to look for the promoted prim operator table (#66081)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66081

Two fixes:

1. Since operators are always registered with both a name and an overload name, the overload name needs to be included when looking up an operator (see the sketch after this list).
2. Don't promote operators with aliases, because the new registry does not support schemas with aliases.
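A minimal sketch of what fix 1 amounts to, with stand-in types; the helper names here are hypothetical, not the real registry code:
```
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative stand-ins for the real types.
struct IValue {};
using Stack = std::vector<IValue>;
using PrimOpFn = std::function<void(Stack&)>;

// Operators are registered under "name.overload_name", so a lookup that
// uses only `name` misses every overloaded entry. Building the full key
// from both parts fixes that.
std::string fullOpName(const std::string& name, const std::string& overload) {
  return overload.empty() ? name : name + "." + overload;
}

bool hasPromotedOp(const std::unordered_map<std::string, PrimOpFn>& table,
                   const std::string& name, const std::string& overload) {
  return table.count(fullOpName(name, overload)) != 0;
}
```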

ghstack-source-id: 139732099

Test Plan: CI

Reviewed By: pavithranrao

Differential Revision: D31382262

fbshipit-source-id: 43c6e6e0c13950a9ce8cf3a70debe0421372d053
2021-10-06 15:35:02 -07:00
3c003aa6ae [PyTorchEdge] promote prim ops by using ops table for mobile runtime (#64816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64816

## Context:
Promoting prim ops:
Certain prim ops are more frequent than others (like tupleIndex, raiseException, ...). These ops are frequent enough that they were chosen to be promoted to first-class instructions. Promoting them requires multiple steps and support from the TS team, as it changes how the bytecode is serialized and deserialized. So, to prevent multiple bytecode version bumps and to provide stability while these changes happen, an interim iterative process is proposed which uses a table to look up a "promoted" op's function. This allows us to rapidly update the ops list and test on production models without having to change the bytecode. In case of failure, we can quickly revert this change.

## Observation
The ops were chosen based on notebook N1135657, which examines the most frequent ops.

## Fix
An interim solution of having a static table which, when given a prim op name, returns the function to be applied on the stack. A check in `function.cpp` consults this table to get the "promoted" op. As a fallback, the "promoted" op still resides in `register_prim_ops.cpp`, so the prim op's function is never missed.
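A sketch of the interim lookup table described above, using simplified stand-in types rather than the real `torch::jit` ones; the table contents and helper names are illustrative:
```
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <variant>
#include <vector>

using IValue = std::variant<std::int64_t, double, std::string>;  // simplified
using Stack = std::vector<IValue>;
using PrimOpFn = std::function<void(Stack&)>;

// The "promoted" ops live in one static table keyed by op name, so the
// list can be extended (or rolled back) without bumping the bytecode version.
const std::unordered_map<std::string, PrimOpFn>& promotedPrimOps() {
  static const std::unordered_map<std::string, PrimOpFn> table = {
      {"prim::RaiseException",
       [](Stack& stack) {
         auto msg = std::get<std::string>(stack.back());
         stack.pop_back();
         throw std::runtime_error(msg);
       }},
      // tupleIndex, dictIndex, ... would be added here as needed.
  };
  return table;
}

// function.cpp would first consult the table and only fall back to the
// regular operator registry when the op is not promoted.
const PrimOpFn* findPromotedOp(const std::string& name) {
  const auto& table = promotedPrimOps();
  auto it = table.find(name);
  return it == table.end() ? nullptr : &it->second;
}
```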

ghstack-source-id: 138261338

Test Plan:
```
[pavithran@67109.od ~/fbsource/fbcode (eddab7da6)]$ buck test caffe2/test/cpp/jit:jit -- BackendTest.TestComposite
Building: finished in 5.4 sec (100%) 7284/7284 jobs, 0/7284 updated
  Total time: 5.8 sec
More details at https://www.internalfb.com/intern/buck/build/480191aa-a1ba-42ca-99e9-ee4bf2b06d65
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 867382eb-327f-43d7-a45c-875b7f484b15
Trace available for this run at /tmp/tpx-20210914-100224.283682/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/844425134506115
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit - main (12.159)
    ✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestCompositeWithSetStates (0.797)
    ✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestComposite (0.779)
Summary
  Pass: 2
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/844425134506115
```

{F663491347}

Reviewed By: iseeyuan

Differential Revision: D30819926

fbshipit-source-id: 4cbe05d5761bdc9d62ef08e18172dcf64cb49526
2021-09-17 10:32:05 -07:00
3727baea6f [PyTorch Edge][Model Loading] Operator Call De-dup at TorchScript Serialization Level [2/2] (#64269)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64269

Revert changes in D29826210 (693d8f2f07) (we don't need operator lambda caching since there aren't duplicate operators anymore)

This diff stack results in an additional approx 12% speedup in model loading time (from 229ms to 200ms) when run against an 87MB speech model that jiatongzhou provided.
ghstack-source-id: 138014904

Test Plan:
**Speech Transducer v25 model (as in D29826210 (693d8f2f07))**

| | Before | After |
| --- | --- | --- |
| Load Time | [229ms](https://www.internalfb.com/intern/aibench/details/160889436133243) | [200ms](https://www.internalfb.com/intern/aibench/details/837884532607514) |
| Save File Size | [86.23 MB](https://lookaside.facebook.com/intern/diff/file/data/?number=658544950) | [86.1 MB](https://lookaside.facebook.com/intern/diff/file/data/?number=658554403) |

The "after" flamegraph shows significantly less time is spent on ```append_operator``` than before.

Steps
- Check out desired commit in devserver (base branch or this diff)
- ```buck build bento/kernels:bento_kernel_pytorch```
- Use N1094068 with pytorch_local kernel to save model for lite interpreter
- Edit ```aibench/specifications/models/pytorch/speech_transducer/v25.json ``` to have new model location and md5
- ```buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/speech_transducer/v25.json --framework pytorch --platform android/arm64 --devices "S8US" --force_profile --remote ```

**Test that saving a model with de-dup ops doesn't change its output**
https://www.internalfb.com/intern/anp/view/?id=1137434

Reviewed By: iseeyuan

Differential Revision: D30615710

fbshipit-source-id: bb4052f0f16eccab386585e94411056f94bce43c
2021-09-14 12:12:46 -07:00
30a7c768d7 [RFC] Modularize functions of parsing bytecode (#61862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61862

Modularize functions of parsing bytecode tables so that they can be used as needed in situations other than mobile lite interpreter.
* The decoupled functions are re-used by current lite interpreter loader.
* The bytecode can be serialized/deserialized from other formats.
* The decoupled functions have minimum dependencies on other PyTorch components.

Next:
Build a driver binary that includes the parser and interpreter, but has only the necessary dependencies on other PyTorch components.
ghstack-source-id: 137867287

Test Plan:
As an example, a simple bytecode is parsed to a mobile function, and directly run in the added unit test, `RunTimeTest:ParseBytecode`. It contains basic control flow (if, else) and basic data orchestration (list construction).
CI

Reviewed By: larryliu0820

Differential Revision: D29798382

Pulled By: iseeyuan

fbshipit-source-id: 1c173a5f5d37097e3a97baec3f3e48e1eea1400f
2021-09-11 22:24:05 -07:00
f5e76b4e38 [PyTorch] Copy vectors less in Function::append_operator (#63977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63977

Doesn't seem to be any reason to copy these argument vectors.
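A small sketch of the kind of change this refers to, with placeholder types: take the vectors by value and move them into the members (or take them by const reference), so no gratuitous copies remain:
```
#include <string>
#include <utility>
#include <vector>

struct Argument { std::string name; };  // placeholder

class FunctionSketch {
 public:
  // Callers that pass temporaries (or std::move their vectors) pay for no
  // extra copies; previously each vector was copied again into the members.
  void append_operator(std::string name,
                       std::string overload_name,
                       std::vector<Argument> args) {
    op_names_.push_back(std::move(name));
    overload_names_.push_back(std::move(overload_name));
    arguments_.push_back(std::move(args));
  }

 private:
  std::vector<std::string> op_names_;
  std::vector<std::string> overload_names_;
  std::vector<std::vector<Argument>> arguments_;
};
```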
ghstack-source-id: 137566815

Test Plan: CI

Reviewed By: dhruvbird, raziel

Differential Revision: D30550301

fbshipit-source-id: 33c199f975e4fb62c50a8210dc08aa9bb7a3e2f2
2021-09-08 18:31:38 -07:00
8d5b95019d [PyTorch Edge] Support default args with out arg, flag off (#63540)
Summary:
1. Allow consuming operators with default arguments and out arguments. The flag is off to keep the same behavior as v6; the flag is turned on in PR 63651.
2. Add two unit tests to cover this type of operator.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63540

ghstack-source-id: 137211562

Test Plan:
```
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg
```

Reviewed By: raziel, iseeyuan, tugsbayasgalan

Differential Revision: D30414156

fbshipit-source-id: 0f3a219a22aee10ac53184cbd95940726c459d1f
2021-09-02 01:36:16 -07:00
ac99d63f83 [jit] Make operation call accept Stack& instead Stack* (#63414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63414

This fixes a misuse of a raw pointer: the stack here is never nullable.
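A minimal sketch of the signature change with a stand-in `Stack` type; the point is that a reference documents that the stack can never be null:
```
#include <functional>
#include <vector>

struct IValue {};                   // stand-in
using Stack = std::vector<IValue>;  // stand-in

// Before: using Operation = std::function<void(Stack*)>;
// After:  the operation receives a reference, so callers can no longer
// pass nullptr and callees need no null checks.
using Operation = std::function<void(Stack&)>;

void runOp(const Operation& op, Stack& stack) {
  op(stack);  // previously op(&stack)
}
```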
ghstack-source-id: 136938318

Test Plan:
compiles.

Imported from OSS

Reviewed By: ejguan

Differential Revision: D30375410

fbshipit-source-id: 9d65b620bb76d90d886c800f54308520095d58ee
2021-08-30 11:49:20 -07:00
77a6436cac [Pytorch Mobile] Combining instructions and debug handles in a single struct (#62418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62418

Debug handles have a one-to-one correspondence with instructions, so just combine them into one struct.
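A sketch of the combined record with illustrative names; the code then keeps a single vector of these instead of two parallel vectors:
```
#include <cstdint>
#include <vector>

// Stand-in for the bytecode instruction.
struct Instruction {
  std::uint8_t op;
  std::int32_t X;
  std::int32_t N;
};

// Since debug handles map 1:1 onto instructions, storing them together
// keeps the pairing by construction and removes a parallel vector.
struct InstructionWithDebugHandle {
  Instruction instruction;
  std::int64_t debug_handle;
};

using CodeSketch = std::vector<InstructionWithDebugHandle>;
```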

Test Plan:
CI

Imported from OSS

Reviewed By: raziel

Differential Revision: D29993661

fbshipit-source-id: 125c7163174cf66624dd95f110fdc8208fea8a07
2021-08-13 21:40:17 -07:00
693d8f2f07 [PyTorch Edge] Cache operator lambda during model loading [7% faster model loading] (#61996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61996

A recent post https://fb.workplace.com/groups/pytorch.edge.users/posts/2012215235600341/ about slow model loading, with an accompanying perf report (report.html), led me to look at the report and find hot spots during model loading. It suggested that we spend quite a bit of time looking up operators from the dispatcher. This means we can probably just cache the operator handler functions (instead of computing them every time the operator name shows up, since it potentially shows up multiple times in a given model).
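A sketch of the caching idea with illustrative types and names (not the actual loader code): the resolved handler is stored per operator name, so repeated occurrences of the same op in the model skip the dispatcher lookup.
```
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

struct IValue {};                   // stand-in for c10::IValue
using Stack = std::vector<IValue>;  // stand-in
using OpFn = std::function<void(Stack&)>;

// Stub standing in for the (relatively expensive) dispatcher lookup that
// resolves an operator name to a callable handler.
OpFn resolveFromDispatcher(const std::string& qualified_name) {
  (void)qualified_name;
  return [](Stack&) {};
}

class OperatorCache {
 public:
  const OpFn& get(const std::string& qualified_name) {
    auto it = cache_.find(qualified_name);
    if (it == cache_.end()) {
      // Only the first occurrence of an operator name pays the lookup cost;
      // later occurrences in bytecode.pkl reuse the cached handler.
      it = cache_.emplace(qualified_name,
                          resolveFromDispatcher(qualified_name)).first;
    }
    return it->second;
  }

 private:
  std::unordered_map<std::string, OpFn> cache_;
};
```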

This diff results in an approx 7% speedup in model loading time (from [315ms](https://www.internalfb.com/intern/aibench/details/45077128343028) to [293ms](https://www.internalfb.com/intern/aibench/details/600870874797229)) when run against an 87MB speech model that jiatongzhou provided.

See https://fb.workplace.com/groups/pytorch.dev/posts/855724575006024/ for the previous post from jiatongzhou.
ghstack-source-id: 134634612

Test Plan:
Run using AI Bench.

### Speech Transducer v25 model (87MiB)

Followed up with jiatongzhou and he gave me his speech model. For posterity, here's how to fetch it (you don't need to, since I uploaded it to NMLML and it now has a permanent Everstore handle):

```
cd /tmp/
mkdir speech_model
cd speech_model
fbpkg fetch speech.stella.neural_transducer.on_device.en_us:25
cp pytorchmodel.pt ~/speech_transducer_v25_pytorchmodel.ptl
```

Here's how to build and run the benchmark using AI Bench:

```
buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/speech_transducer/v25.json --framework pytorch --platform android/arm64 --devices "S8US" --force_profile --remote
```

Reviewed By: raziel

Differential Revision: D29826210

fbshipit-source-id: 134b67eb466e73f0e43447b9b966278f13c4b56f
2021-07-29 20:14:47 -07:00
d833caaf6b [PyTorch Mobile][Forward/backward compatibility] Number of arguments for operators (#56845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56845

Handle forward/backward compatibility caused by added default arguments in mobile. As an example,

In older version, operator aten::foo's schema is
```
foo(Tensor a, Tensor b) -> Tensor
```
In the new version, the schema is updated to
```
foo(Tensor a, Tensor b, int groups=1) -> Tensor
```

## Model file
Serialize the number of specified arguments for each operator into the bytecode operator table. Before, the operator table contained only the operator name and overload name:
```
('operators', (('aten::foo', ''),))
```
Now the number of specified arguments is added:
```
# bytecode version 6
('operators', (('aten::foo', '', 2),))
```
where "2" means the number of specified arguments.

Since there's a bytecode schema change, the bytecode version number is bumped. This PR is to be landed after #56002, where the version number is bumped from 4 to 5. This PR bumps the version number from 5 to 6.

## Runtime and backward compatibility
When the operator is found (either jit or c10), we have the OperatorHandle, where the operator schema can be accessed by
```
op.value().schema().arguments()
```
Adaptation is implemented to handle backward compatibility. For the example above, the new runtime holds the updated schema:
```
foo(Tensor a, Tensor b, int groups=1) -> Tensor
```
Whereas the model file carries
```
(('aten::foo', ''), 2)
```
We can implement a wrapper around the original function pointer that pushes the default arguments onto the stack.
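A sketch of how such a wrapper could push the missing defaults, with grossly simplified stand-in types; the real adaptation would read the default values from the operator schema:
```
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using IValue = int;  // grossly simplified stand-in for c10::IValue
using Stack = std::vector<IValue>;
using OpFn = std::function<void(Stack&)>;

// `all_defaults` holds default values for the schema's trailing defaultable
// arguments; `num_missing` is how many of them the serialized call did not
// specify (schema arg count minus the count stored in bytecode, e.g.
// 3 - 2 = 1 for the foo example above).
OpFn adaptWithDefaults(OpFn op,
                       std::vector<IValue> all_defaults,
                       std::size_t num_missing) {
  return [op = std::move(op), all_defaults = std::move(all_defaults),
          num_missing](Stack& stack) {
    // Push the last `num_missing` defaults so the stack matches the new schema.
    for (std::size_t i = all_defaults.size() - num_missing;
         i < all_defaults.size(); ++i) {
      stack.push_back(all_defaults[i]);
    }
    op(stack);
  };
}
```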

## Deliver time and forward compatibility
At model delivery time, two checks can be done:
### Operator check
Two APIs to be provided:
* Runtime: An API to get a runtime’s ops and their schemas (i.e. the # of args). D27920185(WIP)
* Model: An API to get a model’s ops and their schema requirements (i.e. the # of args required).

The APIs can be used to check
* runtime.ops() is a superset of model.ops()
* for each op in model.ops() validate their schemas are compatible with those in runtime.ops() -- i.e. the # args required in a model op are <= # args in the runtime op.

Note that only root ops in the model need to be checked here; for transient ops it's not necessary. For example, if a root op "aten::root" calls "aten::foo", it's "aten::root"'s responsibility to adapt to "aten::foo"'s change, or "aten::root" itself needs to be updated too.
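A sketch of the delivery-time check these two APIs enable, with hypothetical map types (op name mapped to number of arguments):
```
#include <string>
#include <unordered_map>

// model_ops: root op name -> number of args the model requires
// runtime_ops: op name -> number of args the runtime's schema accepts
bool modelIsCompatible(
    const std::unordered_map<std::string, int>& model_ops,
    const std::unordered_map<std::string, int>& runtime_ops) {
  for (const auto& kv : model_ops) {
    auto it = runtime_ops.find(kv.first);
    if (it == runtime_ops.end()) {
      return false;  // runtime.ops() must be a superset of model.ops()
    }
    if (kv.second > it->second) {
      return false;  // model requires more args than the runtime schema has
    }
  }
  return true;
}
```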
### Bytecode version backport (PR coming)
When delivering a model with bytecode v6, if the runtime only works with bytecode v5 and lower, backport is needed.
* The number of arguments is removed from the operator table
* The bytecode version is changed from 6 to 5

Note that this backport is a pure format change; it does not guarantee that the backported model always runs on an old runtime. The operator check mentioned before should be done first, before the model is backported to v5.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D27986544

Pulled By: iseeyuan

fbshipit-source-id: 143e19d4798cfb96b65095538dd648eead4e3fda
2021-05-13 14:20:47 -07:00
e0fc473e47 [Pytorch, Mobile] Serialize inlined callstack pointer with debug handle. (#55062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55062

This diff introduces the following changes:
1. An InlinedCallStack pickler/serializer is introduced. It is serialized as a tuple of {module_instance_info, source range tag, callee: InlinedCallStack}. Module instance info is serialized as a tuple of {class_type_name, instance_name}. Note that the callee of the serialized inlined callstack points to the tuple of the already-serialized callstack. This means that serializing the first callstack pointer serializes the entire path of the tree, where some callee nodes might be shared with callstack pointers that are serialized subsequently. The Pickler supports memoization of pickled objects: if a tuple has already been serialized, its object id is emitted instead of serializing the object again. Thus we still serialize the tree, not every path from the root separately. InlinedCallStackSerializer also uses a cache to look up the pointer and return the serialized IValue. Furthermore, note that we must also serialize the source range of the InlinedCallStack. To do this, the serializer requires a map from source-range tags to source ranges. This was done in the previous diff, where as part of source range serialization we also generate unique tags. These are the tags that are serialized in the InlinedCallStack. Thus, during deserialization we have to deserialize source ranges before deserializing InlinedCallStacks.
2. Furthermore, each serialized InlinedCallStack is serialized with a unique debug_handle and source range tag. BackendDebugHandleManager manages generation of unique debug handles and saves the map of debug-handles-to-{source_range_tag, inlined-callstack-ptr}. This map is then serialized as callstack_debug_map.pkl. Note that the inlined callstack alone is not sufficient to get all the source information, since it contains source information only about the nodes which are inlined. The top-of-the-stack (or bottom) node, which is the actual op node, is not part of the inlined callstack pointer, and thus the source range of this node is serialized separately using source_range_tag. This is similar to how JIT creates the callstack in torch/csrc/jit/runtime/interpreter.cpp.

Unique debug handles facilitate exception throwing or profiling using just the debug handle, without any further qualification such as which function or module the inlined callstack belongs to.

Furthermore, this diff refactors the old mobile code for tracking module hierarchy information per op. Now bytecode serialization serializes debug handles corresponding to ops/nodes in the graph, and callstack_debug_map.pkl helps generate:
1. the entire callstack, and
2. module hierarchy information.

Test Plan:
python test/mobile/test_lite_script_module.py TestLiteScriptModule
./build/bin/test_jit --gtest_filter=*ModuleInfo

Imported from OSS

Reviewed By: raziel

Differential Revision: D27468709

fbshipit-source-id: 53e2413e7703ead01c77718b7c333c7c6ff50a23
2021-05-04 09:21:12 -07:00
f4a921600a [PyTorch, Mobile] Serialization format change for source range (#54284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54284

In order to bring mobile deployment, via the lite interpreter, to feature parity with JIT with respect to model-level debug information, we must make model-level debug information available to the mobile runtime. At the moment, model-level debug information is stored in SourceRange, which associates nodes of the graph with where they come from in the original Python source code. This information is serialized as part of debug_pkl and deserialized when JIT loads the model and reads the model code. In the lite interpreter, we do not have access to all the functionality of JIT, so we cannot load the model the same way JIT does (by reading code, constructing the module hierarchy and the graphs corresponding to module methods, etc.). Instead, in the lite interpreter, only the bytecode corresponding to the compiled graph, Code, is saved.
Thus, in order to annotate ops in the bytecode with the equivalent SourceRange information, we do the following:
1. During model serialization, we create a unique tag for each source
range of the model.
2. Create a map of <SourceRange, tag>
3. During debug_pkl serialization we save tag along with SourceRange, on
top of byte offset.
4. During bytecode generation, the methods of the top module are lowered. During this process methods are inlined. In the inlined graph, when a node is lowered to bytecode, we query the node's source range and look it up against the map.
5. The resulting source range tag is serialized in module_debug_info.
6. During model deserialization, we read all the debug_pkl records in the archive and create a map of <tag, SourceRange>.
7. This map can be used to find source code information.

During mobile runtime:
1. We read all the debug_pkl records and create <tag=debug_handle,
SourceRange> map.
   1.1 This map, MobileDebugInfo, is a member of mobile Module.
2. Interpreter catches appropriate exceptions and sets the thread local
debug handle and rethrows the exception.
3. In Function's run method we catch exception and query current debug
handle where the exception happened.
4. Query MobileDebugInfo with debug handle to retrieve source range and
augment error with source range info.

This information is still incomplete as it does not contain entire
callstack.

In the following diffs we will serialize InlinedCallStack directly.

Note that compilation is gated by the SYMBOLICATE_MOBILE_DEBUG_HANDLE macro, so that mobile builds can avoid building MobileDebugInfo, source ranges, and the source range pickler/unpickler. Later we will add a path where, if building without debug support, the stack trace will contain only debug handles. They can be symbolicated later.

Test Plan:
Ported a bunch of source range tests from test_jit.py. Added one more test in test_lite_interpreter.py.

Imported from OSS

Reviewed By: raziel

Differential Revision: D27174722

fbshipit-source-id: a7b7c6088ce16dec37e823c7fefa4f0b61047e12
2021-05-04 09:19:27 -07:00
23c50a4a50 [PyTorch Mobile] Support torchbind custom classes in lite interpreter (#51432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51432

ghstack-source-id: 120976584

torchbind is a convenient way to expose a custom class to both Python and TorchScript. CREATE_OBJECT is used to create an object of a custom class.

CREATE_OBJECT was not supported by the lite interpreter. The major reason was that for custom classes defined directly in Python, there's no language parser in the lite interpreter. That is still the case. However, for torchbind classes that are defined in C++, a Python/TorchScript parser is not needed.

This diff is to support the case of torchbind custom classes.
1. The class type can be resolved at import level.
2. If the class is not a supported torchbind class, an error message is provided at export stage. A workaround is also suggested.
3. Unit tests. C++: ```LiteInterpreterTest::BuiltinClass``` is added as an end-to-end test on supported class. Python: ```test_unsupported_createobject``` is changed to ```test_unsupported_classtype``` to test unsupported classes.
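For reference, a minimal example of the kind of torchbind class this change enables in the lite interpreter, following the public `torch::class_` custom-class API; the class and its methods are made up for illustration:
```
#include <torch/custom_class.h>
#include <torch/library.h>

#include <string>
#include <utility>
#include <vector>

// A toy custom class; torchbind classes derive from CustomClassHolder.
struct MyStack : torch::CustomClassHolder {
  std::vector<std::string> items;
  void push(std::string s) { items.push_back(std::move(s)); }
  std::string pop() {
    auto s = items.back();
    items.pop_back();
    return s;
  }
};

// Registering the class in C++ makes its type resolvable at import time in
// the lite interpreter, since no Python/TorchScript parser is needed for it.
TORCH_LIBRARY(my_classes, m) {
  m.class_<MyStack>("MyStack")
      .def(torch::init<>())
      .def("push", &MyStack::push)
      .def("pop", &MyStack::pop);
}
```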

Test Plan: CI

Reviewed By: raziel

Differential Revision: D26168913

fbshipit-source-id: 74e8b6a12682ad8e9c39afdfd2b605c5f8e65427
2021-02-03 21:57:19 -08:00
87ad77eb4e T66557700 Support default argument values of a method (#48863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48863

Support default arguments when invoking a module via PyTorch Lite (`mobile::Module`).
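A sketch of what invoking a method with a default argument omitted looks like from C++; the model path, the `forward` schema, and the exact header paths are assumptions and may differ across versions:
```
#include <ATen/ATen.h>
#include <torch/csrc/jit/mobile/import.h>
#include <torch/csrc/jit/mobile/module.h>

#include <vector>

int main() {
  // Load a model previously saved for the lite interpreter (hypothetical path).
  torch::jit::mobile::Module module =
      torch::jit::_load_for_mobile("model.ptl");

  // Suppose forward's schema were forward(self, Tensor x, int scale=2)
  // (a made-up example). With default-argument support the caller may omit
  // `scale`; the runtime fills it in from the schema.
  std::vector<c10::IValue> inputs;
  inputs.emplace_back(at::ones({1, 3}));
  c10::IValue out = module.forward(inputs);
  (void)out;
  return 0;
}
```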

Test Plan:
buck test mode/dbg //caffe2/test/cpp/jit:jit -- LiteInterpreterTest.MethodInvocation

buck test mode/dbg caffe2/test:mobile -- test_method_calls_with_optional_arg

Reviewed By: iseeyuan

Differential Revision: D25896212

fbshipit-source-id: 6d7e7fd5f3244a88bd44889024d81ad2e678ffa5
2021-02-01 18:35:13 -08:00
8530c65e25 [codemod][fbcode/caffe2] Apply clang-format update fixes
Test Plan: Sandcastle and visual inspection.

Reviewed By: igorsugak

Differential Revision: D25849205

fbshipit-source-id: ef664c1ad4b3ee92d5c020a5511b4ef9837a09a0
2021-01-09 14:37:36 -08:00
4a870f6518 [PyTorch Mobile] Export Operator List from Mobile CompilationUnit instead of from TorchScript Model (#49385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49385

Currently, the API to export operator lists accepts a `torch::jit::Module` object and spits out an operator list. The operator list is practically used only for mobile. This is not ideal because the set of root operators may change by the time the model is subsequently optimized and exported for mobile.

What we need to do instead is glean the list of operators from the mobile model itself (`bytecode.pkl` specifically), and expose that.

Also updated the logic in `converter`.

### Before this change:
1. Get operator List from Torch Script Model
2. Convert to bytecode mobile model

### After this change:
1. Convert to bytecode mobile model
2. Use this converted mobile model to get the list of operators for each method on the model

ghstack-source-id: 118796752

Test Plan:
Added a unit test in `test_lite_interpreter.cpp` to ensure that all model referenced operators show up in the exported operator list. Also make `test_lite_interpreter.cpp` runnable from `xplat/caffe2/BUCK` since this is where the production code will be built from.

Verified that the list of operators produced before and after this change for an example model (segmentation) are the same.

{P147863234}

Also verified that the operator lists for BI-Xray model is different (we have been having problems with missing operators for this one): {P154903132}

Reviewed By: iseeyuan

Differential Revision: D24690094

fbshipit-source-id: 0426a6ef90456a811010cfe337c415882ae2deff
2020-12-18 11:17:57 -08:00
2b61e4d84c Revert D25152559: T66557700 Support default argument values of a method
Test Plan: revert-hammer

Differential Revision:
D25152559 (6bde0ca6d3)

Original commit changeset: bbf52f1fbdbf

fbshipit-source-id: 592fdb3078b1ac86cd394adc6c1bfd6b10d829e1
2020-12-17 14:05:49 -08:00
6bde0ca6d3 T66557700 Support default argument values of a method (#48863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48863

Support default arguments when invoking a module via PyTorch Lite (`mobile::Module`).

Test Plan:
buck test mode/dbg //caffe2/test/cpp/jit:jit -- LiteInterpreterTest.MethodInvocation

buck test mode/dbg caffe2/test:mobile -- test_method_calls_with_optional_arg

Reviewed By: raziel, iseeyuan

Differential Revision: D25152559

fbshipit-source-id: bbf52f1fbdbfbc6f8fa8b65ab524b1cd4648f9c0
2020-12-16 15:55:03 -08:00
9b3c72d46e [pytorch] Make mobile find_method return an optional (#43965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43965

As part of a larger effort to unify the API between the lite interpreter and full JIT:
- implement torch::jit::mobile::Method, a proxy for torch::jit::mobile::Function
- add support for overloaded operator() to mobile Method and Function
- mobile find_method now returns a c10::optional<Method>, so the signature matches the full JIT (see the sketch after this list)
- moves some implementation of Function from module.cpp to function.cpp
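A sketch of the resulting call pattern; header paths and the exact `Method::operator()` signature are assumptions and may vary by version:
```
#include <ATen/ATen.h>
#include <torch/csrc/jit/mobile/import.h>
#include <torch/csrc/jit/mobile/module.h>

#include <iostream>
#include <vector>

int main() {
  auto module = torch::jit::_load_for_mobile("model.ptl");  // hypothetical path

  // find_method now mirrors the full JIT: it returns an optional instead of
  // a raw pointer, so a missing method is handled explicitly.
  c10::optional<torch::jit::mobile::Method> method =
      module.find_method("forward");
  if (!method) {
    std::cerr << "model has no forward() method\n";
    return 1;
  }

  std::vector<c10::IValue> inputs;
  inputs.emplace_back(at::ones({1, 3}));
  c10::IValue out = (*method)(inputs);  // Method::operator()
  (void)out;
  return 0;
}
```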
ghstack-source-id: 111161942

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D23330762

fbshipit-source-id: bf0ba0d711d9566c92af31772057ecd35983ee6d
2020-09-03 14:46:18 -07:00
93f1b5c8da Mobile backward compatibility (#42413)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42413

When a default argument is added, it does not break backward compatibility (BC) for full-jit, but does break BC for mobile bytecode. For example, https://github.com/pytorch/pytorch/pull/40737. To make bytecode BC in this case, we

1. Introduce kMinSupportedBytecodeVersion. The loaded model version should be between kMinSupportedBytecodeVersion and kProducedBytecodeVersion.
2. If an operator is updated, and we can handle BC, bump the kProducedBytecodeVersion (for example, from 3 to 4).
3. If the model version is at the older version of the operator, add an adapter function at load time. For the added default arg, we push the default value onto the stack before calling the actual operator function.

Test Plan: Imported from OSS

Reviewed By: xcheng16

Differential Revision: D22898314

Pulled By: iseeyuan

fbshipit-source-id: 90d339f8e1365f4bb178db8db7c147390173372b
2020-08-21 15:45:52 -07:00
ccd9f3244b Get, save, and load module information for each operator (#42133)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42133

Test Plan:
We save a module with module debugging information as follows.
```
import torch
m = torch.jit.load('./detect.pt')
# Save module without debug info
m._save_for_lite_interpreter('./detect.bc')
# Save module with debug info
m._save_for_lite_interpreter('./detect.bc', _save_debug_info_in_bytecode=True)
```
Size of the file without module debugging information: 4.508 MB
Size of the file with module debugging information: 4.512 MB

Reviewed By: kimishpatel

Differential Revision: D22803740

Pulled By: taivu1998

fbshipit-source-id: c82ea62498fde36a1cfc5b073e2cea510d3b7edb
2020-08-14 01:25:27 -07:00
33e26656fa list workaround for CREATE_OBJECT failure (#41129)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41129

Test Plan: Imported from OSS

Differential Revision: D22436064

Pulled By: ann-ss

fbshipit-source-id: 7cfc38eb953410edfe3d21346c6e377c3b3bfc1f
2020-07-08 18:36:04 -07:00
53af9df557 Unify boxed function signature between jit and c10 (#37034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37034

c10 takes a Stack* in boxed functions while JIT took Stack&.
c10 doesn't return anything while JIT returns an int which is always zero.

This changes JIT to follow the c10 behavior.
ghstack-source-id: 106834069

Test Plan: unit tests

Differential Revision: D20567950

fbshipit-source-id: 1a7aea291023afc52ae706957e9a5ca576fbb53b
2020-06-29 19:24:26 -07:00
4e976b9334 Remove callBoxedWorkaround (#36850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36850

Since now all unboxing happens after dispatch, which means that all c10 ops support unboxing, we can now use op.callBoxed() for all ops and don't need callBoxedWorkaround (which was going through the JIT registry) anymore.
ghstack-source-id: 102879558

Test Plan: waitforsandcastle

Differential Revision: D21102375

fbshipit-source-id: d1e041116563a9650d5a86b07eb96d217d8756f3
2020-04-24 23:13:31 -07:00
3880f14b64 Canonicalize includes in torch, and add tests for it (#36303)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36303

Test Plan: Imported from OSS

Differential Revision: D20943003

Pulled By: ezyang

fbshipit-source-id: 81fcbaccc1a7eec422bd8347d196bb66a5467884
2020-04-23 08:09:21 -07:00
a894fff265 Back out "Revert D21089648: Put TORCH_LIBRARY in torch/library.h; add custom class API"
Summary: Original commit changeset: 636e8a11afc6

Test Plan: export to OSS

Reviewed By: malfet

Differential Revision: D21170502

fbshipit-source-id: e8f35f103c4924aedbcaaf868475008d24bdeeab
2020-04-22 09:18:23 -07:00
2ccdc39dce Revert D21089648: Put TORCH_LIBRARY in torch/library.h; add custom class API
Test Plan: revert-hammer

Differential Revision:
D21089648

Original commit changeset: 8d54329c1252

fbshipit-source-id: 636e8a11afc628a4cdae9d44824985c10c70555e
2020-04-21 12:21:45 -07:00
01100cb477 Put TORCH_LIBRARY in torch/library.h; add custom class API (#36742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36742

Now, you can define a custom class inside a TORCH_LIBRARY block.
It looks very similar to what you did before.  Instead of

```
static auto m = torch::class_<Class>("Namespace", "Class").def("foo", foo);
```

you write

```
TORCH_LIBRARY(Namespace, m) {
  m.class_<Class>("Class")
    .def("foo", foo);
}
```

All the old usages still work, but at some point we should start
updating the tutorials when we're ready to go 100% live with the
new pybind11 style API.

custom class API previously lived in torch/ folder and in torch
namespace, so for consistency, the new TORCH_LIBRARY also got
moved to torch/library.h The definition of Library::class_ is in the
bottom of that header because I need all of the class_ constructors
available, but there is a circular dependency between the two headers.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D21089648

Test Plan: Imported from OSS

Pulled By: ezyang

fbshipit-source-id: 8d54329c125242605336c22fa1642aae6940b507
2020-04-21 10:05:21 -07:00
a91097bdfb Revert D20964368: Revert D20408831: [Lite Interpreter] Operator registration migrate from manual to selective build
Test Plan: revert-hammer

Differential Revision:
D20964368

Original commit changeset: f1874088a597

fbshipit-source-id: d9317ed97a98e2b04c190785b5564536b1096282
2020-04-10 08:19:36 -07:00
586481a6e2 Revert D20408831: [Lite Interpreter] Operator registration migrate from manual to selective build
Test Plan: revert-hammer

Differential Revision:
D20408831

Original commit changeset: ec75dd762c46

fbshipit-source-id: f1874088a5970dd220cc027d0020ab6223b9bd93
2020-04-10 08:03:38 -07:00
7fcf8b0a3b [Lite Interpreter] Operator registration migrate from manual to selective build (#35426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35426

Use selective build with the full set of operators (vs. manually registering each used op with a "_" prefix).

The lite interpreter relies on JIT operator dispatch. In the future we will still need JIT operator dispatch to dispatch ops that are not registered in c10.
Currently the selective build is for the c10/aten dispatch in BUCK. There is JIT selective code-gen in OSS, but it has not been ported to BUCK yet.
This diff also ports that selective code-gen to BUCK.
* The selected op list is passed to gen_jit_dispatch.py.
* The list passed to gen_jit_dispatch is the top-level ops (USED_PT_OPS) only, because the selective c10/aten dispatch already registers the other ops that are called from the top-level ops.

ghstack-source-id: 101885215

(Note: this ignores all push blocking failures!)

Test Plan:
1. In Python, run torch.jit.export_opnames(scripted_M_mod)
2. Append the operator names into fbcode/caffe2/pt_ops.bzl and the BUCK target.
3. Run
```
buck run xplat/caffe2/fb/lite_predictor:lite_predictor_bi -- --model=/home/myuan/temp/bi_pytext_0315.bc --input_dims "1,4" --input_type int64 --pytext_len=4
```
Should provide expected results.
In addition, the size of the generated code for JIT registration, for example ```register_aten_ops_0.cpp```, should be significantly reduced (from ~250 KB to ~80 KB). The non-selected op registration schemas are still kept, but the registration functor is replaced by ```DUMMY_OPERATION```.

Reviewed By: ljk53

Differential Revision: D20408831

fbshipit-source-id: ec75dd762c4613aeda3b2094f5dad11804dc9492
2020-04-10 02:31:32 -07:00
361eed6a6e Use JIT op registration directly for lite interpreter. (#34070)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34070

The first step to make all operators available for lite interpreter. The original code used manual registration for lite interpreter ops with a "_" prefix, for two reasons:
1. To minimize the build size.
2. To avoid duplicate registration in OSS (majorly feature testing and unit tests).

Now since we have more and more models to support, the manual registration way is not practical. To make this process automatic while keeping the binary size under control, we plan to:
1. Make all necessary ops callable from lite interpreter.
2. The binary size would be increased because of step 1. Use ljk53 's custom build to selectively build the binary with ops used in specific models. The ops will be automatically collected using get_opnames.
3. The temporary "register_mobile_ops.cpp" can be removed.

Test Plan: Imported from OSS

Differential Revision: D20291596

Pulled By: iseeyuan

fbshipit-source-id: 553b4699619cd71fea20658f3bc8c2d48852ef5c
2020-03-25 07:21:51 -07:00
ab76a8206f [JIT][mobile] Support built-in Function call in lite interpreter (#34676)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34676

Test Plan: Imported from OSS

Differential Revision: D20427938

Pulled By: jamesr66a

fbshipit-source-id: 79eebfa858776f26da55ffd49d3f78fa7ae0df9b
2020-03-13 18:24:18 -07:00
02478984d6 Add support to dump unsupported ops. Add lite_interpter_load test. (#34278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34278

This diff helps check all the ops not supported by lite_interpreter.
Helpful mainly to find all the ops that need to be added instead of adding them
one by one.

Test Plan:
buck run caffe2/binaries:lite_interpreter_model_load --
--model=<bytecode-model-path>

Reviewed By: iseeyuan

Differential Revision: D20266341

fbshipit-source-id: 5a6c7a5bc52f910cea82a72045870da8105ccb87
2020-03-05 18:31:31 -08:00
d59e036f4d Revert D20194092: Add support to dump unsupported ops. Add lite_interpter_load test.
Test Plan: revert-hammer

Differential Revision:
D20194092

Original commit changeset: 0d596cd02043

fbshipit-source-id: 17b4bae27543f231bd6c12d90368d399ca55ebdf
2020-03-04 13:53:58 -08:00
17a5c67796 Add support to dump unsupported ops. Add lite_interpter_load test. (#34072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34072

This diff helps check all the ops not supported by lite_interpreter.
Helpful mainly to find all the ops that need to be added instead of adding them
one by one.

Test Plan:
buck run caffe2/binaries:lite_interpreter_model_load --
--model=<bytecode-model-path>

Reviewed By: iseeyuan

Differential Revision: D20194092

fbshipit-source-id: 0d596cd0204308027194af7ed738551d0c32a374
2020-03-04 13:18:12 -08:00
dbe850af5b [jit] do the code reorg (#33851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33851

Rationale and context described in #33828.

Script to reproduce the move:
https://gist.github.com/suo/16cbefaaeb67ca5a7c6caffd49b7f6e9
ghstack-source-id: 99079645

Test Plan: Make sure CI passes

Reviewed By: jamesr66a

Differential Revision: D20133869

fbshipit-source-id: 390e9241a9c85366d9005c492ac31f10aa96488e
2020-02-27 13:02:51 -08:00
f1b73799d5 Clean up isinstance flags (#33265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33265

This removes the need for isinstance to keep trace of list and tuple
separately by introducing AnyListType and AnyTupleType into the JIT
type system to be the common supertype of any lists or tuples.

This allows us to remove the weird flags from the interpreter for
the isinstance operator.

Test Plan: Imported from OSS

Differential Revision: D19883933

Pulled By: zdevito

fbshipit-source-id: f998041b42d8b4554c5b99f4d95d1d42553c4d81
2020-02-18 15:07:06 -08:00
7f2c25b6fa Move special ops into interpreter (#32889)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32889

Common primitive ops that have special inputs make it very hard to
serialize the bytecode for mobile because information about how the
op behaves is hidden in the Node*. This changes how we handle the following
ops so that they are encoded as their own interpreter bytecodes.

```
    USES NODE: prim::TupleUnpack(...) -> (...)
    USES NODE: prim::TupleSlice(...) -> (...)
    USES NODE: prim::TupleConstruct(...) -> (...)
    USES NODE: prim::ListUnpack(...) -> (...)
    USES NODE: prim::ListConstruct(...) -> (...)
    USES NODE: prim::DictConstruct(...) -> (...)
    USES NODE: prim::Constant() -> (...)
    USES NODE: prim::isinstance(...) -> (...)
    USES NODE: prim::CreateObject(...) -> (...)
    USES NODE: prim::fork(...) -> (...)
    USES NODE: aten::warn(str message, *, int stacklevel=2) -> () # need stack level information, so ideally in interpreter so it can look at the stack
```

This leaves a state where the _only_ remaining Node*-consuming builtins
are things that are only introduced during JIT optimization and will
not appear in mobile code.

Serialization of bytecode can now be made to directly write the CodeImpl
object without modification.

Test Plan: Imported from OSS

Differential Revision: D19673157

Pulled By: zdevito

fbshipit-source-id: 7b8c633d38a4c783b250fbdb222705e71a83ad26
2020-02-18 15:07:01 -08:00
f362cd510d Move prim ops from JIT registration to C10 (#30612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30612

The first version to move prim ops to c10 registration. After the reviewers are fine with the initial changes, more operators will be moved in the same style.

Test Plan: Imported from OSS

Differential Revision: D19237648

Pulled By: iseeyuan

fbshipit-source-id: c5a519604efffb80564a556536f17d829f71d9f9
2020-01-04 13:47:44 -08:00
3003c5f91b OPN ops TupleConstruct/Unpack and format. (#29635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29635

TupleConstruct/Unpack as OPN ops.

Test Plan: Imported from OSS

Differential Revision: D18499602

fbshipit-source-id: 389b21d3ea532ef6fa729d67ce34214d86700cd2
2019-11-15 16:22:42 -08:00
19ab5381c3 Add OPN instruction and vararg operator table (#27104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27104

* The use case here is to replace prim::ListConstruct, which requires a Node, but Node is not available in the mobile lite interpreter.
* (OPN, X, N): X is the index into the vararg operator-name and operator tables; N is the number of inputs. For the ListConstruct example, the operator name can be "aten::listconstruct" and the overload name is the output type ("int", "float", "bool", "tensor" or "generic").
* A vararg operator table is built from void(int input_size, Stack& stack) functions (see the sketch below).
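A sketch of what one entry of such a table can look like, with simplified stand-in types; for brevity the table here is keyed by operator name, whereas the real design indexes parallel name/function tables by X:
```
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <variant>
#include <vector>

// Simplified stand-ins: a value is either an int or a list of ints.
using IValue = std::variant<std::int64_t, std::vector<std::int64_t>>;
using Stack = std::vector<IValue>;
using VarArgOp = std::function<void(int /*input_size*/, Stack&)>;

// "aten::listconstruct.int": pop N ints off the stack, push one int list.
// This replaces prim::ListConstruct, which needed a Node* in the full JIT.
void listConstructInt(int input_size, Stack& stack) {
  std::vector<std::int64_t> list;
  list.reserve(input_size);
  for (int i = 0; i < input_size; ++i) {
    // Pop from the back; insert at the front to preserve original order.
    list.insert(list.begin(), std::get<std::int64_t>(stack.back()));
    stack.pop_back();
  }
  stack.emplace_back(std::move(list));
}

// The OPN instruction (OPN, X, N) selects an entry like this with X and
// passes N as the input size.
const std::unordered_map<std::string, VarArgOp> kVarArgOps = {
    {"aten::listconstruct.int", listConstructInt},
};
```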
## Unit test
LiteInterpreterConv covers OPN instruction and conv operator.

Test Plan: Imported from OSS

Differential Revision: D17762853

fbshipit-source-id: 475aa0c6678e3760cec805862a78510913a89c83
2019-10-04 09:35:53 -07:00
7fc06ea541 Bytecode export flow (#25187)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25187

The bytecode export flow: dump the bytecode format for the lightweight interpreter.
* The bytecode is generated without input spec optimization. It would be more generic (input independent) with no obvious performance degradation (to be tested).
* Main API: torch::jit::script::Module::save(filename, extra_files, bool *bytecode_format* = false).
* Both bytecode and module object are exported in pickle format.
    * The module object (in data.pkl) is the same as the original JIT model.
    * The serializer is dependent on pickle only (no protobuf or Json).
    * The major functionality is forked in ScriptModuleSerializer2::serialize().
    * The test loader is test_bc_export.cpp.
* Simple APIs are added in Code and its implementation to get necessary information (instructions, operators and constants).
* Since there's no dependency on graph/node, GetAttr is promoted from an operator to a first-class instruction (https://github.com/pytorch/pytorch/pull/25151) .
* Some definitions (instructions, writeArchive, etc) that are shared by full JIT and bytecode are pulled out of the local namespace (https://github.com/pytorch/pytorch/pull/25148).

The output layout looks like:

* folders of methods.
    * In each method folder (for example, forward/):
        * bytecode.pkl: instructions and operators
        * constants{.pkl,/}: constant list in constants.pkl. If there are tensors in constants, the binary tensor files in constants/ folder.
* data{.pkl,/}: the module object, with binary tensor files in data/ folder. The same as in torchscript.

Test Plan: Imported from OSS

Differential Revision: D17076411

fbshipit-source-id: 46eb298e7320d1e585b0101effc0fcfd09219046
2019-09-25 16:35:45 -07:00