This PR replaces c10::guts::to_string with std::to_string. The main change is using void* as the optimizer state key, since the string form is only needed for serialization and pointers are more efficient hashing keys than strings.
Some other guts functions in the affected source files are also replaced.
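A minimal sketch of the keying change described above; the state type and map names are hypothetical, not the actual optimizer code:
```cpp
#include <string>
#include <unordered_map>

struct ParamState { /* per-parameter optimizer state */ };

// Before: keyed by a string built from the parameter, hashed byte-by-byte.
std::unordered_map<std::string, ParamState> state_by_name;

// After: keyed by the parameter's data pointer; hashing a pointer is a single
// cheap operation, and the string key is only needed at serialization time.
std::unordered_map<void*, ParamState> state_by_ptr;
```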
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108480
Approved by: https://github.com/Skylion007
Not only is this change usually shorter and more readable, it can also yield better performance. size() is not always a constant-time operation (for example, on linked lists), but empty() always is.
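A minimal illustration of the preferred pattern:
```cpp
#include <vector>

void process(const std::vector<int>& items) {
  // Preferred: empty() is guaranteed constant time for standard containers.
  if (items.empty()) {
    return;
  }
  // Discouraged: for some containers size() may have to count elements, and
  // "size() == 0" states the intent less directly than empty().
  // if (items.size() == 0) { return; }
}
```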
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93236
Approved by: https://github.com/malfet
Summary: D38998858 (3fae89d4a4) used the wrong version of `_load_for_mobile`, which kept the "load everything in memory, then parse" technique. This fixes it to call the `_load_for_mobile_impl` version, which stream-parses non-flatbuffer models. See D38998858 (3fae89d4a4) for the expected memory optimization gains.
Test Plan: CI Signals.
Reviewed By: qihqi
Differential Revision: D39138280
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84296
Approved by: https://github.com/qihqi
Summary:
Remove code duplication in import.cpp / export_modules.cpp such that
1. Only one copy of the switching logic (detect flatbuffer / is_flatbuffer) remains;
2. Detection of whether flatbuffer support is included is moved to runtime (so no more macros).
This also reverses the dependency from import.cpp -> flatbuffer_loader.cpp to flatbuffer_loader.cpp -> import.cpp.
Differential Revision: D36926217
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79184
Approved by: https://github.com/zhxchen17
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76350
fix memory issue
Test Plan:
User reported ASAN errors when running `//xplat/langtech/mobile:giga5_bin`
Verified with the user that rebasing to this diff it is gone.
Reviewed By: pavithranrao
Differential Revision: D35911894
fbshipit-source-id: 41eb88fe1501d1bb7dd9ce3d36c224c3300a41e8
(cherry picked from commit 3a13aae20f698bc5bcc4d5cb686bb36d433a2f03)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57775
The minimum supported bytecode version is updated from 3 to 4. We no longer support version 3 bytecode models.
Why?
* There is hacky code in operator loading that behaves differently for one operator when the global bytecode version is 3. Instead, operator-related metadata should be passed (for example, in #56845). To allow future development, we remove the hacky path first.
* The bytecode version was bumped from 3 to 4 more than half a year ago. Since all production models have already been bumped to version 4, it's not practical to keep and maintain version 3, and the risk of deprecating it is low.
Test Plan: Imported from OSS
Reviewed By: raziel
Differential Revision: D28270791
Pulled By: cccclai
fbshipit-source-id: 70b1bd6352fdaae5f8d2173b81578d77018c8e44
(cherry picked from commit 3e930fa381cd01f3705116795c6426df992372fc)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74594
Extending `_save_for_mobile` and `_load_for_mobile` to support the flatbuffer format, with an additional optional argument that defaults to pickle.
Adding a new binary target with the suffix `_pickle_and_flatbuffer` to help migration.
The size test in D34909502 shows the size has regressed by ~40K, but after removing pickle and comparing lite_predictors we measure ~120K of savings, which we will achieve when deprecating pickle and moving to flatbuffer.
**BEFORE:**
```lang=mermaid
graph TD;
torch_core-->torch_mobile_deserialize;
torch_mobile_core-->torch_mobile_deserialize;
jit_module_saving-->torch_core;
jit_module_saving-->torch_mobile_core;
torch_mobile_deserialize-->caffe2_serialize;
torch_mobile_deserialize-->torch_mobile_module;
caffe2_serialize-->miniz;
flatbuffer_loader-->mobile_bytecode;
flatbuffer_serializer-->mobile_bytecode;
mobile_bytecode-->flatbuffer_2.0;
flatbuffer_loader-->torch_mobile_module;
flatbuffer_serializer-->torch_mobile_module;
```
**AFTER:**
```lang=mermaid
graph TD;
torch_core-->torch_mobile_deserialize;
torch_mobile_core-->torch_mobile_deserialize;
jit_module_saving-->torch_core;
jit_module_saving-->torch_mobile_core;
torch_mobile_deserialize-->caffe2_serialize;
torch_mobile_deserialize-->torch_mobile_module;
caffe2_serialize-->miniz;
flatbuffer_loader-->mobile_bytecode;
flatbuffer_serializer-->mobile_bytecode;
mobile_bytecode-->flatbuffer_2.0;
torch_mobile_deserialize_pickle_and_flatbuffer-->|new| flatbuffer_loader;
torch_mobile_deserialize_pickle_and_flatbuffer-->|new| torch_mobile_deserialize;
torch_mobile_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;
torch_core_pickle_and_flatbuffer-->|new| torch_mobile_deserialize_pickle_and_flatbuffer;
jit_module_saving_pickle_and_flatbuffer-->|new| torch_core_pickle_and_flatbuffer;
jit_module_saving_pickle_and_flatbuffer-->|new| torch_mobile_core_pickle_and_flatbuffer;
flatbuffer_serializer-->torch_mobile_module;
jit_module_saving_pickle_and_flatbuffer-->|new|jit_module_saving;
jit_module_saving_pickle_and_flatbuffer-->|new|flatbuffer_serializer;
flatbuffer_loader-->torch_mobile_module;
```
Original commit changeset: 780dfb6fd6ba
Original Phabricator Diff: D34805092 (284b2b7135)
ghstack-source-id: 152044801
(Note: this ignores all push blocking failures!)
Test Plan:
CI
```
~/fbsource/fbcode] cd ~/fbsource/fbcode/ && buck test -c fbcode.caffe2_enable_flatbuffer=1 //caffe2/test/cpp/jit:jit -- FlatbufferTest.ExtraFiles
Parsing buck files: finished in 0.9 sec
Building: finished in 5.3 sec (100%) 12992/54304 jobs, 0/54304 updated
Total time: 6.2 sec
More details at https://www.internalfb.com/intern/buck/build/2b387fff-f813-4cfa-b53f-eb2378630d4e
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d
Trace available for this run at /tmp/tpx-20220323-134108.766518-f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d/trace.log
RemoteExecution session id: reSessionID-f93a84d6-e7ce-41a0-a97f-0ef3fa6d199d-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4503599723101693
✓ ListingSuccess: caffe2/test/cpp/jit:jit : 486 tests discovered (19.122)
✓ Pass: caffe2/test/cpp/jit:jit - FlatbufferTest.ExtraFiles (0.187)
Summary
Pass: 1
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4503599723101693
```
Similar Build Deps Dags
```
[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops_pickle_and_flatbuffer, //xplat/caffe2:torch_mobile_deserialize_pickle_and_flatbuffer)' --output-format dot-compact | pastry
P486770901: https://www.internalfb.com/intern/paste/P486770901/
[pavithran@devvm5216.vll0 /data/users/pavithran/fbsource] buck query 'allpaths(//xplat/caffe2:torch_mobile_all_ops, //xplat/caffe2:torch_mobile_deserialize)' --output-format dot-compact | pastry
P486771278: https://www.internalfb.com/intern/paste/P486771278/
```
pickle_and_flatbuffer: https://www.internalfb.com/intern/dgw/graph/?build_id=P486770901
pickle: https://www.internalfb.com/intern/dgw/graph/?build_id=P486771278
Reviewed By: iseeyuan
Differential Revision: D35067157
fbshipit-source-id: 9044259c17a2e0da79bd6aedb28efbdfd57e23e0
(cherry picked from commit f738069ec3a72e79da56172741d027de514e9e5f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72523
`toTuple()` returns a new intrusive pointer, which bumps the underlying refcount, whereas `toTupleRef()` returns a reference. We can save an unnecessary refcount bump.
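A minimal sketch of the call-site pattern, assuming an `IValue` that holds a tuple; the helper function is illustrative:
```cpp
#include <ATen/core/ivalue.h>
#include <cstddef>

std::size_t countElements(const c10::IValue& ivalue) {
  // Before: toTuple() materializes an intrusive_ptr, bumping the refcount
  // just to read the elements.
  // auto tuple_ptr = ivalue.toTuple();
  // return tuple_ptr->elements().size();

  // After: toTupleRef() returns a reference to the tuple the IValue already
  // owns, so no refcount traffic is needed.
  return ivalue.toTupleRef().elements().size();
}
```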
ghstack-source-id: 149173308
Test Plan: Sandcastle CI
Reviewed By: swolchok
Differential Revision: D34047666
fbshipit-source-id: 8c821e45f7af4f3f1d098871926b9df288e329fb
(cherry picked from commit 34797e508d533c578a40f74ffc82b34e1c3ea40e)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70338
Today Unpickler is used by both server and mobile for deserializing models, and it always falls back to the mobile parser when the user provides no type resolver. However, this is not intended, since the server and mobile type parsers support different things. In this diff we provide a default fallback using the script parser and opt out of it for all mobile cases.
ghstack-source-id: 146727330
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: iseeyuan
Differential Revision: D33284352
fbshipit-source-id: 997c4f110b36eee6596e8f23f6a87bf91a4197ed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70225
Thanks to zhxchen17 for the suggestion. This PR moves the operator initialization logic to `upgrader_mobile.cpp`, so that we can leverage a static variable to ensure the operator initialization happens only once.
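A minimal sketch of the function-local-static idiom being relied on; the type and function names are illustrative, not the actual upgrader code:
```cpp
#include <vector>

struct Upgrader { /* bytecode upgrader with its operators resolved */ };

const std::vector<Upgrader>& getUpgraders() {
  // A function-local static is initialized exactly once (thread-safe since
  // C++11), so the operator-resolution work runs only on first use.
  static const std::vector<Upgrader> upgraders = [] {
    std::vector<Upgrader> result;
    // ... resolve operators and populate the upgrader table ...
    return result;
  }();
  return upgraders;
}
```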
ghstack-source-id: 146103229
Test Plan:
```
buck test mode/opt //papaya/integration/service/test/analytics/histogram:generic_histogram_system_test -- --exact 'papaya/integration/service/test/analytics/histogram:generic_histogram_system_test - SumHistogramSystemTest.test' --run-disabled
buck test mode/opt //caffe2/test/cpp/jit:jit
buck test mode/dev //papaya/integration/service/test/mnist:mnist_system_test -- --exact 'papaya/integration/service/test/mnist:mnist_system_test - MnistFederatedSystemTest.test'
```
Reviewed By: zhxchen17
Differential Revision: D33247543
fbshipit-source-id: 6c3a87fe909a1be01452fa79649065845b26d805
Summary:
The upgrader should be initialized only once, when the runtime loads the first module; it does not need to be initialized again afterwards.
Previously, instead of using an atomic variable, whether the upgrader was initialized depended on whether byteCodeFunctionWithOperator.function.get_code().operators_ is empty: if it is empty, the operators from the upgrader have not been initialized yet. However, this is not thread safe. When multiple threads load modules concurrently, they may all consider their module to be the first one. Use an atomic variable here to make the initialization thread safe.
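A minimal sketch of the exactly-once initialization requirement, with a hypothetical entry point (the real change lives in the mobile loader):
```cpp
#include <atomic>
#include <mutex>

// Hypothetical: resolves the operators used by the bytecode upgraders.
void initializeUpgraderOperators() { /* ... */ }

void ensureUpgradersInitialized() {
  static std::atomic<bool> initialized{false};
  static std::mutex init_mutex;
  // Fast path: already initialized, no lock needed.
  if (initialized.load(std::memory_order_acquire)) {
    return;
  }
  // Slow path: serialize initializers. Checking operators_.empty() instead is
  // racy, since two threads loading modules concurrently can both observe
  // "empty" and both run the initialization.
  std::lock_guard<std::mutex> guard(init_mutex);
  if (!initialized.load(std::memory_order_relaxed)) {
    initializeUpgraderOperators();
    initialized.store(true, std::memory_order_release);
  }
}
```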
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70161
ghstack-source-id: 146012642
Test Plan:
```
buck test mode/opt //papaya/integration/service/test/analytics/histogram:generic_histogram_system_test -- --exact 'papaya/integration/service/test/analytics/histogram:generic_histogram_system_test - SumHistogramSystemTest.test' --run-disabled
buck test mode/opt //caffe2/test/cpp/jit:jit
```
Reviewed By: iseeyuan
Differential Revision: D33220320
fbshipit-source-id: 10f2397c3b358d5a1d39a2ce25457e3fdb640d2c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68037
Right now mobile::Code doesn't outlive its enclosing Function, and all accesses to Code happen inside the interpreter loop, which doesn't outlive the module, so we don't need to use std::shared_ptr here. This should also save us 1-2 KB of binary size, because shared_ptr seems to bloat on arm64 Android.
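A minimal sketch of the ownership change, with simplified member names (not the full mobile::Function definition):
```cpp
// Before: Code held behind a shared_ptr even though nothing shares ownership.
// struct Function {
//   std::shared_ptr<Code> code_;  // extra allocation + control block
// };

// After: Code owned directly by its Function; it cannot outlive the Function
// anyway, and the interpreter only touches it while the module is alive.
struct Code { /* instructions, operators, constants, ... */ };

struct Function {
  Code code_;  // no shared ownership, no control-block overhead
};
```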
ghstack-source-id: 145818696
Test Plan: eyes.
Reviewed By: qihqi, tugsbayasgalan
Differential Revision: D32264616
fbshipit-source-id: d83f538d6604cf75fd7728a25127b4849ce7ab2a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67729
1. The operator version is needed to decide whether to apply an upgrader. This PR makes it available at the loading stage.
2. Swap the order of parsing instructions and operators, because instruction parsing needs to know the operators first in order to decide whether to apply an upgrader (i.e., whether to change `OP` to `CALL`).
ghstack-source-id: 145082390
Test Plan:
```
buck test //caffe2/test/cpp/jit:jit
```
Reviewed By: iseeyuan
Differential Revision: D32092516
fbshipit-source-id: 853a68effaf95dca86ae46b7f7f4ee0d8e8767da
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65971
ghstack-source-id: 141842335
We should be able to load methods into their ClassTypes. Right now the mobile runtime only loads data members into ClassTypes, but not methods. To support interface calls, we inject methods into ClassTypes when the methods are loaded.
Test Plan: existing tests should all pass.
Reviewed By: qihqi
Differential Revision: D31326146
fbshipit-source-id: fb1dbea619910ef1f8fa26146da3ebab348fe902
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64066
I noticed a bunch of time being spent heap-allocating Tuples
in the unpickler. 1-, 2-, and 3-element Tuples are apparently common
enough that they get their own bytecode instructions, so I decided to
try also giving them their own representation. We store up to 3
IValues inline in `Tuple` rather than doing a second heap allocation
for a `std::vector<IValue>`.
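A minimal sketch of the small-size optimization described above; `SmallTupleElements` is a simplified stand-in, not the actual implementation:
```cpp
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

struct IValue {};  // simplified stand-in for illustration

class SmallTupleElements {
  // Up to 3 elements live inline; larger (or empty) tuples use the vector.
  std::array<IValue, 3> inlineStorage_;
  std::size_t inlineSize_ = 0;  // 0 means "using the vector"
  std::vector<IValue> overflow_;

 public:
  explicit SmallTupleElements(std::vector<IValue> elems) {
    if (!elems.empty() && elems.size() <= 3) {
      inlineSize_ = elems.size();
      for (std::size_t i = 0; i < inlineSize_; ++i) {
        inlineStorage_[i] = std::move(elems[i]);  // no second heap allocation
      }
    } else {
      overflow_ = std::move(elems);  // fall back to heap-allocated storage
    }
  }

  std::size_t size() const {
    return inlineSize_ != 0 ? inlineSize_ : overflow_.size();
  }
};
```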
ghstack-source-id: 140695395
Test Plan:
Added automated tests for TupleElements.
Pixel 3 before: https://www.internalfb.com/intern/aibench/details/761596366576284
Pixel 3 after: https://www.internalfb.com/intern/aibench/details/591414145082422
We went from 347 ms to 302 ms.
Reviewed By: dhruvbird
Differential Revision: D30592622
fbshipit-source-id: 93625c54c9dca5f765ef6d5c191944179cb281a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65865
`operator_str` is not used in `import.cpp`, and it is also defined in `parse_operators.cpp`, so it is removed from `import.cpp`.
Test Plan: CI passing
Reviewed By: iseeyuan
Differential Revision: D31293008
fbshipit-source-id: 1c857cbd63c57b8f79c1a068789fc8605605b642
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64065
It is only safe to mutate Tuple elements if you are the sole owner
of the tuple. The most efficient way to do this, then, is
`std::move(*std::move(tupleIValue).toTuple()).elements()` (the
innermost move allows `IValue::toTuple()` to avoid a refcount bump and
the outermost move allows the element vector to be moved out of the
tuple), but many callsites write simply
`tupleIValue.toTuple().elements()`, which incurs many extra refcount
bumps.
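A minimal sketch of the recommended pattern, assuming the caller is the sole owner of the tuple; the helper name is illustrative:
```cpp
#include <ATen/core/ivalue.h>
#include <utility>

// Consumes a tuple IValue that the caller solely owns and extracts its
// elements without extra refcount bumps or copies.
auto takeTupleElements(c10::IValue tupleIValue) {
  // Before: tupleIValue.toTuple()->elements() bumped the tuple's refcount
  // and copied the elements.
  // After: the inner move lets toTuple() steal the intrusive_ptr (no refcount
  // bump); the outer move lets the elements be moved out rather than copied.
  // Only valid because we are the sole owner of the tuple here.
  return std::move(*std::move(tupleIValue).toTuple()).elements();
}
```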
ghstack-source-id: 139468088
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D30592621
fbshipit-source-id: e8312de866de09b9ea2a62e5128cbf403ee16f09
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65710
No need to incur extra refcount bumps, and no need to use a stringstream for what are presumably string keys anyway.
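A minimal illustration of the kind of change described, using hypothetical key-insertion helpers rather than the actual callsite:
```cpp
#include <sstream>
#include <string>
#include <unordered_map>

std::unordered_map<std::string, int> table;

void insertBefore(const std::string& key, int value) {
  // Before: a stringstream round-trip just to obtain a string key.
  std::stringstream ss;
  ss << key;
  table.emplace(ss.str(), value);
}

void insertAfter(const std::string& key, int value) {
  // After: the key is already a string; use it directly.
  table.emplace(key, value);
}
```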
ghstack-source-id: 139325445
Test Plan: CI, reviewers to confirm the keys are supposed to be strings
Reviewed By: dhruvbird
Differential Revision: D31215347
fbshipit-source-id: 82be93cb2e57aefe94edf74d149115cb734112be
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65179
This follows up on https://github.com/pytorch/pytorch/pull/61862. The purpose is to modularize operator parsing so that it can be used as needed without pulling the whole `import.cpp` into the build.
Test Plan: Added a unit test in `test_lite_predictor.cpp` called `ParseOperators`, similar to `ParseBytecode`.
Reviewed By: iseeyuan
Differential Revision: D31006555
fbshipit-source-id: c38e221800af4cf72963a353c452c5437f56a0ac
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64269
Revert the changes in D29826210 (693d8f2f07); we don't need operator lambda caching since there aren't duplicate operators anymore.
This diff stack results in an additional roughly 12% speedup in model loading time (from 229ms to 200ms) when run against an 87MB speech model that jiatongzhou provided.
ghstack-source-id: 138014904
Test Plan:
**Speech Transducer v25 model (as in D29826210 (693d8f2f07))**
| | Before | After |
| --- | --- | --- |
| Load Time | [229ms](https://www.internalfb.com/intern/aibench/details/160889436133243) | [200ms](https://www.internalfb.com/intern/aibench/details/837884532607514) |
| Save File Size | [86.23 MB](https://lookaside.facebook.com/intern/diff/file/data/?number=658544950) | [86.1 MB](https://lookaside.facebook.com/intern/diff/file/data/?number=658554403) |
The "after" flamegraph shows significantly less time is spent on ```append_operator``` than before.
Steps
- Check out desired commit in devserver (base branch or this diff)
- ```buck build bento/kernels:bento_kernel_pytorch```
- Use N1094068 with pytorch_local kernel to save model for lite interpreter
- Edit ```aibench/specifications/models/pytorch/speech_transducer/v25.json ``` to have new model location and md5
- ```buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/speech_transducer/v25.json --framework pytorch --platform android/arm64 --devices "S8US" --force_profile --remote ```
**Test that saving a model with de-dup ops doesn't change its output**
https://www.internalfb.com/intern/anp/view/?id=1137434
Reviewed By: iseeyuan
Differential Revision: D30615710
fbshipit-source-id: bb4052f0f16eccab386585e94411056f94bce43c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61862
Modularize the functions that parse the bytecode tables so that they can be used as needed in situations other than the mobile lite interpreter.
* The decoupled functions are re-used by the current lite interpreter loader.
* The bytecode can be serialized/deserialized from other formats.
* The decoupled functions have minimal dependencies on other PyTorch components.
Next:
Build a driver binary that includes the parser and interpreter, but has only the necessary dependencies on other PyTorch components.
ghstack-source-id: 137867287
Test Plan:
As an example, a simple bytecode is parsed into a mobile function and run directly in the added unit test, `RunTimeTest:ParseBytecode`. It contains basic control flow (if, else) and basic data orchestration (list construction).
CI
Reviewed By: larryliu0820
Differential Revision: D29798382
Pulled By: iseeyuan
fbshipit-source-id: 1c173a5f5d37097e3a97baec3f3e48e1eea1400f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64029
This should save us a separate pass over the data structure to destroy it.
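A minimal sketch of the idea: take the parsed values by rvalue reference and consume them element by element, so their storage is released during parsing instead of in a separate destruction pass afterwards (signature and types are illustrative):
```cpp
#include <utility>
#include <vector>

struct IValue {};    // simplified stand-in
struct Function {};  // simplified stand-in

// Before (conceptually): parseMethods(const std::vector<IValue>&) left the
// caller holding the full vector, which then had to be destroyed recursively
// in a second pass after parsing finished.
void parseMethods(std::vector<IValue>&& vals, std::vector<Function>& out) {
  for (auto& v : vals) {
    // Move each element out as it is consumed, so its heap storage is
    // released during parsing rather than all at once afterwards.
    IValue consumed = std::move(v);
    (void)consumed;
    out.emplace_back(/* built from consumed */);
  }
  vals.clear();  // remaining empty shells are destroyed cheaply here
}
```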
ghstack-source-id: 137566821
Test Plan:
Pixel3
before:
https://www.internalfb.com/intern/aibench/details/503337445067962
after:
https://our.intern.facebook.com/intern/aibench/details/320277034999340
overall mean time decreased from 373 ms to 358 ms. In the flame graph, we can see that some of the time spent destroying a vector of IValues has moved into parseMethods, and the new parseMethods time is less than the old time plus the recursive destruction time.
Reviewed By: dhruvbird
Differential Revision: D30559530
fbshipit-source-id: d080295a846745ea03ac50f08f4f6c95f4eaf3d8