Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75274
- default to generating forced fallback for the TS backend (where it is used
for tests/debugging), but false otherwise
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D35411211
Pulled By: wconstab
fbshipit-source-id: ccff2f65aa5d8e1aa670d210ce51805985df55ce
(cherry picked from commit 55b48cc02497686f4e25ed3c6dcf9b6b77d49136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75267
- clean up arguments relating to ts backend generation
- make entire lowering function rather than just body be a part of
backend-IR class
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D35411212
Pulled By: wconstab
fbshipit-source-id: 44419e42f706afeb967f704649c2b44e9f66d969
(cherry picked from commit 80a6fa715db97deb056db31e28689dd86a50a4bb)
Summary:
Previously, the torchscript backend would be (partially) initialized at startup.
- the dispatcher registrations would be registered,
- but other backend components would not be initialized until explicitly calling
the backend init function
With this change, the torchscript backend is not initialized until its explicit
initialization function is called.
This enables external backends to register their own backend instead of the torchscript
backend to the same (Lazy) key.
Lands a change contributed by antoniojkim via lazy_tensor_staging branch (https://github.com/pytorch/pytorch/issues/73973)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74557
Reviewed By: bdhirsh
Differential Revision: D35051464
Pulled By: wconstab
fbshipit-source-id: 5a8b0851293e394f49427d1416ee571a8881fe9f
(cherry picked from commit ef745a4a2c8d1d7f9510541a20f1f40625ce29de)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74563
This is used inconsistently in all the generate_code program
invocations. Nevertheless, nothing consumes this flag, so we can
safely remove it.
This was removed in #25353.
ghstack-source-id: 152249818
Test Plan: Should be a no-op, rely on CI.
Reviewed By: malfet
Differential Revision: D35053096
fbshipit-source-id: 3ad19e83ca14649b514dc163c3caff6cbd118e14
(cherry picked from commit a43f05bb43553249caac3c3479986cbc45d286ae)
Summary:
Also enables bazel build to run lazy codegen. Bazel (oss) build feeds off the same filelists as cmake/buck (build_variables.bzl), so enabling it is easier than keeping it disabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74111
Test Plan: Run CI and verify test_lazy_ops is running via OSS cmake builds
Reviewed By: bdhirsh
Differential Revision: D34772403
fbshipit-source-id: 8a63f58b9536e6ac1be530667932176ef2549496
(cherry picked from commit e807ffb1918853d10b924fdc24f85ee5b1a39021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74450
- per-operator-headers is a strict build mode where compilation units aren't allowed
to depend on bulk headers like ATen/Functions.h, but must instead depend only on the
specific operator headers used. (In other configurations, the reverse is required).
Test Plan: CI to make sure nothing breaks for existing backends, and rebased next diff manual test to make sure it actually helps
Reviewed By: ezyang, bdhirsh
Differential Revision: D35002666
fbshipit-source-id: 712445f8d146cf026759444fbd42a20705be9bef
(cherry picked from commit f13e5522d49a6edcb6aed4431b1ec8e2b50a98fc)
Summary:
Hooks into existing autograd codegen script (generate_code.py) to take advantage of its integrations into buck/cmake/bazel.
Adds a new option (--gen_lazy_ts_backend) to generate_code.py, calling this from CMake OSS build and fbcode build, but not from other internal xplat/ovrsource builds (these could be opted in later)
Bazel support is added in a later diff.
Includes one generated file (torch/csrc/lazy/generated/LazyIr.h) in a unit test (test/cpp/lazy/test_ir.cpp) to partially verify the generator is working, but does not compile the remaining output sources from the generator yet as they depend on other files not yet landed from lazy_tensor_staging branch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73996
Test Plan: OSS/internal CI - verify all builds are working and test_ir.cpp compiles LazyIr.h
Reviewed By: ezyang
Differential Revision: D34408536
fbshipit-source-id: 8af0aea3b95d81eccafc17d64390d70ddd176515
(cherry picked from commit f930612f2bad61c76eb02d85cfbec9f33a1459dc)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67496
gen_autograd.py doesn't use `Declarations.yaml` any more, and removing
the dependency allows it to run in parallel with
`tools/codegen/gen.py`.
Test Plan: Imported from OSS
Reviewed By: dagitses, ejguan
Differential Revision: D32027251
Pulled By: albanD
fbshipit-source-id: 2cc0bbe36478e6ec497f77a56ab8d01c76145703
Summary:
This PR greatly simplifies `mypy-strict.ini` by strictly typing everything in `.github` and `tools`, rather than picking and choosing only specific files in those two dirs. It also removes `warn_unused_ignores` from `mypy-strict.ini`, for reasons described in https://github.com/pytorch/pytorch/pull/56402#issuecomment-822743795: basically, that setting makes life more difficult depending on what libraries you have installed locally vs in CI (e.g. `ruamel`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59117
Test Plan:
```
flake8
mypy --config mypy-strict.ini
```
Reviewed By: malfet
Differential Revision: D28765386
Pulled By: samestep
fbshipit-source-id: 3e744e301c7a464f8a2a2428fcdbad534e231f2e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50611
Removed the unused old-style code to prevent it from being used.
Added all autograd/gen_pyi sources to mypy-strict.ini config.
Confirmed byte-for-byte compatible with the old codegen:
```
Run it before and after this PR:
.jenkins/pytorch/codegen-test.sh <baseline_output_dir>
.jenkins/pytorch/codegen-test.sh <test_output_dir>
Then run diff to compare the generated files:
diff -Naur <baseline_output_dir> <test_output_dir>
```
Confirmed clean mypy-strict run:
```
mypy --config mypy-strict.ini
```
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D25929730
Pulled By: ljk53
fbshipit-source-id: 1fc94436fd4a6b9b368ee0736e99bfb3c01d38ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49251
Since all ops are c10-full and use templated unboxing now, we don't need to codegenerate any unboxing logic anymore.
Since this codegen was the only code using setManuallyBoxedKernel, we can also remove that functionality from KernelFunction, OperatorEntry and Dispatcher.
ghstack-source-id: 119450486
Test Plan: waitforsandcastle
Reviewed By: ezyang
Differential Revision: D25502865
fbshipit-source-id: 49d009df159fda4be41bd02457d4427e6e638c10
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47745
This is a relatively small codegen. Reintroduced 'simple_type' to preserve
old codegen output.
It depends on some methods defined in gen_python_functions.py - next PR will
clean up the remaining Declarations.yaml methods in gen_python_functions.py.
Confirmed byte-for-byte compatible with the old codegen:
```
Run it before and after this PR:
.jenkins/pytorch/codegen-test.sh <baseline_output_dir>
.jenkins/pytorch/codegen-test.sh <test_output_dir>
Then run diff to compare the generated files:
diff -Naur <baseline_output_dir> <test_output_dir>
```
Differential Revision: D24885068
Test Plan: Imported from OSS
Reviewed By: ezyang
Pulled By: ljk53
fbshipit-source-id: c0fbd726bcc450c3c7fe232c23e5b31779d0b65f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46244
- What does the generated binding code do?
The Python binding codegen produces code that takes the input list of
PyObjects, finds the matching ATen C++ function using PythonArgParser,
converts the PyObjects into C++ types and calls the ATen C++ function:
```
+--------+ parsing +------------------------+ binding +-----------------------+
| PyObjs | ---------> | PythonArgParser Output | ---------> | Cpp Function Dispatch |
+--------+ +------------------------+ +-----------------------+
```
- Are Python arguments 1-1 mapped to C++ arguments?
Python arguments might be reordered, packed, unpacked when binding to
C++ arguments, as illustrated below:
```
// Binding - Reorder & Packing
// aten::empty.names(int[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None,
Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor
Python Args Cpp Args
-----------------------------------------------------------
0: size size
1: names names
2: memory_format -------+
3: dtype -----+-|--> options
4: layout / |
5: device / +--> memory_format
6: pin_memory /
7: requires_grad -+
// Binding - Unpacking
// aten::max.names_dim(Tensor self, Dimname dim, bool keepdim=False) -> (Tensor values, Tensor indices)
Python Args Cpp Args
-----------------------------------------------------------
+----> max
/-----> max_values
0: input / self
1: dim / dim
2: keepdim / keepdim
3: out -----+
```
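The reorder-and-pack case for `aten::empty.names` can be sketched in plain Python. This is only an illustration, not the generated binding code: `pack_empty_names_args` and the dict standing in for the C++ options object are hypothetical, and the sketch assumes `requires_grad` is folded into the options together with `dtype`, `layout`, `device` and `pin_memory`, as the diagram suggests.

```python
from typing import Any, Dict, List

# Python-side argument order for aten::empty.names, as in the diagram above.
PYTHON_ARG_ORDER = [
    "size", "names", "memory_format", "dtype",
    "layout", "device", "pin_memory", "requires_grad",
]

# Python arguments assumed to be packed into one C++ "options" argument.
OPTIONS_FIELDS = ["dtype", "layout", "device", "pin_memory", "requires_grad"]


def pack_empty_names_args(parsed: List[Any]) -> Dict[str, Any]:
    """Map positional parser output onto the C++ argument order (hypothetical helper)."""
    by_name = dict(zip(PYTHON_ARG_ORDER, parsed))
    return {
        "size": by_name["size"],
        "names": by_name["names"],
        # Packing: five Python arguments become one C++ argument.
        "options": {field: by_name[field] for field in OPTIONS_FIELDS},
        # Reordering: index 2 on the Python side, last on the C++ side.
        "memory_format": by_name["memory_format"],
    }


cpp_args = pack_empty_names_args(
    [[2, 3], ["N", "C"], "contiguous", "float32", "strided", "cpu", False, False]
)
print(list(cpp_args))
```

Keeping the Python order and the C++ order as separate, explicit structures is the point of the sketch: the mapping between them becomes one visible step instead of a side effect of mutating shared argument dicts.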
- Why do we want to rewrite the python binding codegen?
The old codegen takes Declarations.yaml as input. It doesn't distinguish
between Python arguments and C++ arguments - they are all mixed together
as a bag of non-typed dict objects. Different methods process these arg
objects and add new attributes for various purposes. It is hard to
work out the semantics of these attributes. The complicated
binding logic is implicit and scattered.
```
+--------------------+
| Native Functions |
+--------------------+
|
|
v
+--------------------+
| Cpp Signatures |
+--------------------+
|
|
v
+--------------------+
| Declarations.yaml |
+--------------------+
| +-------------------------------------+
| +-------> | PythonArgParser Schema |
| | +-------------------------------------+
| | .
| | .
v | .
+--------------------+ +-------------------------------------+
| NonTyped Args Objs | --> | PythonArgParser -> Cpp Args Binding |
+--------------------+ +-------------------------------------+
| .
| .
| .
| +-------------------------------------+
+-------> | Cpp Function Dispatch |
+-------------------------------------+
```
This PR leverages the new immutable data models introduced in the new
aten codegen. It introduces dedicated data models for python schema.
This way, we can not only avoid subtle Declarations.yaml conversions but
also decouple the generation of python schema, python to c++ binding and
c++ function call.
The ultimate state will be like the following diagram:
```
+-------------------+ +-------------------------------------+
+-------> | Python Signatures | --> | PythonArgParser Schema |
| +-------------------+ +-------------------------------------+
| | .
| | .
| | .
+------------------+ | +-------------------------------------+
| Native Functions | +-------> | PythonArgParser -> Cpp Args Binding |
+------------------+ | +-------------------------------------+
| | .
| | .
| | .
| +-------------------+ +-------------------------------------+
+-------> | Cpp Signatures | --> | Cpp Function Dispatch |
+-------------------+ +-------------------------------------+
```
This PR has migrated the core binding logic from
tools/autograd/gen_python_functions.py to tools/codegen/api/python.py.
It produces the byte-for-byte same results (tested with #46243).
Will migrate the rest of gen_python_functions.py in subsequent PRs.
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D24388874
Pulled By: ljk53
fbshipit-source-id: f88b6df4e917cf90d868a2bbae2d5ffb680d1841
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45722
This diff does a bunch of things:
1. Introduces some abstractions as detailed in https://fb.quip.com/2oEzAR5MKqbD to help with selective build related codegen in multiple files.
2. Adds helper methods to combine operators, debug info, operator lists, etc...
3. Currently, the selective build machinery queries `op_registration_whitelist` directly at various places in the code. `op_registration_whitelist` is a list of allowed operator names (without overload name). We want to move to a world where the overload names are also included so that we can be more selective about which operators we include. To that effect, it makes sense to hide the checking logic in a separate abstraction and have the build use that abstraction instead of putting all this selective build specific logic in the code-generator itself. This change is attempting to do just that.
4. Updates generate_code, unboxing-wrapper codegen, and autograd codegen to accept the operator selector paradigm as opposed to a selected operator list.
5. Update `tools/code_analyzer/gen_op_registration_allowlist.py` to expose providing an actual structured operator dependency graph in addition to a serialized string.
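Item 3 can be illustrated with a small sketch of hiding the selection check behind one object. The `OperatorSelector` class below is hypothetical and is not the abstraction this diff adds; it only assumes operator names of the form `aten::add.Tensor`, where the part after the dot is the overload name.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class OperatorSelector:
    """Hypothetical selector: callers ask it instead of reading a raw list."""

    include_all: bool = False
    # Full names, overload included, e.g. "aten::add.Tensor".
    full_names: FrozenSet[str] = frozenset()
    # Base names that select every overload, e.g. "aten::add".
    base_names: FrozenSet[str] = frozenset()

    @staticmethod
    def from_legacy_allowlist(names: Iterable[str]) -> "OperatorSelector":
        # Old-style lists carry no overload names, so each entry selects all overloads.
        return OperatorSelector(base_names=frozenset(names))

    def is_selected(self, full_name: str) -> bool:
        if self.include_all or full_name in self.full_names:
            return True
        base_name = full_name.split(".", 1)[0]
        return base_name in self.base_names


legacy = OperatorSelector.from_legacy_allowlist(["aten::add"])
strict = OperatorSelector(full_names=frozenset({"aten::add.Tensor"}))
print(legacy.is_selected("aten::add.Scalar"), strict.is_selected("aten::add.Scalar"))
```

With this shape, a code generator only calls `is_selected`, so moving from base-name lists to overload-aware lists does not require touching the generator itself.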
There are a bunch of structural changes as well:
1. `root_op_list.yaml` and `combined_op_list.yaml` are now actual YAML files (not a space separated list of operator names)
2. `generate_code.py` accepts only paths to operator list YAML files (both old style as well as new style) and not list of operator names on the command line as arguments
3. `gen.py` optionally also accepts a custom build related operators YAML path (this file has information about which operators to register in the generated library).
ghstack-source-id: 114578753
(Note: this ignores all push blocking failures!)
Test Plan:
`buck test caffe2/test:selective_build`
Generated YAML files after the change:
{P143981979}
{P143982025}
{P143982056}
Ensure that the generated files are same before and after the change:
```
[dhruvbird@devvm2490 /tmp/TypeDefault.cpp] find -name "*.cpp" | xargs md5sum
d72c3d125baa7b77e4c5581bbc7110d2 ./after_change/gen_aten/TypeDefault.cpp
42353036c83ebc7620a7159235b9647f ./after_change/lite_predictor_lib_aten/TypeDefault.cpp
d72c3d125baa7b77e4c5581bbc7110d2 ./before_change/gen_aten/TypeDefault.cpp
42353036c83ebc7620a7159235b9647f ./before_change/lite_predictor_lib_aten/TypeDefault.cpp
```
`VariableTypes_N.cpp` are generated the same both before and after the change:
```
[dhruvbird@devvm2490 /tmp/VariableType] find -name "*.cpp" | xargs -n 1 md5sum | sort
3be89f63fd098291f01935077a60b677 ./after/VariableType_2.cpp
3be89f63fd098291f01935077a60b677 ./before/VariableType_2.cpp
40a3e59d64e9dbe86024cf314f127fd6 ./after/VariableType_4.cpp
40a3e59d64e9dbe86024cf314f127fd6 ./before/VariableType_4.cpp
a4911699ceda3c3a430f08c64e8243fd ./after/VariableType_1.cpp
a4911699ceda3c3a430f08c64e8243fd ./before/VariableType_1.cpp
ca9aa611fcb2a573a8cba4e269468c99 ./after/VariableType_0.cpp
ca9aa611fcb2a573a8cba4e269468c99 ./before/VariableType_0.cpp
e18f639ed23d802dc4a31cdba40df570 ./after/VariableType_3.cpp
e18f639ed23d802dc4a31cdba40df570 ./before/VariableType_3.cpp
```
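The same before/after check can be scripted. The following is a sketch using only the Python standard library; `digest_tree` and `changed_files` are hypothetical helper names, not part of the PyTorch tooling, and SHA-256 is used here in place of `md5sum` (any content hash serves the purpose).

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def digest_tree(root: Path, pattern: str = "*.cpp") -> Dict[str, str]:
    """Return {relative path: hex digest} for matching files under root."""
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob(pattern))
    }


def changed_files(before: Path, after: Path) -> List[str]:
    """List files that are missing on one side or differ in content."""
    old, new = digest_tree(before), digest_tree(after)
    return sorted(
        name for name in old.keys() | new.keys() if old.get(name) != new.get(name)
    )
```

An empty result from `changed_files(Path("before"), Path("after"))` corresponds to the matching checksum pairs shown above.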
Reviewed By: ljk53
Differential Revision: D23837010
fbshipit-source-id: ad06b1756af5be25baa39fd801dfdf09bc565442
Summary:
For mobile custom build, we only generate code for ops that are used by
specific models to reduce binary size.
There are multiple places where we apply the op filtering:
- generated_unboxing_wrappers_*.cpp
- autograd/VariableType*.cpp
- c10 op registration (in aten/gen.py)
For c10 op registration, we filter by the main op name - all overloads
that match the main op name part will be kept.
For generated_unboxing_wrappers_*, we filter by the full op name - only
those having exactly the same overload name will be kept.
This PR changes generated_unboxing_wrappers_* and autograd/VariableType*.cpp
codegen to also filter by the main op name.
The reasons are:
- keeping all overloads can have better backward compatibility;
- generated_unboxing_wrappers_* are relatively small as it only contains
thin wrappers for root ops.
- generated_unboxing_wrappers_* will be replaced by c10 op registration
soon anyway.
- autograd/VariableType*.cpp are not included in OSS build.
Why does it offer better backward compatibility? #40737 is an example:
It introduced a new `_convolution` overload and renamed the original one
to `_convolution.deprecated`. Before this PR, the model prepared by the
old version PyTorch won't be able to run on the custom mobile build
generated on the PR because `_convolution.deprecated` won't be kept in
the custom build due to full op name matching policy. By relaxing it to
partial matching policy, the mobile custom build CI on the PR can pass.
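The difference between the two matching policies can be sketched as follows. The helper names are hypothetical and the operator list is made up for illustration; only the `_convolution` / `_convolution.deprecated` pair comes from the example above.

```python
from typing import Iterable, List, Set


def main_op_name(full_name: str) -> str:
    """Strip the overload suffix: 'aten::_convolution.deprecated' -> 'aten::_convolution'."""
    return full_name.split(".", 1)[0]


def filter_full_match(ops: Iterable[str], selected: Set[str]) -> List[str]:
    # Old policy: keep an op only if its full name, overload included, is listed.
    return [op for op in ops if op in selected]


def filter_main_name_match(ops: Iterable[str], selected: Set[str]) -> List[str]:
    # Relaxed policy: keep every overload whose main op name is listed.
    selected_main = {main_op_name(name) for name in selected}
    return [op for op in ops if main_op_name(op) in selected_main]


all_ops = ["aten::_convolution", "aten::_convolution.deprecated", "aten::mm"]
# A custom build generated from a model that references only the new overload.
selected_ops = {"aten::_convolution"}

print(filter_full_match(all_ops, selected_ops))
print(filter_main_name_match(all_ops, selected_ops))
```

Under the full-name policy the renamed `.deprecated` overload is dropped, which is what breaks models prepared by an older PyTorch version; the main-name policy keeps it.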
Will test the size impact for FB production build before landing.
Differential Revision: D22809564
Test Plan: Imported from OSS
Reviewed By: iseeyuan
Pulled By: ljk53
fbshipit-source-id: e2fc017da31f38b9430cc2113f33e6d21a0eaf0b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41575
Fixes https://github.com/pytorch/pytorch/issues/34294
This updates the C++ argument parser to correctly handle `TensorList` operands. I've also included a number of updates to the testing infrastructure, this is because we're now doing a much more careful job of testing the signatures of aten kernels, using the type information about the arguments as read in from `Declarations.yaml`. The changes to the tests are required because we're now only checking for `__torch_function__` attributes on `Tensor`, `Optional[Tensor]` and elements of `TensorList` operands, whereas before we were checking for `__torch_function__` on all operands, so the relatively simplistic approach the tests were using before -- assuming all positional arguments might be tensors -- doesn't work anymore. I now think that checking for `__torch_function__` on all operands was a mistake in the original design.
The updates to the signatures of the `lambda` functions are to handle this new, more stringent checking of signatures.
I also added override support for `torch.nn.functional.threshold` and `torch.nn.functional.layer_norm`, which did not yet have python-level support.
Benchmarks are still WIP.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34725
Reviewed By: mruberry
Differential Revision: D22357738
Pulled By: ezyang
fbshipit-source-id: 0e7f4a58517867b2e3f193a0a8390e2ed294e1f3
Summary:
Quick fix due to code merging. With this feature working, the total size reduction in Android is 664 KB (Pytorch -26 KB and papaya - 639 KB)
https://fburl.com/unigraph/c726gvb1
Test Plan: CI
Reviewed By: kwanmacher
Differential Revision: D22053779
fbshipit-source-id: 8da4a651432b453c25e543bc64dbed02946de63d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39452
Selective build works on training.
* VariableType_?.cpp are now selectively generated based on the operator list.
* Add a flag in pt_operator_library, "train". If it's True, an extra flag of "pt_train_operator_library" will be added to the labels. A query for "pt_train_operator_library" will be done to aggregate the training operators. With this flag we limit the generated VariableType to used training operators only, to conserve the code size. The models for inference only have train = False by default.
* For testing purpose, caffe2/fb/pytorch_trainer is created. It's based on full jit but the operators are selectively built.
* smartkeyboard_debug_model is used for test. Since the static code analysis is not applied for VariableType yet, the operators are manually added based on debugging error messages.
* At build stage, make selective build optional for training code-gen library.
The reason is that, for fb4a to build, the generated VariableType.cpp needs to depend on torch_mobile_train. Torch_mobile_train is not needed for apps with inference only. In those cases training can be turned off to remove the dependency on torch_mobile_train to save size. It can also be used as a switch to check size regression introduced by training.
ghstack-source-id: 105190037
(Note: this ignores all push blocking failures!)
Test Plan:
Training:
```
buck run -c pt.build_from_deps_query=1 -c pt.selective_build=0 -c pt.static_dispatch=0 xplat/caffe2/fb/pytorch_trainer:trainer ~/models/papaya/keyboard/smartkeyboard_debug_model.pt
```
Inference, with and without the new query-based feature:
```
buck run -c pt.build_from_deps_query=1 -c pt.selective_build=0 -c pt.static_dispatch=0 xplat/caffe2/fb/lite_predictor:lite_predictor_bi -- --model=/home/myuan/models/pytext/BI/bi_pytext_0512.bc --input_dims "1,4" --input_type int64 --pytext_len=4
```
```
buck run xplat/caffe2/fb/lite_predictor:lite_predictor_bi -- --model=/home/myuan/models/pytext/BI/bi_pytext_0512.bc --input_dims "1,4" --input_type int64 --pytext_len=4
```
Reviewed By: ljk53
Differential Revision: D21459302
fbshipit-source-id: df71a46d74f8c7448cbf51990804104f1384594f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36838
All ops now do unboxing after dispatch, i.e. c10 handles unboxing and c10 registers a wrapper for the op to JIT
The last op that manually registered its own wrapper to JIT in register_aten_ops.cpp was migrated.
Since there are no ops using use_c10_dispatcher: unboxed_only anymore, we can delete the feature.
Also:
- Rename some files to more accurately describe what they're doing now:
- OpsAlreadyMovedToC10.h/cpp -> ATenOpList.h/cpp
- register_aten_ops.cpp -> generated_unboxing_wrappers.cpp
- gen_jit_dispatch.py -> gen_unboxing_wrappers.py
ghstack-source-id: 102532915
Test Plan: waitforsandcastle
Differential Revision: D21100081
fbshipit-source-id: be824958eef33f6cd42a6a652175bd0b1df4ebf9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35426
Use selective build with the full set of operators (vs. manually register each used op with "_" prefix).
Lite interpreter relies on JIT operator dispatch. In the future we will still need JIT operator dispatch to dispatch ops that are not registered in c10.
Currently the selective build is for c10/aten dispatch in BUCK. There is JIT selective code-gen in OSS but not ported to BUCK yet.
This diff is also porting the selective code-gen in BUCK.
* The selected op list is passed to gen_jit_dispatch.py.
* The list passed to gen_jit_dispatch is the top-level ops (USED_PT_OPS) only, because the selective c10/aten dispatch already registered other ops that are called from the top-level ops.
ghstack-source-id: 101885215
(Note: this ignores all push blocking failures!)
Test Plan:
1. In Python, run torch.jit.export_opnames(scripted_M_mod)
2. Append the operator names into fbcode/caffe2/pt_ops.bzl and the BUCK target.
3. Run
```
buck run xplat/caffe2/fb/lite_predictor:lite_predictor_bi -- --model=/home/myuan/temp/bi_pytext_0315.bc --input_dims "1,4" --input_type int64 --pytext_len=4
```
Should provide expected results.
In addition, the size of the generated code for JIT registration, for example, ```register_aten_ops_0.cpp```, should be significantly reduced (from ~250 KB to ~80 KB). The registration schemas of the non-selected ops are still kept, but the registration functor is replaced by ```DUMMY_OPERATION```.
Reviewed By: ljk53
Differential Revision: D20408831
fbshipit-source-id: ec75dd762c4613aeda3b2094f5dad11804dc9492
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36007
Tracing is not needed in Pytorch Mobile client. Disabling it has a couple of benefits:
1. It's a pre-requisite to build lite interpreter.
2. It saves the code size for full jit and Federated learning (around 600k).
Solution: use PYTORCH_DISABLE_TRACING to disable it. It's better than passing an argument to code-gen because:
1. It's a single-point change in the code template for both VariableType and VariableFactories.
2. code-gen does not handle VariableTypeManual.cpp. The macro is needed there anyway.
ghstack-source-id: 101529401
Test Plan: CI
Reviewed By: ljk53
Differential Revision: D20852558
fbshipit-source-id: c28cec9f90208974acfa351ec9aec3fabbbb8aac
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35951
Change generate_code to keep folder structure the same regardless of whether install path is provided
Amend build_variables.bzl accordingly
Another preliminary step to merge https://github.com/pytorch/pytorch/pull/35220
Test Plan: CI
Reviewed By: EscapeZero, seemethere
Differential Revision: D20839410
fbshipit-source-id: 02297560a7e48aa7c6271f7a8517fc4a1ab35271
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33715
Tracing code depends on the full JIT, which is not available in the lite interpreter. Use `-c pt.disable_gen_tracing=1` to turn off generation of the tracing part.
ghstack-source-id: 99252322
Test Plan:
```
buck build xplat/caffe2:torch -c pt.disable_gen_tracing=1
```
The tracing part of generated/VariableType_?.cpp will not be generated.
Reviewed By: smessmer
Differential Revision: D19684577
fbshipit-source-id: a1e5b80eca5e51c7bf72b5cc8f0e36c2135fabc2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30144
Create script to produce libtorch that only contains ops needed by specific
models. Developers can use this workflow to further optimize mobile build size.
We need to keep a dummy stub for unused (stripped) ops because some JIT-side
logic requires certain function schemas to exist in the JIT op
registry.
Test Steps:
1. Build "dump_operator_names" binary and use it to dump root ops needed
by a specific model:
```
build/bin/dump_operator_names --model=mobilenetv2.pk --output=mobilenetv2.yaml
```
2. The MobileNetV2 model should use the following ops:
```
- aten::t
- aten::dropout
- aten::mean.dim
- aten::add.Tensor
- prim::ListConstruct
- aten::addmm
- aten::_convolution
- aten::batch_norm
- aten::hardtanh_
- aten::mm
```
NOTE that for some reason it outputs "aten::addmm" but actually uses "aten::mm".
You need to fix it manually for now.
3. Run custom build script locally (use Android as an example):
```
SELECTED_OP_LIST=mobilenetv2.yaml scripts/build_pytorch_android.sh armeabi-v7a
```
4. Checkout demo app that uses locally built library instead of
downloading from jcenter repo:
```
git clone --single-branch --branch custom_build git@github.com:ljk53/android-demo-app.git
```
5. Copy locally built libraries to demo app folder:
```
find ${HOME}/src/pytorch/android -name '*.aar' -exec cp {} ${HOME}/src/android-demo-app/HelloWorldApp/app/libs/ \;
```
6. Build demo app with locally built libtorch:
```
cd ${HOME}/src/android-demo-app/HelloWorldApp
./gradlew clean && ./gradlew assembleDebug
```
7. Install and run the demo app.
In-APK arm-v7 libpytorch_jni.so build size reduced from 5.5M to 2.9M.
Test Plan: Imported from OSS
Differential Revision: D18612127
Pulled By: ljk53
fbshipit-source-id: fa8d5e1d3259143c7346abd1c862773be8c7e29a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26851
Add codegen option to remove backward ops from jit-op-registry as they are not
likely to be used for inference only mobile build.
Measured ARM-v7 AAR build size change: 5,804,182 -> 5,331,219.
Test Plan: - build and integrate with demo app;
Differential Revision: D17587422
Pulled By: ljk53
fbshipit-source-id: 08c0fc7a710698a0d4baaf16bbb73cb812b1126a
Summary:
This is the first of a series of changes to reduce build size by cutting
autograd functions from mobile build.
When INTERN_DISABLE_AUTOGRAD is set:
* On CMake side we exclude Functions.h/cpp, VariableType*.h/cpp,
VariableTypeManual.cpp from the build process. Still keep variable_factories.h
as we rely on it to create variables instead of tensors.
* In source code we gate a couple autograd references (in autograd/variable.cpp)
with C10_MOBILE (technically we should use a dedicated c macro but its
maintenance cost is higher than cmake macro as we have several build systems
to change).
* Pass --disable-autograd flag to codegen script, which will stop generating
Functions/VariableType code. And for variable_factories.h it will stop
generating tracing code.
Edit: in this diff we will keep Functions.h/cpp to avoid changing source code.
Why do we need this change if mobile already avoids calling VariableType and
autograd code with USE_STATIC_DISPATCH=ON?
It reduces static library size for the iOS build, for which it is
relatively harder to strip size with the linker approach.
Why do we need to make such an involved change to the codegen script?
There isn't a global config system in codegen. autograd/env.py provides similar
functionality, but it says not to add anything there.
Test Plan:
- will check CI;
- test mobile build in sample app;
Differential Revision: D17202733
Pulled By: ljk53
fbshipit-source-id: 5701c6639b39ce58aba9bf5489a08d30d1dcd299
Summary:
* adds TORCH_API and AT_CUDA_API in places
* refactor code generation Python logic to separate
caffe2/torch outputs
* fix hip and asan
* remove profiler_cuda from hip
* fix gcc warnings for enums
* Fix PythonOp::Kind
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19554
Differential Revision: D15082727
Pulled By: kostmo
fbshipit-source-id: 83a8a99717f025ab44b29608848928d76b3147a4
Summary:
Rehash of previous attempts. This tries a different approach where we accept the install as specified in cmake (leaving bin/ include/ and lib/ alone), and then try to adjust the rest of the files to this more standard layout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16414
Differential Revision: D13863635
Pulled By: zdevito
fbshipit-source-id: 23725f5c64d7509bf3ca8f472dcdcad074de9828
Summary:
After an analogous breakup of VariableType.cpp, the generated
register_aten_ops.cpp is now the slowest-to-compile file in a typical
incremental rebuild by a wide margin. Therefore, give it the same
treatment - the generated code is split across several files to allow
parallel compilation.
Note that the existing code takes some care to arrange that overloads
of the same op name are given in a particular order. This diff
preserves that behavior, by treating all overloads of the same name as
a single indivisible unit, and sharding based on these groups rather
than on individual constructors.
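A minimal sketch of this group-then-shard idea, assuming overloads are identified by a shared op name and a shard is chosen by a stable hash of that name. The function name, the hash choice and the tuple representation are illustrative assumptions, not the code in this diff.

```python
import zlib
from collections import OrderedDict
from typing import Dict, List, Tuple

Overload = Tuple[str, str]  # (op name, schema or constructor text)


def shard_by_op_name(overloads: List[Overload], num_shards: int) -> List[List[Overload]]:
    """Distribute overloads across shards, keeping each op name's group intact."""
    # Group overloads by name, preserving the order in which they were given.
    groups: Dict[str, List[Overload]] = OrderedDict()
    for overload in overloads:
        groups.setdefault(overload[0], []).append(overload)

    shards: List[List[Overload]] = [[] for _ in range(num_shards)]
    for name, group in groups.items():
        # crc32 is stable across runs, unlike the built-in hash() for str.
        index = zlib.crc32(name.encode("utf-8")) % num_shards
        shards[index].extend(group)
    return shards
```

Because a whole group always lands in one shard, the relative order of overloads of the same op is unchanged, while different ops can be compiled in parallel from different files.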
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12615
Reviewed By: ezyang
Differential Revision: D10367363
Pulled By: anderspapitto
fbshipit-source-id: 07db5f9cb79748040909716349626412a13bc86e
Summary:
On my devgpu, this brings the time taken for `touch torch/csrc/jit/type.h && time python setup.py rebuild develop` (debug mode, multicore build) down from 75 seconds to 62 seconds. For the `ninja install` of libtorch portion, which this affects, the reduction is from 52 seconds to 35.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12493
Reviewed By: zdevito
Differential Revision: D10315988
Pulled By: anderspapitto
fbshipit-source-id: 316dc4ab81134aaa17a568cfc07408b7ced08c2e
Summary:
operator.cpp is not generated. Removing the line prevents generate_code.py from always thinking it is out of date and running.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9339
Reviewed By: ezyang
Differential Revision: D8798689
Pulled By: zdevito
fbshipit-source-id: f25a2e215fec29aa51571e6a31771f0f91e7a213
Summary:
This is a series of two commits that should probably be read separately. They are stacked on top of #9018 since the second commit requires it for correctness.
Commit 1
=======
This commit is the first in a series that will clean up how we handle declaring operators and intrinsics in the JIT to make it more modular and readable. This introduces readable declarations that can be used to register operators and switches gen_jit_dispatch to generate this schema. A follow up PR will remove the dispatch keys like "add-3" and resolve ops directly based on the registered schema, further simplifying the generation process.
* Switches schema over to parsed declarations, in the future this will allow something like:
```
registry.register_intrinsic("foo(Tensor a, Tensor b) -> Tensor", [](Stack& stack) {
...
})
```
This will allow the scalable registration of intrinsics for lists, tuples, and other ops, as long as meta-data for these ops (e.g. derivatives and size propagation routines) is provided.
The declarations resemble those used by PythonArgParser but have been significantly cleaned up to minimize the number of types that can appear in the declaration. We should strive to get the other parts of PyTorch switched over to this restricted declaration set when possible, but it is too much to do in a single PR. My hope is that eventually we will use a very similar language to describe declarations in C10, and this can serve as a guide for that.
Parsing is done using the script lexer, so it is very robust to whitespace and extensible for future types.
This removes the other way we encoded schema, and makes it easier to see what schema are registered.
Current generated declarations: https://gist.github.com/zdevito/a96a17766fb3a098d69a91ee00abaaf6
* Switches how we handle attempting to use an integer in the place of a fixed-sized int list, such as in conv (e.g. 'int[3] stride=1'). Now that we can statically distinguish between int and Tensor, we handle the expansion as an implicit conversion in the compiler. This allows us to simplify the interpreter since it no longer needs to handle the conversion itself.
* Schema declarations have been changed so that they match the type system in the IR exactly. In particular, attribute_info which was used by liftConstantAttributes has been dropped and constant attributes are lifted purely based on the type of the input. Type conversions in compiler have been simplified due to this change.
* Error highlighting in ErrorReport now only reports at most 20 lines of code, to make reading where an error occurred easier.
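As a rough illustration of the declaration format and the implicit int-to-list conversion described above, here is a minimal Python sketch. It is an assumption-laden stand-in, not the JIT's implementation: the real parser uses the script lexer, whereas this sketch uses a regular expression, and `parse_schema`, `expand_int`, `Argument`, `Schema`, and the example `conv` declaration are all hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class Argument(NamedTuple):
    type: str               # e.g. "Tensor", "int", "int[3]"
    name: str
    default: Optional[str]  # default value as written, or None


class Schema(NamedTuple):
    name: str
    arguments: List[Argument]
    returns: str


# name(args) -> return_type; a regex stand-in for the real lexer-based parser.
_SCHEMA_RE = re.compile(r"^\s*(\w+)\s*\((.*)\)\s*->\s*(.+?)\s*$")
_FIXED_INT_LIST_RE = re.compile(r"^int\[(\d+)\]$")


def parse_schema(decl: str) -> Schema:
    """Parse a declaration such as 'foo(Tensor a, Tensor b) -> Tensor'."""
    match = _SCHEMA_RE.match(decl)
    if match is None:
        raise ValueError(f"malformed declaration: {decl!r}")
    name, arg_text, returns = match.groups()
    arguments = []
    for piece in filter(None, (p.strip() for p in arg_text.split(","))):
        decl_part, _, default = piece.partition("=")
        type_, arg_name = decl_part.split()
        arguments.append(Argument(type_, arg_name, default.strip() or None))
    return Schema(name, arguments, returns)


def expand_int(arg_type: str, value):
    """Expand a bare int into a fixed-size int list when the schema asks for one."""
    match = _FIXED_INT_LIST_RE.match(arg_type)
    if match and isinstance(value, int):
        return [value] * int(match.group(1))
    return value


# Illustrative declaration only; not the real conv signature.
schema = parse_schema("conv(Tensor input, int[3] stride=1) -> Tensor")
stride = schema.arguments[1]
expanded = expand_int(stride.type, int(stride.default))
```

Because the expected type is known statically from the schema, the expansion can happen once at compile time instead of being handled by the interpreter on every call.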
Commit 2
=======
This commit unifies aten_dispatch and aten_schema into a single Operator object that contains both schema and implementation information. In the future we can use this object to also contain functionality like shape prop and autodiff needed by all operators. Operators are registered globally, and dispatch logic uses the schema information to figure out which variant to use. Descriptor keys, a frequent source of inscrutable debug errors, have been removed.
* Introduce Operator, to replace TensorOp. Unlike TensorOp, we use Operator for all op implementations, including primitives that may occur in the graphs. The only exceptions are ops that are only known to the interpreter like jumps, and GraphExecutors where we need to record additional debug info.
* Adds a global registry for Operator implementations. aten_dispatch.cpp turns into register_aten_ops.cpp, which registers all the Operators for aten with the operator registry. register_prim_ops.cpp now contains the implementations for primitive operators that used to be in the interpreter. This means that it is now safe to use `getOperation(node)` to look up the true interpreter function for the node, which will simplify const-propagation passes.
* Remove addInterpreterOpHandler in favor of global operator registry.
* Instead of descriptors, we match Node arguments directly against the FunctionSchema describing expected inputs in `matchSchema`. `matchSchema` knows how to parse both attributes and positional inputs from a node and match them to the appropriate registered operator. Debug error messages when we try to run an invalid operator are significantly improved: they now automatically display the schemas of the registered ops with the same name.
* Merge aten_schema into register_aten_ops. Each Operator takes a string schema which is parsed to determine when to dispatch to that op.
* Cleans up gen_jit_dispatch.py now that we do not need to write out descriptors. In particular, skip_scalar_overloads can be removed since Richard's code sorts declarations to put Tensor, Tensor declarations first.
* Remove matchSchemaAndLiftConstantAttributes and use emitBuiltinCall instead to remove code duplication.
* Refactor stack manipulation functions into a separate header file.
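The registry-plus-schema-matching idea in the list above can be sketched roughly as follows. This is a simplified Python model under stated assumptions, not the C++ implementation: `register_operator`, `get_operation`, and the `Operator` tuple here are hypothetical, argument types are passed in pre-parsed rather than derived from the schema string, and plain Python numbers stand in for tensors.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class Operator(NamedTuple):
    schema: str                  # human-readable declaration
    arg_types: Tuple[str, ...]   # argument types (pre-parsed in this sketch)
    op: Callable[[list], None]   # implementation that operates on a stack


# Global registry: operator name -> all registered overloads.
_registry: Dict[str, List[Operator]] = {}


def register_operator(name, schema, arg_types, op):
    _registry.setdefault(name, []).append(Operator(schema, tuple(arg_types), op))


def get_operation(name, input_types):
    """Pick the overload whose schema matches the given input types."""
    candidates = _registry.get(name, [])
    for candidate in candidates:
        if candidate.arg_types == tuple(input_types):
            return candidate.op
    # On failure, list every schema registered under the same name.
    known = "\n".join("  " + c.schema for c in candidates) or "  (none)"
    raise LookupError(
        f"no operator matches {name}({', '.join(input_types)}); "
        f"registered schemas:\n{known}"
    )


def _add(stack):
    # Pop two operands and push their sum.
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)


register_operator("add", "add(Tensor a, Tensor b) -> Tensor", ["Tensor", "Tensor"], _add)
register_operator("add", "add(Tensor a, int b) -> Tensor", ["Tensor", "int"], _add)

stack = [2, 3]  # numbers stand in for tensors
get_operation("add", ["Tensor", "int"])(stack)
```

Matching against the schema itself, rather than a derived descriptor key, is what lets the error path print the candidate declarations directly.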
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8885
Reviewed By: jamesr66a
Differential Revision: D8751048
Pulled By: zdevito
fbshipit-source-id: 312aabfbf88307c5f6ab947b6caf691468b94557